LeoTech

AI/LLM Evaluation & Alignment Software Engineer

LeoTech is passionate about building software that solves real-world problems in the Public Safety sector. The AI/LLM Evaluation & Alignment Software Engineer will ensure that Large Language Model (LLM) and Agentic AI solutions are accurate and aligned with public safety workflows by designing evaluation frameworks and implementing bias-mitigation strategies.

Hardware · Information Technology · IT Management · Software

Responsibilities

Build and maintain evaluation frameworks for LLMs and generative AI systems tailored to public safety and intelligence use cases
Design guardrails and alignment strategies to minimize bias, toxicity, hallucinations, and other ethical risks in production workflows
Partner with AI engineers and data scientists to define online and offline evaluation metrics (e.g., model drift, data drift, factual accuracy, consistency, safety, interpretability)
Implement continuous evaluation pipelines for AI models, integrated into CI/CD and production monitoring systems
Collaborate with stakeholders to stress test models against edge cases, adversarial prompts, and sensitive data scenarios
Research and integrate third-party evaluation frameworks and solutions; adapt them to our regulated, high-stakes environment
Work with product and customer-facing teams to ensure explainability, transparency, and auditability of AI outputs
Provide technical leadership in responsible AI practices, influencing standards across the organization
Contribute to DevOps/MLOps workflows for deployment, monitoring, and scaling of AI evaluation and guardrail systems (experience with Kubernetes is a plus)
Document best practices and findings, and share knowledge across teams to foster a culture of responsible AI innovation

Qualifications

LLM evaluation · Bias detection · Python · DevOps/MLOps · Evaluation techniques · Cloud AI platforms · Kubernetes · Problem-solving · Communication skills

Required

Bachelor's or Master's degree in Computer Science, Artificial Intelligence, Data Science, or a related field
3–5+ years of hands-on experience in ML/AI engineering, with at least 2 years working directly on LLM evaluation, QA, or safety
Strong familiarity with evaluation techniques for generative AI: human-in-the-loop evaluation, automated metrics, adversarial testing, red-teaming
Experience with bias detection, fairness approaches, and responsible AI design
Knowledge of LLM observability, monitoring, and guardrail frameworks (e.g., Langfuse, LangSmith)
Proficiency with Python and modern AI/ML/LLM/Agentic AI libraries (LangGraph, Strands Agents, Pydantic AI, LangChain, HuggingFace, PyTorch, LlamaIndex)
Experience integrating evaluations into DevOps/MLOps pipelines, preferably with Kubernetes, Terraform, ArgoCD, or GitHub Actions
Understanding of cloud AI platforms (AWS, Azure) and deployment best practices
Strong problem-solving skills, with the ability to design practical evaluation systems for real-world, high-stakes scenarios
Excellent communication skills to translate technical risks and evaluation results into insights for both technical and non-technical stakeholders

Benefits

3 weeks of paid vacation, right out of the gate
Generous medical, dental, and vision plans
Paid sick leave and paid holidays

Company

LeoTech

LeoTech is at the forefront of assisting public safety efforts around the nation.

Funding

Current Stage
Growth Stage

Leadership Team

Steven Harpe
Chief Product Officer
Company data provided by Crunchbase