Red Hat · 23 hours ago

Senior Machine Learning Engineer - AI Eval & Safety

Boston, MA

Full-time

Hybrid

Senior Level

$171K/yr - $282K/yr

5+ years of experience

Red Hat is the world’s leading provider of enterprise open source software solutions, using a community-powered approach to deliver high-performing Linux, cloud, container, and Kubernetes technologies. They are seeking a Senior Machine Learning Engineer who will be responsible for building infrastructure that ensures AI models are safe, reliable, and aligned with human values, while leading the development of evaluation platforms for large language models and agents.

Enterprise Software · InsurTech · Linux · Open Source · Operating Systems · Software

H1B Sponsor Likely

Responsibilities

Architect and lead development of large-scale evaluation platforms for LLMs and agents, enabling automated, reproducible, and extensible assessment of accuracy, reliability, safety, and performance across diverse domains

Define organizational standards and metrics for LLM/agent evaluation, covering hallucination detection, factuality, bias, robustness, interpretability, and alignment drift

Build platform components and APIs that allow product teams to integrate evaluation seamlessly into training, fine-tuning, deployment, and continuous monitoring workflows

Design automated pipelines and benchmarks for adversarial testing, red-teaming, and stress testing of LLMs and retrieval-augmented generation (RAG) systems

Lead initiatives in multi-dimensional evaluation, including safety (toxicity, bias, harmful outputs), grounding (retrieval correctness, source attribution), and agent behaviors (tool use, planning, trustworthiness)

Collaborate with cross-functional stakeholders (safety, product, research, infrastructure) to translate abstract evaluation goals into measurable, system-level frameworks

Advance interpretability and observability, developing tools that allow teams to understand, debug, and explain LLM behaviors in production

Mentor engineers and establish best practices, driving adoption of evaluation-driven development across the organization

Influence technical roadmaps and industry direction, representing the team’s evaluation-first approach in external forums and publications

Qualifications

Large-scale evaluation · LLM evaluation metrics · Platform engineering · Python · PyTorch · Hugging Face · MLOps · Technical leadership · Mentoring · Collaboration · Communication

Required

5+ years of ML engineering experience, with 3+ years focused on large-scale evaluation of transformer-based LLMs and/or agentic systems

Proven experience building evaluation platforms or frameworks that operate across training, deployment, and post-deployment contexts

Deep expertise in designing and implementing LLM evaluation metrics (factuality, hallucination detection, grounding, toxicity, robustness)

Strong background in scalable platform engineering, including APIs, pipelines, and integrations used by multiple product teams

Demonstrated ability to bridge research and engineering, operationalizing safety and alignment techniques into production evaluation systems

Proficiency in Python, PyTorch, Hugging Face, and modern MLOps and deployment environments

Track record of technical leadership, including mentoring, architecture design, and defining org-wide practices

Preferred

Experience with multi-agent evaluation frameworks and graph-based metrics for agent interactions

Background in retrieval-augmented generation (RAG) evaluation (retrieval precision/recall, grounding, attribution)

Contributions to AI safety or evaluation research in industry or academia

Familiarity with adversarial testing methodologies and automated red-teaming

Knowledge of interpretability and transparency methods for LLMs

Advanced degree in ML/CS or related field with focus on evaluation, safety, or interpretability

Benefits

Comprehensive medical, dental, and vision coverage

Flexible Spending Account - healthcare and dependent care

Health Savings Account - high deductible medical plan

Retirement 401(k) with employer match

Paid time off and holidays

Paid parental leave plans for all new parents

Leave benefits including disability, paid family medical leave, and paid military leave

Additional benefits including employee stock purchase plan, family planning reimbursement, tuition reimbursement, transportation expense account, employee assistance program, and more!

Company

Red Hat

Glassdoor: 4.1

Red Hat is a software company that offers enterprise open source software solutions. It is a subsidiary of IBM.

Founded in 1993

Raleigh, North Carolina, USA

10,001+ employees

http://www.redhat.com

H1B Sponsorship

Red Hat has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data from the US Department of Labor)

[Chart: Distribution of Different Job Fields Receiving Sponsorship]

Trends of Total Sponsorships

2025 (159)

2024 (148)

2023 (156)

2022 (181)

2021 (154)

2020 (106)

Funding

Current Stage

Public Company

Total Funding

Unknown

2018-10-28: Acquired

1999-08-20: IPO

1999-03-09: Corporate Round

Leadership Team

Chris Wright

Chief Technology Officer and Senior Vice President Global Engineering

Mark Little

CTO, JBoss

Recent News

SD Times

The top software development news of the year

2025-12-25

The New Stack

Kubernetes: Get the Most from Dynamic Resource Allocation

2025-12-24

CRN

5 Companies That Came To Win This Week

2025-12-19

Company data provided by Crunchbase