Meta · 1 week ago

Research Engineer, Evaluations - Meta Superintelligence Labs (Technical Leadership)

Menlo Park, CA

Full-time

Onsite

Senior Level

$219/hr - $301/hr

5+ years exp

Meta is seeking Research Engineers to join the Evaluations team within Meta Superintelligence Labs. This role involves curating and building benchmarks for advanced AI models, working alongside world-class researchers and engineers to develop and deploy novel evaluation environments.

Computer Software

Responsibilities

Curate and integrate publicly available and internal benchmarks to guide capability development for frontier models

Develop and implement evaluation environments, including environments for novel model capabilities and modalities

Collaborate with external data vendors to source and prepare high-quality evaluation datasets

Execute on the technical vision of research scientists designing new benchmarks and evaluations

Build robust, reusable evaluation pipelines that scale across multiple model lines and product areas

Contribute to evaluation tooling that measures the quality and reliability of evaluation suites

Qualifications

Machine Learning Engineering, Python, PyTorch, Evaluation Benchmarks, Reinforcement Learning, Distributed Systems, Software Engineering Practices, Adaptability, Independent Work

Required

Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience

5+ years of experience in machine learning engineering, machine learning research, or a related technical role

Proficiency in Python and experience with ML frameworks such as PyTorch

Experience independently identifying, designing, and completing medium-to-large technical features

Experience with software engineering practices, including version control, testing, and code review

Demonstrated experience working independently and adapting to rapidly changing priorities

Preferred

Publications at peer-reviewed venues (NeurIPS, ICML, ICLR, ACL, EMNLP, or similar) related to language model evaluation, benchmarking, or deep learning

Hands-on experience with language model post-training and deep learning systems, or building reinforcement learning environments

Experience implementing or developing evaluation benchmarks for large language models and multimodal models (e.g., vision-language, audio, video)

Experience working with large-scale distributed systems and data pipelines

Familiarity with language model evaluation frameworks and metrics

Track record of open-source contributions to ML evaluation tools or benchmarks

Company

Funding

Current Stage

Late Stage

Leadership Team

Kathryn Glickman

Director, CEO Communications

Christine Lu

CTO Business Engineering NA

Company data provided by Crunchbase