NVIDIA · 1 day ago

Senior Prompt and Benchmark Engineer, Evaluation of World Models

Santa Clara, CA

Full-time

Onsite

Senior Level, Lead/Staff

$184K/yr - $288K/yr

10+ years of experience

NVIDIA is a leading company in generative AI engineering, pushing the boundaries of multimodal learning and intelligent simulation. It is seeking a Senior Prompt and Benchmark Engineer to develop benchmarks for evaluating world foundation models and to use prompt engineering techniques to elicit structured responses from foundation models.

AI Infrastructure, Artificial Intelligence (AI), Consumer Electronics, Foundational AI, GPU, Hardware, Software, Virtual Reality

Growth Opportunities

H1B Sponsor Likely

Responsibilities

Develop detailed, domain-specific benchmarks for evaluating world foundation models, especially world models for generation and understanding that reason about video, simulation, and physical environments

Use sophisticated prompt engineering techniques to elicit structured, interpretable responses from a variety of foundation models

Build, refine, and maintain question banks, multiple-choice formats, and test suites to support both automated and human evaluation workflows

Employ multiple VLMs in parallel to explore ensemble evaluation methods such as majority voting, ranking agreement, and answer consensus

Make evaluation as automated and scalable as possible by encoding prompts and expected outputs into structured formats for downstream consumption

Interface directly with Cosmos researchers to translate their evaluation needs into scalable test cases

Collaborate with human annotators, providing clearly structured tasks, feedback loops, and quality control mechanisms to ensure dataset reliability

Meet regularly with domain experts in robotics, autonomous vehicles, and simulation to understand their internal benchmarks, derive transferable metrics, and co-develop standardized evaluation formats

Qualifications

Prompt engineering, Benchmark design, Machine Learning, Natural Language Processing, Human-Computer Interaction, Curiosity about models, Attention to detail, Communication skills, Collaboration skills

Required

Demonstrated experience with prompt engineering, including crafting, refining, and optimizing prompts

Strong attention to detail in designing natural language questions and formatting structured evaluations

Proven ability to reason about model capabilities, failure modes, and blind spots in real-world generative model deployments

Experience crafting or contributing to benchmarks or evaluation datasets, especially for multimodal or agentic systems

Familiarity with evaluating models via prompting, capturing structured outputs, and comparing across model families

Excellent communication and collaboration skills; you will regularly meet with researchers, annotators, and downstream users to iterate on benchmark design

A working understanding of how VLMs and foundation models function at inference time, including token-level outputs, autoregressive decoding, and model context windows

10+ years of experience in Machine Learning, NLP, Human-Computer Interaction, or related fields

BS, MS, or equivalent background. Prior experience in AI evaluation, annotation workflows, or research is highly valued

Preferred

Hands-on experience with multiple LLMs or VLMs (e.g., GPT, Claude, Gemini, Flamingo, Kosmos, IDEFICS) to compare outputs and engineer task-specific prompts

Prior work designing benchmarks for robotics, simulation, AV, or agentic tasks, especially in multimodal or video-based settings

Experience working with human annotation teams, building clear instructions and QA processes for large-scale labeling campaigns

Familiarity with using VLMs as evaluators, leveraging models for response scoring, ranking, or consensus aggregation

Deep curiosity about model behavior and a drive to test, interrogate, and stretch the limits of generative systems

Benefits

Equity

Company

NVIDIA

Glassdoor rating: 4.6

NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI.

Founded in 1993

Santa Clara, California, USA

10,001+ employees

https://www.nvidia.com

H1B Sponsorship

NVIDIA has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)

Distribution of Different Job Fields Receiving Sponsorship (chart omitted)

Trends of Total Sponsorships

2025 (1877)

2024 (1355)

2023 (976)

2022 (835)

2021 (601)

2020 (529)

Funding

Current Stage

Public Company

Total Funding

$4.09B

Key Investors

ARPA-E, ARK Investment Management, SoftBank Vision Fund

2023-05-09 · Grant · $5M

2022-08-09 · Post-IPO Equity · $65M

2021-02-18 · Post-IPO Equity

Leadership Team

Jensen Huang

Founder and CEO

Michael Kagan

Chief Technology Officer

Recent News

The Motley Fool

I Nailed My Nvidia Market Cap Prediction in 2025. Here's Where I Predict It's Going in 2026 (Hint: You're Going to Want to Buy Now)

2026-01-23

Digital Journal

Intel shares plunge on earnings expectations

2026-01-23

Sifted

Nvidia in talks to back Yann LeCun’s new AI startup in bumper funding round

2026-01-23

Company data provided by Crunchbase