LLM Inference Performance & Evals Engineer jobs in United States

Cerebras · 2 days ago

LLM Inference Performance & Evals Engineer

Cerebras Systems builds the world's largest AI chip, providing industry-leading training and inference speeds. The role involves working on state-of-the-art models, validating and accelerating new ideas on wafer-scale hardware.

Artificial Intelligence (AI) · Computer · Hardware · Semiconductor · Software
Growth Opportunities

Responsibilities

Prototype and benchmark cutting-edge ideas: new attention mechanisms, MoE, speculative decoding, and other innovations as they emerge
Develop agent-driven automation that designs experiments, schedules runs, triages regressions, and drafts pull requests
Work closely with compiler, runtime, and silicon teams: unique opportunity to experience the full stack of software/hardware innovation
Keep pace with the latest open- and closed-source models; run them first on wafer scale to expose new optimization opportunities

Qualifications

High-performance ML software · Transformer math · AI toolchain navigation · Debugging skills · Modeling experience · AI agents automation · C/C++ programming · Compiler development · Performance tuning · Open-source contributions

Required

3+ years building high-performance ML or systems software
Solid grounding in Transformer math (attention scaling, KV-cache, quantization), or clear evidence you learn this material rapidly
Comfort navigating the full AI toolchain: Python modeling code, compiler IRs, performance profiling, etc.
Strong debugging skills across performance, numerical accuracy, and runtime integration
Prior experience in modeling, compilers, or crafting benchmarks and performance studies, not just black-box QA tests
Strong passion for leveraging AI agents or workflow orchestration tools to boost personal productivity
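As one illustration of the "Transformer math" the requirements mention, here is a minimal sketch of KV-cache sizing for a decoder-only Transformer. The model config is a hypothetical Llama-style 8B-class setup with grouped-query attention, chosen purely for illustration; it is not a Cerebras-specific figure.

```python
# Minimal sketch: estimate KV-cache memory for a decoder-only Transformer.
# All config numbers below are hypothetical (Llama-style GQA), for illustration only.

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1, dtype_bytes: int = 2) -> int:
    """Bytes needed to cache K and V across all layers at a given context length."""
    # Factor of 2: one K tensor and one V tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes * seq_len * batch_size

# Hypothetical 8B-class config: 32 layers, 8 KV heads of dim 128, fp16 cache, 8k context.
size = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=8192)
print(f"{size / 2**30:.1f} GiB per sequence")  # 1.0 GiB per sequence
```

Back-of-envelope arithmetic like this is what lets an inference engineer reason quickly about batch sizes and context lengths that fit on a given accelerator.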

Preferred

Hands-on experience with FlashAttention, Triton kernels, linear attention, or sparsity research
Performance-tuning experience on custom silicon, GPUs, or FPGAs
Proficiency in C/C++ programming and experience with low-level optimization
Proven experience in compiler development, particularly with LLVM and/or MLIR
Publications, repos, or blog posts dissecting model speed-ups
Contributions to open-source agent frameworks

Company

Cerebras

Cerebras Systems delivers the world's fastest AI inference, powering the future of generative AI.

Funding

Current Stage
Late Stage
Total Funding
$1.82B
Key Investors
Alpha Wave Ventures, Vy Capital, Coatue
2025-12-03 · Secondary Market
2025-09-30 · Series G · $1.1B
2024-09-27 · Series Unknown

Leadership Team

Andrew Feldman
Founder and CEO
Bob Komin
Chief Financial Officer
Company data provided by Crunchbase