
Cerebras · 1 week ago

LLM Inference Performance & Evals Engineer

Cerebras Systems builds the world's largest AI chip, providing unparalleled AI compute power. The role involves working with the inference model team to validate and accelerate new model ideas on wafer-scale hardware, along with prototyping architectural tweaks and building performance-eval pipelines.
AI Infrastructure · Artificial Intelligence (AI) · Computer Hardware · RISC · Semiconductor · Software
Growth Opportunities

Responsibilities

Prototype and benchmark cutting-edge ideas: new attention variants, mixture-of-experts (MoE), speculative decoding, and other innovations as they emerge
Develop agent-driven automation that designs experiments, schedules runs, triages regressions, and drafts pull requests
Work closely with compiler, runtime, and silicon teams: unique opportunity to experience the full stack of software/hardware innovation
Keep pace with the latest open- and closed-source models; run them first on wafer scale to expose new optimization opportunities

Qualifications

High-performance ML software · Transformer math · AI toolchain navigation · Debugging skills · Modeling experience · Compiler development · C/C++ programming · Passion for AI agents · Performance tuning · Contributions to open-source

Required

3+ years building high-performance ML or systems software
Solid grounding in Transformer math—attention scaling, KV-cache, quantisation—or clear evidence you learn this material rapidly
Comfort navigating the full AI toolchain: Python modeling code, compiler IRs, performance profiling, etc.
Strong debugging skills across performance, numerical accuracy, and runtime integration
Prior experience in modeling, compilers, or crafting benchmarks and performance studies, not just black-box QA tests
Strong passion for leveraging AI agents and workflow orchestration tools to boost personal productivity

Preferred

Hands-on with flash-attention, Triton kernels, linear-attention, or sparsity research
Performance-tuning experience on custom silicon, GPUs, or FPGAs
Proficiency in C/C++ programming and experience with low-level optimization
Proven experience in compiler development, particularly with LLVM and/or MLIR
Publications, repos, or blog posts dissecting model speed-ups
Contributions to open-source agent frameworks

Company

Cerebras

Cerebras Systems builds the world's fastest AI inference platform. We are powering the future of generative AI.

Funding

Current Stage
Late Stage
Total Funding
$2.82B
Key Investors
Tiger Global Management, Atreides Management, Fidelity, Alpha Wave Ventures
2026-02-04 · Series H · $1B
2025-12-03 · Secondary Market
2025-09-30 · Series G · $1.1B

Leadership Team

Andrew Feldman
Founder and CEO
Bob Komin
Chief Financial Officer
Company data provided by Crunchbase