Cerebras · 1 week ago
LLM Inference Performance & Evals Engineer
Cerebras Systems builds the world's largest AI chip, providing unparalleled AI compute power. The role involves working with the inference model team to validate and accelerate new model ideas on wafer-scale hardware, along with prototyping architectural tweaks and building performance-eval pipelines.
AI Infrastructure · Artificial Intelligence (AI) · Computer Hardware · RISC · Semiconductor · Software
Responsibilities
Prototype and benchmark cutting-edge ideas: new attention variants, MoE, speculative decoding, and other innovations as they emerge (a minimal benchmark sketch follows this list)
Develop agent-driven automation that designs experiments, schedules runs, triages regressions, and drafts pull requests
Work closely with compiler, runtime, and silicon teams: unique opportunity to experience the full stack of software/hardware innovation
Keep pace with the latest open- and closed-source models; run them first on wafer scale to expose new optimization opportunities
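The benchmarking work referenced above can start as small as a timing harness around a generation call. Below is a minimal, hedged sketch in Python; the `generate` callable and its signature are assumptions chosen for illustration, not a Cerebras API.

```python
# Minimal throughput-benchmark sketch. The generate() callable is a hypothetical
# stand-in for a model's decode loop; it is not a Cerebras interface.
import time
import statistics

def benchmark(generate, prompt, max_new_tokens=128, runs=5):
    """Time repeated generation calls and report tokens/sec statistics."""
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt, max_new_tokens=max_new_tokens)
        elapsed = time.perf_counter() - start
        rates.append(len(tokens) / elapsed)
    return {
        "mean_tok_s": statistics.mean(rates),
        "stdev_tok_s": statistics.stdev(rates) if runs > 1 else 0.0,
    }

if __name__ == "__main__":
    # Stand-in generator so the script runs end to end.
    def fake_generate(prompt, max_new_tokens=128):
        time.sleep(0.01)
        return list(range(max_new_tokens))

    print(benchmark(fake_generate, "hello", max_new_tokens=64))
```

In practice the same loop can be extended with warm-up runs and per-phase timing (prefill vs. decode), which is where regressions usually hide.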
Qualifications
Required
3+ years building high-performance ML or systems software
Solid grounding in Transformer math (attention scaling, KV caches, quantization), or clear evidence that you can pick this material up quickly; a toy NumPy example follows this list
Comfort navigating the full AI toolchain: Python modeling code, compiler IRs, performance profiling, etc.
Strong debugging skills across performance, numerical accuracy, and runtime integration
Prior experience in modeling, compilers, or crafting benchmarks and performance studies, rather than just black-box QA tests
Strong passion to leverage AI agents or workflow orchestration tools to boost personal productivity
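As a toy illustration of the attention-scaling and KV-cache concepts named above, here is a minimal single-head decode step in NumPy. The function names and tensor shapes are assumptions chosen for clarity; this is not production code or a Cerebras-specific kernel.

```python
# Single-head decode step with a KV cache (NumPy only, illustrative).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decode_step(q, k_new, v_new, k_cache, v_cache):
    """Append this step's key/value to the cache, then attend over the full cache.

    q, k_new, v_new: shape (d,) for the current token.
    k_cache, v_cache: shape (t, d) for the t previous tokens (may be empty).
    """
    k_cache = np.concatenate([k_cache, k_new[None, :]], axis=0)
    v_cache = np.concatenate([v_cache, v_new[None, :]], axis=0)
    d = q.shape[-1]
    scores = (k_cache @ q) / np.sqrt(d)   # attention scaling by 1/sqrt(d)
    weights = softmax(scores)
    out = weights @ v_cache               # weighted sum of cached values, shape (d,)
    return out, k_cache, v_cache

if __name__ == "__main__":
    d = 8
    rng = np.random.default_rng(0)
    k_cache, v_cache = np.zeros((0, d)), np.zeros((0, d))
    for _ in range(4):                    # four decode steps
        q, k, v = rng.normal(size=(3, d))
        out, k_cache, v_cache = decode_step(q, k, v, k_cache, v_cache)
    print(out.shape, k_cache.shape)       # (8,) (4, 8)
```

The cache grows by one row per decoded token, which is exactly why KV-cache size and layout dominate inference memory budgets at long context lengths.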
Preferred
Hands-on with flash-attention, Triton kernels, linear-attention, or sparsity research
Performance-tuning experience on custom silicon, GPUs, or FPGAs
Proficiency in C/C++ programming and experience with low-level optimization
Proven experience in compiler development, particularly with LLVM and/or MLIR
Publications, repos, or blog posts dissecting model speed-ups
Contributions to open-source agent frameworks
Company
Cerebras
Cerebras Systems delivers the world's fastest AI inference. We are powering the future of generative AI.
Funding
Current Stage: Late Stage
Total Funding: $2.82B
Key Investors: Tiger Global Management, Atreides Management, Fidelity, Alpha Wave Ventures
2026-02-04 · Series H · $1B
2025-12-03 · Secondary Market
2025-09-30 · Series G · $1.1B
Company data provided by Crunchbase.