LLM Inference Performance & Evals Engineer jobs in United States

Cerebras · 2 days ago

LLM Inference Performance & Evals Engineer

Cerebras Systems builds the world's largest AI chip, providing industry-leading training and inference speeds. The role involves working on state-of-the-art models, validating and accelerating new ideas on wafer-scale hardware.

Artificial Intelligence (AI) · Computer · Hardware · Semiconductor · Software
Growth Opportunities

Responsibilities

Prototype and benchmark cutting-edge ideas: new attention mechanisms, MoE, speculative decoding, and other innovations as they emerge
Develop agent-driven automation that designs experiments, schedules runs, triages regressions, and drafts pull requests
Work closely with compiler, runtime, and silicon teams: unique opportunity to experience the full stack of software/hardware innovation
Keep pace with the latest open- and closed-source models; run them first on wafer scale to expose new optimization opportunities

Qualifications

High-performance ML software · Transformer math · AI toolchain navigation · Debugging skills · Modeling experience · AI agents automation · C/C++ programming · Compiler development · Performance tuning · Open-source contributions

Required

3+ years building high-performance ML or systems software
Solid grounding in Transformer math (attention scaling, KV-cache, quantization), or clear evidence you learn this material rapidly
Comfort navigating the full AI toolchain: Python modeling code, compiler IRs, performance profiling, etc.
Strong debugging skills across performance, numerical accuracy, and runtime integration
Prior experience in modeling, compilers, or crafting benchmarks and performance studies, not just black-box QA tests
Strong passion for leveraging AI agents or workflow orchestration tools to boost personal productivity
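As one illustration of the "Transformer math" the requirements mention, here is a minimal sketch of KV-cache sizing for a decoder-only Transformer. The model config is a hypothetical Llama-style 8B-class setup with grouped-query attention, chosen purely for illustration; it is not a Cerebras-specific figure.

```python
# Minimal sketch: estimate KV-cache memory for a decoder-only Transformer.
# All config numbers below are hypothetical (Llama-style GQA), for illustration only.

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1, dtype_bytes: int = 2) -> int:
    """Bytes needed to cache K and V across all layers at a given context length."""
    # Factor of 2: one K tensor and one V tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes * seq_len * batch_size

# Hypothetical 8B-class config: 32 layers, 8 KV heads of dim 128, fp16 cache, 8k context.
size = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=8192)
print(f"{size / 2**30:.1f} GiB per sequence")  # 1.0 GiB per sequence
```

Back-of-envelope arithmetic like this is what lets an inference engineer reason quickly about batch sizes and context lengths that fit on a given accelerator.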

Preferred

Hands-on experience with FlashAttention, Triton kernels, linear attention, or sparsity research
Performance-tuning experience on custom silicon, GPUs, or FPGAs
Proficiency in C/C++ programming and experience with low-level optimization
Proven experience in compiler development, particularly with LLVM and/or MLIR
Publications, repos, or blog posts dissecting model speed-ups
Contributions to open-source agent frameworks

Company

Cerebras

Cerebras Systems delivers the world's fastest AI inference, powering the future of generative AI.

Funding

Current Stage
Late Stage
Total Funding
$1.82B
Key Investors
Alpha Wave Ventures, Vy Capital, Coatue
2025-12-03 · Secondary Market
2025-09-30 · Series G · $1.1B
2024-09-27 · Series Unknown

Leadership Team

Andrew Feldman
Founder and CEO
Bob Komin
Chief Financial Officer
Company data provided by Crunchbase