Cerebras · 3 months ago
Engineering Manager, Inference Platform
Cerebras Systems builds the world's largest AI chip and is seeking an Engineering Manager for its Inference Service Platform. The role involves leading a high-performing team to scale LLM inference on advanced compute clusters and delivering an enterprise-ready solution for customers.
AI Infrastructure · Artificial Intelligence (AI) · Computer Hardware · Semiconductor · Software
Responsibilities
Provide hands-on technical leadership, owning the technical vision and roadmap for the Cerebras Inference Platform, from internal scaling to on-prem customer solutions
Lead the end-to-end development of distributed inference systems, including request routing, autoscaling, and resource orchestration on Cerebras' unique hardware
Drive a culture of operational excellence, ensuring platform reliability (>99.9% uptime), performance, and efficiency
Lead, mentor, and grow a high-caliber team of engineers, fostering a culture of technical excellence and rapid execution
Productize the platform into an enterprise-ready, on-prem solution, collaborating closely with product, ops, and customer teams to ensure successful deployments
Qualifications
Required
6+ years in high-scale software engineering
3+ years leading distributed systems or ML infra teams
Strong coding and review skills
Proven track record scaling LLM inference:
Optimizing latency (<100 ms P99)
Optimizing throughput, batching, memory/IO efficiency, and resource utilization
Expertise in distributed inference/training for modern LLMs
Understanding of AI/ML ecosystems, including public clouds (AWS/GCP/Azure)
Hands-on with model-serving frameworks (e.g., vLLM, TensorRT-LLM, Triton, or similar)
Hands-on with ML stacks (PyTorch, Hugging Face, SageMaker)
Deep experience with orchestration (Kubernetes/EKS, Slurm)
Experience with large clusters and low-latency networking
Strong background in monitoring and reliability engineering (Prometheus/Grafana, incident response, post-mortems)
Demonstrated ability to recruit and retain high-performing teams
Ability to mentor engineers
Ability to partner cross-functionally to deliver customer-facing products
Preferred
Experience with on-prem/private cloud deployments
Background in edge or streaming inference
Experience with multi-region systems
Experience with security/privacy in AI
Customer-facing experience with enterprise deployments
Company
Cerebras
Cerebras Systems builds the world's fastest AI inference platform. We are powering the future of generative AI.
H1B Sponsorship
Cerebras has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. The information below is provided for reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships: 2025 (31), 2024 (16), 2023 (18), 2022 (17), 2021 (34), 2020 (23)
Funding
Current Stage: Late Stage
Total Funding: $1.82B
Key Investors: Alpha Wave Ventures, Vy Capital, Coatue
2025-12-03: Secondary Market
2025-09-30: Series G · $1.1B
2024-09-27: Series Unknown
Company data provided by Crunchbase