
Cerebras · 3 months ago

Engineering Manager, Inference Platform

Cerebras Systems builds the world's largest AI chip and is seeking an Engineering Manager for its Inference Service Platform. The role involves leading a high-performing team to scale LLM inference on advanced compute clusters and delivering an enterprise-ready solution for customers.

AI Infrastructure · Artificial Intelligence (AI) · Computer Hardware · Semiconductor · Software
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Provide hands-on technical leadership, owning the technical vision and roadmap for the Cerebras Inference Platform, from internal scaling to on-prem customer solutions
Lead the end-to-end development of distributed inference systems, including request routing, autoscaling, and resource orchestration on Cerebras' unique hardware
Drive a culture of operational excellence, guaranteeing platform reliability (>99.9% uptime), performance, and efficiency
Lead, mentor, and grow a high-caliber team of engineers, fostering a culture of technical excellence and rapid execution
Productize the platform into an enterprise-ready, on-prem solution, collaborating closely with product, ops, and customer teams to ensure successful deployments

Qualifications

Inference Expertise · ML Systems Knowledge · Frameworks & Tools · Technical Leadership · Infrastructure · Operations & Monitoring · Leadership & Collaboration

Required

6+ years in high-scale software engineering
3+ years leading distributed systems or ML infra teams
Strong coding and review skills
Proven track record scaling LLM inference
Optimizing latency (<100ms P99), throughput, batching, memory/IO efficiency, and resource utilization
Expertise in distributed inference/training for modern LLMs
Understanding of AI/ML ecosystems, including public clouds (AWS/GCP/Azure)
Hands-on with model-serving frameworks (e.g., vLLM, TensorRT-LLM, Triton, or similar)
Hands-on with ML stacks (PyTorch, Hugging Face, SageMaker)
Deep experience with orchestration (Kubernetes/EKS, Slurm)
Experience with large clusters and low-latency networking
Strong background in monitoring and reliability engineering (Prometheus/Grafana, incident response, post-mortems)
Demonstrated ability to recruit and retain high-performing teams
Ability to mentor engineers
Ability to partner cross-functionally to deliver customer-facing products

Preferred

Experience with on-prem/private cloud deployments
Background in edge or streaming inference
Experience with multi-region systems
Experience with security/privacy in AI
Customer-facing experience with enterprise deployments

Company

Cerebras

Cerebras Systems delivers the world's fastest AI inference, powering the future of generative AI.

H1B Sponsorship

Cerebras has a track record of offering H1B sponsorships, though this does not guarantee sponsorship for this specific role. The information below is provided for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (31)
2024 (16)
2023 (18)
2022 (17)
2021 (34)
2020 (23)

Funding

Current Stage
Late Stage
Total Funding
$1.82B
Key Investors
Alpha Wave Ventures, Vy Capital, Coatue
2025-12-03 · Secondary Market
2025-09-30 · Series G · $1.1B
2024-09-27 · Series Unknown

Leadership Team

Andrew Feldman
CEO & Founder
Bob Komin
Chief Financial Officer
Company data provided by Crunchbase