Cerebras · 3 weeks ago
Engineering Manager, Inference Platform
Cerebras Systems builds the world's largest AI chip, providing unprecedented AI compute power. They are seeking a deeply technical, hands-on Engineering Manager for their Inference Service Platform to lead a team in scaling LLM inference and delivering enterprise solutions.
Quantum Computing, Artificial Intelligence (AI), Semiconductor, Electronics, Hardware, Software, AI Infrastructure, Computer, RISC
Responsibilities
Provide hands-on technical leadership, owning the technical vision and roadmap for the Cerebras Inference Platform, from internal scaling to on-prem customer solutions
Lead the end-to-end development of distributed inference systems, including request routing, autoscaling, and resource orchestration on Cerebras' unique hardware
Drive a culture of operational excellence, guaranteeing platform reliability (>99.9% uptime), performance, and efficiency
Lead, mentor, and grow a high-caliber team of engineers, fostering a culture of technical excellence and rapid execution
Productize the platform into an enterprise-ready, on-prem solution, collaborating closely with product, ops, and customer teams to ensure successful deployments
Qualifications
Required
6+ years in high-scale software engineering, with 3+ years leading distributed systems or ML infra teams; strong coding and review skills
Proven track record scaling LLM inference: optimizing latency (<100ms P99), throughput, batching, memory/IO efficiency, and resource utilization
Expertise in distributed inference/training for modern LLMs; understanding of AI/ML ecosystems, including public clouds (AWS/GCP/Azure)
Hands-on with model-serving frameworks (e.g., vLLM, TensorRT-LLM, Triton, or similar) and ML stacks (PyTorch, Hugging Face, SageMaker)
Deep experience with orchestration (Kubernetes/EKS, Slurm), large clusters, and low-latency networking
Strong background in monitoring and reliability engineering (Prometheus/Grafana, incident response, post-mortems)
Demonstrated ability to recruit and retain high-performing teams, mentor engineers, and partner cross-functionally to deliver customer-facing products
Preferred
Experience with on-prem/private cloud deployments
Background in edge or streaming inference, multi-region systems, or security/privacy in AI
Customer-facing experience with enterprise deployments
Company
Cerebras
Cerebras Systems delivers the world's fastest AI inference. We are powering the future of generative AI.
H1B Sponsorship
Cerebras has a track record of offering H1B sponsorship. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships
2025 (31)
2024 (16)
2023 (18)
2022 (17)
2021 (34)
2020 (23)
Funding
Current Stage: Late Stage
Total Funding: $2.82B
Key Investors: Atreides Management, Fidelity, Tiger Global Management, Alpha Wave Ventures
2026-02-04: Series H · $1B
2025-12-03: Secondary Market
2025-09-30: Series G · $1.1B
Recent News
Tech Startups - Tech News, Tech Trends & Startup Funding (2026-02-12)
Crunchbase News (2026-02-12)
Crunchbase News (2026-02-11)
Company data provided by Crunchbase