Cerebras · 3 weeks ago
Senior Research Engineer - Inference ML
Cerebras Systems builds the world's largest AI chip, enabling AI applications across a wide range of fields. As a Senior Research Engineer on the Inference ML team, you will adapt state-of-the-art language and vision models to run efficiently on the Cerebras architecture and contribute to cutting-edge inference research.
Artificial Intelligence (AI) · Computer Hardware · Semiconductor · Software
Responsibilities
Design, implement, and optimize state-of-the-art transformer architectures for NLP and computer vision on Cerebras hardware
Research and prototype novel inference algorithms and model architectures that exploit the unique capabilities of Cerebras hardware, with emphasis on speculative decoding, pruning/compression, sparse attention, and other forms of sparsity
Train models to convergence, perform hyperparameter sweeps, and analyze results to inform next steps
Bring up new models on the Cerebras system, validate functional correctness, and troubleshoot any integration issues
Profile and optimize model code using Cerebras tools to maximize throughput and minimize latency
Develop diagnostic tooling or scripts to surface performance bottlenecks and guide optimization strategies for inference workloads
Collaborate across teams, including software, hardware, and product, to drive projects from inception through delivery
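One of the techniques named above, speculative decoding, can be sketched in miniature. Below is a greedy-verification variant in plain Python: a cheap draft model proposes several tokens, and the target model accepts the longest prefix it agrees with. The `draft_model` and `target_model` functions are invented toy stand-ins, not Cerebras APIs; production systems use rejection sampling so the target distribution is preserved exactly.

```python
# Speculative decoding, greedy-verification variant (sketch).
# A cheap draft model proposes k tokens; the target model checks them
# and accepts the longest matching prefix, then substitutes its own
# token at the first disagreement.

def draft_model(ctx):
    # Toy draft model: predicts last token + 1, wrapping at 10.
    return (ctx[-1] + 1) % 10

def target_model(ctx):
    # Toy target model: agrees with the draft except after token 5,
    # where it predicts 0 instead of 6.
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10

def speculative_step(ctx, k=4):
    """Propose k draft tokens, verify with the target, return accepted tokens."""
    proposal = list(ctx)
    for _ in range(k):
        proposal.append(draft_model(proposal))
    drafted = proposal[len(ctx):]

    accepted = []
    verify_ctx = list(ctx)
    for tok in drafted:
        t = target_model(verify_ctx)
        if t == tok:            # target agrees: accept the draft token
            accepted.append(tok)
            verify_ctx.append(tok)
        else:                   # disagreement: take the target's token, stop
            accepted.append(t)
            break
    return accepted

print(speculative_step([3], k=4))  # drafts 4,5,6,7; target rejects 6 -> [4, 5, 0]
```

When the draft and target usually agree, each target verification pass yields several tokens instead of one, which is the source of the speedup.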
Qualifications
Required
Bachelor's degree in Computer Science, Software Engineering, Computer Engineering, Electrical Engineering, or a related technical field AND 7+ years of ML software development experience, OR
Master's degree in Computer Science or related technical field AND 4+ years of software development experience, OR
PhD in Computer Science or related technical field with 2+ years of relevant research or industry experience, OR
Equivalent practical experience
4+ years of experience testing, maintaining, or launching software products, including 2+ years of experience with software design and architecture
3+ years of experience in software development focused on machine learning (e.g., deep learning, large language models, or computer vision)
Strong programming skills in Python and/or C++
Experience with generative AI and machine learning systems
Proficiency with at least one major ML framework (e.g., PyTorch, Hugging Face Transformers, vLLM, or SGLang)
Deep understanding of transformer-based models in language and/or vision domains, with demonstrated experience implementing and optimizing them
Proven ability to implement custom layers, operators, and backpropagation logic
Strong foundation in performance optimization on specialized hardware (e.g., GPUs, TPUs, or HPC interconnects)
Deep understanding of modern ML architectures and strong intuition for optimizing their performance, particularly for inference workloads using sparse attention, pruning/compression, and speculative decoding
Track record of owning problems end-to-end and autonomously acquiring whatever knowledge is needed to deliver results
Self-directed mindset with a demonstrated ability to identify and tackle the most impactful problems
Collaborative approach with humility, eagerness to help colleagues, and commitment to team success
Genuine passion for AI and a drive to push the limits of inference performance
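The "custom layers, operators, and backpropagation logic" requirement above can be made concrete with a small example. This is a framework-free sketch (all names invented for illustration) of a leaky-ReLU layer with a hand-written backward pass, validated against a finite-difference gradient — the standard sanity check when implementing custom operators.

```python
# Hand-written forward/backward for a leaky-ReLU layer, verified
# against a numerical (finite-difference) gradient.

ALPHA = 0.1  # negative-slope hyperparameter

def forward(x):
    """Elementwise leaky ReLU."""
    return [xi if xi > 0 else ALPHA * xi for xi in x]

def backward(x, grad_out):
    """Chain rule: dL/dx_i = dL/dy_i * dy_i/dx_i, where dy_i/dx_i is 1 or ALPHA."""
    return [g * (1.0 if xi > 0 else ALPHA) for xi, g in zip(x, grad_out)]

def numerical_grad(x, grad_out, eps=1e-6):
    """Finite-difference check: perturb each input and measure how the
    loss moves. Loss here is sum(y * grad_out), whose gradient w.r.t. y
    is grad_out by construction."""
    grads = []
    for i in range(len(x)):
        xp = list(x); xp[i] += eps
        xm = list(x); xm[i] -= eps
        lp = sum(y * g for y, g in zip(forward(xp), grad_out))
        lm = sum(y * g for y, g in zip(forward(xm), grad_out))
        grads.append((lp - lm) / (2 * eps))
    return grads

x = [2.0, -3.0, 0.5]
g_out = [1.0, 1.0, 2.0]
analytic = backward(x, g_out)
assert all(abs(a - n) < 1e-4 for a, n in zip(analytic, numerical_grad(x, g_out)))
print(analytic)  # [1.0, 0.1, 2.0]
```

In PyTorch the same structure lives in a `torch.autograd.Function` subclass, with `forward` and `backward` static methods; the gradient-check discipline is identical.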
Preferred
Master's degree or PhD in Computer Science, Computer Engineering, or a related technical field
Experience independently driving complex ML or inference projects from prototype to production-quality implementations
Hands-on experience with relevant ML frameworks such as PyTorch, Transformers, vLLM, or SGLang
Experience with large language models, mixture-of-experts models, multimodal learning, or AI agents
Experience with speculative decoding, neural network pruning and compression, sparse attention, quantization, sparsity, post-training techniques, and inference-focused evaluations
Familiarity with large-scale model training and deployment, including performance and cost trade-offs in production systems
Triton/CUDA experience is a big plus
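Among the preferred techniques, pruning/compression is easy to illustrate. Here is a minimal sketch of unstructured magnitude pruning in plain Python (the `magnitude_prune` helper is invented for illustration): the smallest-magnitude weights are zeroed until a target sparsity is reached.

```python
# Magnitude pruning (sketch): zero out the smallest-magnitude weights
# until a target fraction of the weights is zero.

def magnitude_prune(weights, sparsity):
    """Return weights with the lowest-|w| fraction `sparsity` set to 0.
    Ties at the cutoff magnitude are all pruned (acceptable for a sketch)."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # Cutoff = magnitude of the n_prune-th smallest |w|.
    cutoff = sorted(abs(w) for w in weights)[n_prune - 1]
    return [0.0 if abs(w) <= cutoff else w for w in weights]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
print(magnitude_prune(w, 0.5))  # -> [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```

Real pipelines typically prune iteratively with fine-tuning in between, and often use structured patterns (e.g., 2:4 sparsity) so that hardware can actually exploit the zeros.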
Benefits
Enjoy job stability with startup vitality.
A simple, non-corporate work culture that respects individual beliefs.
Company
Cerebras
Cerebras Systems delivers the world's fastest AI inference. We are powering the future of generative AI.
Funding
Current Stage: Late Stage
Total Funding: $1.82B
Key Investors: Alpha Wave Ventures, Vy Capital, Coatue
2025-12-03: Secondary Market
2025-09-30: Series G · $1.1B
2024-09-27: Series Unknown
Company data provided by Crunchbase