
Cerebras · 3 weeks ago

Senior Research Engineer - Inference ML

Cerebras Systems builds the world's largest AI chip, transforming AI applications across various fields. As a Senior Research Engineer on the Inference ML team, you will adapt advanced language and vision models to run efficiently on Cerebras architecture and work on cutting-edge inference research.

Artificial Intelligence (AI) · Computer Hardware · Semiconductor · Software
Growth Opportunities

Responsibilities

Design, implement, and optimize state-of-the-art transformer architectures for NLP and computer vision on Cerebras hardware
Research and prototype novel inference algorithms and model architectures that exploit the unique capabilities of Cerebras hardware, with emphasis on speculative decoding, pruning/compression, sparse attention, and sparsity
Train models to convergence, perform hyperparameter sweeps, and analyze results to inform next steps
Bring up new models on the Cerebras system, validate functional correctness, and troubleshoot any integration issues
Profile and optimize model code using Cerebras tools to maximize throughput and minimize latency
Develop diagnostic tooling or scripts to surface performance bottlenecks and guide optimization strategies for inference workloads
Collaborate across teams, including software, hardware, and product, to drive projects from inception through delivery

Qualifications

Machine Learning · Python · C++ · Deep Learning · PyTorch · Transformers · Generative AI · Performance Optimization · Passion for AI · Collaborative Approach · Self-directed Mindset

Required

Bachelor's degree in Computer Science, Software Engineering, Computer Engineering, Electrical Engineering, or a related technical field AND 7+ years of ML software development experience, OR
Master's degree in Computer Science or related technical field AND 4+ years of software development experience, OR
PhD in Computer Science or related technical field with 2+ years of relevant research or industry experience, OR
Equivalent practical experience
4+ years of experience testing, maintaining, or launching software products, including 2+ years of experience with software design and architecture
3+ years of experience in software development focused on machine learning (e.g., deep learning, large language models, or computer vision)
Strong programming skills in Python and/or C++
Experience with Generative AI and Machine Learning systems
Proficiency with at least one major ML framework (PyTorch, Transformers, vLLM, or SGLang)
Deep understanding of transformer-based models in language and/or vision domains, with demonstrated experience implementing and optimizing them
Proven ability to implement custom layers, operators, and backpropagation logic
Strong foundation in performance optimization on specialized hardware (e.g., GPUs, TPUs, or HPC interconnects)
Deep understanding of modern ML architectures and strong intuition for optimizing their performance, particularly for inference workloads using sparse attention, pruning/compression, and speculative decoding
Track record of owning problems end-to-end and autonomously acquiring whatever knowledge is needed to deliver results
Self-directed mindset with a demonstrated ability to identify and tackle the most impactful problems
Collaborative approach with humility, eagerness to help colleagues, and commitment to team success
Genuine passion for AI and a drive to push the limits of inference performance

Preferred

Master's degree or PhD in Computer Science, Computer Engineering, or a related technical field
Experience independently driving complex ML or inference projects from prototype to production-quality implementations
Hands-on experience with relevant ML frameworks such as PyTorch, Transformers, vLLM, or SGLang
Experience with large language models, mixture-of-experts models, multimodal learning, or AI agents
Experience with speculative decoding, neural network pruning and compression, sparse attention, quantization, sparsity, post-training techniques, and inference-focused evaluations
Familiarity with large-scale model training and deployment, including performance and cost trade-offs in production systems
Triton/CUDA experience is a big plus

Benefits

Enjoy job stability with startup vitality.
A simple, non-corporate work culture that respects individual beliefs.

Company

Cerebras

Cerebras Systems builds the world's fastest AI inference platform. We are powering the future of generative AI.

Funding

Current Stage
Late Stage
Total Funding
$1.82B
Key Investors
Alpha Wave Ventures · Vy Capital · Coatue
2025-12-03 · Secondary Market
2025-09-30 · Series G · $1.1B
2024-09-27 · Series Unknown

Leadership Team

Andrew Feldman
Founder and CEO
Bob Komin
Chief Financial Officer
Company data provided by Crunchbase