
d-Matrix · 9 hours ago

Senior Staff Machine Learning Engineer - Frameworks

d-Matrix is a pioneering company specializing in data center AI inferencing solutions, focused on unleashing the potential of generative AI. They are seeking a Senior Staff Machine Learning Engineer - Frameworks to design, build, and optimize machine learning deployment pipelines for large-scale models, enhancing the efficiency and scalability of generative AI applications.
Artificial Intelligence (AI) · Semiconductor · Cloud Computing · AI Infrastructure · Cloud Infrastructure · Data Center
H1B Sponsor Likely

Responsibilities

Design, build, and optimize machine learning deployment pipelines for large-scale models
Implement and enhance model inference frameworks
Develop automated workflows for model development, experimentation, and deployment
Collaborate with research, architecture, and engineering teams to improve model performance and efficiency
Work with distributed computing frameworks (e.g., PyTorch/XLA, JAX, TensorFlow, Ray) to optimize model parallelism and deployment
Implement scalable KV caching and memory-efficient inference techniques for transformer-based models (see the KV-cache sketch after this list)
Monitor and optimize infrastructure performance across the custom hardware hierarchy (cards, servers, and racks) powered by d-Matrix's custom AI chips
Ensure best practices in ML model versioning, evaluation, and monitoring
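The KV-caching responsibility above refers to reusing previously computed attention keys and values during autoregressive decoding instead of recomputing the full prefix at every step. Below is a minimal sketch in PyTorch; the class name, shapes, and buffer layout are illustrative assumptions and are not tied to d-Matrix's actual inference stack.

```python
import torch
import torch.nn.functional as F

class KVCache:
    """Minimal per-layer key/value cache for autoregressive decoding (illustrative).

    Preallocates a fixed-length buffer so each decode step appends one (or a few)
    tokens' keys/values. Shapes assume (batch, heads, seq, head_dim).
    """

    def __init__(self, batch, heads, max_seq, head_dim, dtype=torch.float32, device="cpu"):
        self.k = torch.zeros(batch, heads, max_seq, head_dim, dtype=dtype, device=device)
        self.v = torch.zeros(batch, heads, max_seq, head_dim, dtype=dtype, device=device)
        self.len = 0  # number of positions currently cached

    def append(self, k_new, v_new):
        # k_new / v_new: (batch, heads, new_tokens, head_dim)
        t = k_new.shape[2]
        self.k[:, :, self.len:self.len + t] = k_new
        self.v[:, :, self.len:self.len + t] = v_new
        self.len += t
        # Return views over the valid prefix for attention
        return self.k[:, :, :self.len], self.v[:, :, :self.len]


# Usage: attend the newest query token against all cached keys/values.
cache = KVCache(batch=1, heads=8, max_seq=2048, head_dim=64)
q = torch.randn(1, 8, 1, 64)
k, v = cache.append(torch.randn(1, 8, 1, 64), torch.randn(1, 8, 1, 64))
out = F.scaled_dot_product_attention(q, k, v)
```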

Qualifications

Python · Machine Learning frameworks · Model optimization · Distributed computing frameworks · Quantization techniques · Software engineering best practices · Problem-solving skills · Collaboration · Fast-paced environment

Required

BS in Computer Science and 7+ years of experience, with strong programming skills in Python and hands-on use of ML frameworks such as PyTorch, TensorFlow, or JAX
Hands-on experience with model optimization, quantization, and inference acceleration
Deep understanding of transformer architectures, attention mechanisms, and distributed inference (tensor parallel, pipeline parallel, sequence parallel)
Knowledge of quantization (INT8, BF16, FP16) and memory-efficient inference techniques (see the quantization sketch after this list)
Solid grasp of software engineering best practices, including CI/CD, containerization (Docker, Kubernetes), and MLOps
Strong problem-solving skills and ability to work in a fast-paced, iterative development environment
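As an illustration of the quantization item above, the sketch below uses stock PyTorch post-training dynamic quantization (INT8 weights) and a BF16 cast. It is a generic example under assumed layer sizes, not the toolchain used on d-Matrix hardware.

```python
import torch
import torch.nn as nn

# A small stand-in model; any module containing nn.Linear layers works the same way.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512)).eval()

# Post-training dynamic quantization: weights stored as INT8, activations
# quantized on the fly at inference time (returns a quantized copy).
int8_model = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# Reduced-precision floating point (BF16) is a simpler cast and is often the
# first step toward memory-efficient inference on supporting hardware.
bf16_model = model.to(torch.bfloat16)

with torch.no_grad():
    x = torch.randn(1, 512)
    y_int8 = int8_model(x)
    y_bf16 = bf16_model(x.to(torch.bfloat16))
```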

Preferred

Experience working with cloud-based ML pipelines (AWS, GCP, or Azure)
Experience with LLM fine-tuning, LoRA, PEFT, and KV cache optimizations (see the LoRA sketch after this list)
Contributions to open-source ML projects or research publications
Experience with low-level optimizations using CUDA, Triton, or XLA
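For the LoRA item above, here is a minimal hand-rolled low-rank adapter around a frozen linear layer. In practice libraries such as peft are commonly used instead; this self-contained sketch (class name and hyperparameters are illustrative assumptions) only shows the core idea of training low-rank factors while the base weights stay frozen.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA adapter around a frozen linear layer (illustrative only).

    The base weight stays frozen; training updates only the low-rank factors,
    so y = base(x) + (alpha / r) * B(A(x)).
    """

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the pretrained weights
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # start as a no-op adapter
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))


# Wrap an existing projection layer; only the LoRA factors receive gradients.
layer = LoRALinear(nn.Linear(1024, 1024))
out = layer(torch.randn(2, 1024))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
```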

Benefits

Competitive compensation, benefits, and opportunities for career growth

Company

d-Matrix

d-Matrix is a platform that enables data centers to handle large-scale generative AI inference with high throughput and low latency.

H1B Sponsorship

d-Matrix has a track record of sponsoring H1B visas. Note that this does not guarantee sponsorship for this specific role; the information below is provided for reference. (Data powered by the US Department of Labor)
Distribution of different job fields receiving sponsorship (chart; fields similar to this job are highlighted)
Trends of total sponsorships: 2025 (20) · 2024 (15) · 2023 (8) · 2022 (7)

Funding

Current Stage
Growth Stage
Total Funding
$429M
Key Investors
Bullhound Capital, Temasek Holdings, Triatomic Capital, M12 - Microsoft's Venture Fund, Playground Global, SK Hynix
2025-11-12 · Series C · $275M
2023-09-06 · Series B · $110M
2022-04-20 · Series A · $44M

Leadership Team

Peter Buckingham
Senior Vice President, Software Engineering
Company data provided by Crunchbase