d-Matrix · 9 hours ago
Senior Staff Machine Learning Engineer - Frameworks
d-Matrix is a pioneering company specializing in data center AI inferencing solutions, focused on unleashing the potential of generative AI. They are seeking a Senior Staff Machine Learning Engineer - Frameworks to design, build, and optimize machine learning deployment pipelines for large-scale models, enhancing the efficiency and scalability of generative AI applications.
Artificial Intelligence (AI), Semiconductor, Cloud Computing, AI Infrastructure, Cloud Infrastructure, Data Center
Responsibilities
Design, build, and optimize machine learning deployment pipelines for large-scale models
Implement and enhance model inference frameworks
Develop automated workflows for model development, experimentation, and deployment
Collaborate with research, architecture, and engineering teams to improve model performance and efficiency
Work with distributed computing frameworks (e.g., PyTorch/XLA, JAX, TensorFlow, Ray) to optimize model parallelism and deployment
Implement scalable KV caching and memory-efficient inference techniques for transformer-based models
Monitor and optimize infrastructure performance across the levels of the custom hardware hierarchy (cards, servers, and racks) powered by d-Matrix custom AI chips
Ensure best practices in ML model versioning, evaluation, and monitoring
Qualifications
Required
BS in Computer Science, 7+ years of strong Python programming, and experience with ML frameworks such as PyTorch, TensorFlow, or JAX
Hands-on experience with model optimization, quantization, and inference acceleration
Deep understanding of transformer architectures, attention mechanisms, and distributed inference (tensor parallel, pipeline parallel, sequence parallel)
Knowledge of quantization and reduced-precision formats (INT8, BF16, FP16) and memory-efficient inference techniques
Solid grasp of software engineering best practices, including CI/CD, containerization (Docker, Kubernetes), and MLOps
Strong problem-solving skills and ability to work in a fast-paced, iterative development environment
Preferred
Experience working with cloud-based ML pipelines (AWS, GCP, or Azure)
Experience with LLM fine-tuning, LoRA, PEFT, and KV cache optimizations
Contributions to open-source ML projects or research publications
Experience with low-level optimizations using CUDA, Triton, or XLA
Benefits
Competitive compensation, benefits, and opportunities for career growth
Company
d-Matrix
d-Matrix is a platform that enables data centers to handle large-scale generative AI inference with high throughput and low latency.
H1B Sponsorship
d-Matrix has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (20)
2024 (15)
2023 (8)
2022 (7)
Funding
Current Stage
Growth Stage
Total Funding
$429M
Key Investors
Bullhound Capital, Temasek Holdings, Triatomic Capital, M12 - Microsoft's Venture Fund, Playground Global, SK Hynix
2025-11-12 · Series C · $275M
2023-09-06 · Series B · $110M
2022-04-20 · Series A · $44M
Company data provided by Crunchbase