Senior ML Software Engineer - Quantization & Numerics jobs in United States

Microsoft · 2 weeks ago

Senior ML Software Engineer - Quantization & Numerics

Microsoft is a leading technology company focused on empowering individuals and organizations to achieve more. They are seeking a Senior ML Software Engineer to design and develop quantization and numerics kernels for efficient deployment of LLM inference and training in Azure production environments, collaborating with various teams to optimize performance and ensure system efficiency.

Agentic AI, Application Performance Management, Artificial Intelligence (AI), Business Development, DevOps, Information Services, Information Technology, Management Information Systems, Network Security, Software
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Design and develop novel quantization and numerics kernels to enable efficient deployment of LLM inference and training in Microsoft’s Azure production environments
Drive proof-of-concept efforts in software development and model optimization tooling to streamline deployment of quantized models
Analyze performance bottlenecks in quantized state-of-the-art LLM architectures and drive performance improvements
Prototype and evaluate emerging low-precision data formats through proof-of-concept implementations on novel hardware accelerator SDKs
Co-design model architectures optimized for low-precision deployment in close collaboration with companywide AI/ML teams
Work cross-functionally with data scientists and ML researchers/engineers across organizations to align on model accuracy and performance goals
Partner with hardware architecture and AI software framework teams to ensure end-to-end system efficiency

Qualifications

High-performance ML systems, GPU kernel development, Model optimization, Deep learning frameworks, Low-precision numerics, Transformer architectures, Programming in Python, Programming in C/C++, Communication, Team collaboration

Required

Bachelor's Degree in Computer Science, Electrical or Computer Engineering, or related field AND 4+ years of industry experience in high-performance ML systems, GPU kernel development, or ML runtime/infrastructure development
OR Master's Degree in Computer Science, Electrical or Computer Engineering, or related field AND 3+ years of industry experience in high-performance ML systems, GPU kernel development, or ML runtime/infrastructure development
OR Doctorate in Computer Science, Electrical or Computer Engineering, or related field AND 1+ year(s) of industry experience in high-performance ML systems, GPU kernel development, or ML runtime/infrastructure development
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter

Preferred

Demonstrated experience delivering production-grade software in areas such as model compression, low-precision numerics (FP8, INT8/4, NVFP4, MX formats, etc.), low-level kernel development, and performance optimization
Proficiency with modern deep learning frameworks, including PyTorch, TensorFlow, TensorRT, and ONNX Runtime
Expertise in GPU/NPU kernel development using CUDA, Triton, ROCm, or comparable frameworks, and in fast model bring-up on a new stack
Strong understanding of Transformer and LLM architectures, with hands-on experience in optimization techniques such as quantization, pruning, tensor/parameter sharding, model parallelism, KV-cache optimization, and Flash Attention
Practical experience with large-scale model evaluation, including benchmarking state-of-the-art LLMs and fine-tuning (SFT or RL) large models
Solid programming skills in Python, C, and C++
Excellent communication abilities and a proven capacity to collaborate effectively in hybrid, team-oriented environments
Hands-on experience implementing and optimizing low-level linear algebra routines, including custom BLAS kernels, would be a plus
Deep knowledge of mixed-precision arithmetic units, including numerical formats and microarchitecture, is highly desirable

Company

Microsoft

Microsoft is a software corporation that develops, manufactures, licenses, supports, and sells a range of software products and services.

H1B Sponsorship

Microsoft has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship
Trends of Total Sponsorships
2025 (9192)
2024 (9343)
2023 (7677)
2022 (11403)
2021 (7210)
2020 (7852)

Funding

Current Stage
Public Company
Total Funding
$1M
Key Investors
Technology Venture Investors
2022-12-09 · Post-IPO Equity
1986-03-13 · IPO
1981-09-01 · Series Unknown · $1M

Leadership Team

Satya Nadella
Chairman and CEO
Vukani Mngxati
Chief Executive Officer - Microsoft South Africa
Company data provided by Crunchbase