Senior AI Kernel Engineer
Modular is on a mission to revolutionize AI infrastructure by rebuilding the AI software stack from the ground up. The Senior AI Kernel Engineer will lead the design and optimization of high-performance kernels for large-scale AI inference on GPUs, collaborating with compiler, runtime, and hardware teams to deliver implementations that run efficiently at scale.
AI Infrastructure · Artificial Intelligence (AI) · Generative AI · Machine Learning · Software
Responsibilities
Design, implement, and optimize performance-critical kernels for AI inference workloads (e.g., GEMM, attention, communication, fusion)
Lead kernel-level optimization efforts across single-GPU, multi-GPU, and heterogeneous hardware environments
Make informed trade-offs between latency, throughput, memory footprint, and numerical precision
Drive adoption of new hardware features (e.g., Tensor Cores, asynchronous execution, advanced memory spaces)
Analyze performance using profilers, hardware counters, and microbenchmarks; translate insights into concrete improvements
Work closely with compiler and runtime teams to influence code generation, scheduling, and kernel fusion strategies
Review and mentor other engineers on kernel design, performance tuning, and best practices
Contribute to technical roadmaps and long-term performance strategy for AI inference
Qualifications
Required
5+ years of experience in performance-critical systems or kernel development (or equivalent depth of expertise)
Strong proficiency in C/C++ and low-level programming
Extensive hands-on experience with GPU kernel programming (CUDA, HIP, or equivalent)
Deep understanding of GPU architecture, including memory hierarchies, synchronization, and execution models
Proven track record of delivering measurable performance improvements in production systems
Strong problem-solving skills and ability to work independently on complex, ambiguous performance challenges
Preferred
Experience with PTX, assembly-level tuning, or code generation frameworks (e.g., Triton)
Experience optimizing distributed or multi-GPU inference pipelines
Familiarity with custom AI accelerators or domain-specific hardware
Understanding of modern AI models (e.g., transformers, LLMs, diffusion) from a systems and performance perspective
Contributions to open-source kernel libraries, compilers, or performance tools
Experience collaborating directly with hardware or compiler teams
Benefits
Premier insurance plans
Up to 5% 401k matching
Flexible paid time off
Stock options
Company
Modular
Modular provides AI infrastructure for deploying, serving, and programming GPUs.
H1B Sponsorship
Modular has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role; the information below is provided for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025: 10 · 2024: 6 · 2023: 8 · 2022: 4
Funding
Current Stage: Growth Stage
Total Funding: $380M
Key Investors: US Innovative Technology Fund, General Catalyst, Google Ventures
2025-09-24 · Series C · $250M
2023-08-24 · Series B · $100M
2022-06-30 · Seed · $30M
Recent News
General Catalyst · 2026-01-14
Greylock · 2025-12-29
Company data provided by crunchbase