Senior ML Performance Engineer jobs in United States

Lemurian Labs · 2 weeks ago

Senior ML Performance Engineer

Lemurian Labs is on a mission to bring the power of AI to everyone while ensuring sustainability. The Senior ML Performance Engineer will architect and lead the Performance Testing Platform, focusing on measuring and optimizing the performance of large language models on modern GPU architectures.

Artificial Intelligence (AI), Cloud Management, Infrastructure, Machine Learning
H1B Sponsor Likely

Responsibilities

Design and build a comprehensive performance testing platform for evaluating LLM inference workloads across GPU clusters
Define and implement the benchmarking methodology, metrics, and test suites that measure latency, throughput, memory utilization, power consumption, and model accuracy
Establish baseline performance for unoptimized models (Llama 3.2 70B, DeepSeek, etc.) and validate post-optimization improvements
Develop automated testing pipelines for continuous performance validation across compiler releases and model updates
Investigate performance bottlenecks using profiling tools (ROCm profilers, GPU traces, system-level monitoring) and work with the compiler team to drive optimizations
Create dashboards and reporting that provide clear visibility into performance trends, regressions, and wins
Collaborate cross-functionally with compiler engineers, ML engineers, and DevOps to ensure performance testing is integrated into our development workflow
Document best practices for performance testing and optimization of ML workloads on GPU hardware

Qualifications

Performance engineering, GPU programming, ML inference workloads, Benchmarking platforms, Python, C/C++, ML frameworks, Profiling tools, CI/CD systems, Analytical skills, Passion for sustainability, Collaboration, Communication, Attention to detail, Self-driven

Required

7+ years of experience in performance engineering, benchmarking, or systems engineering roles
Deep understanding of ML inference workloads, particularly transformer-based models and LLMs
Hands-on experience with GPU programming and optimization (CUDA, ROCm, or similar)
Strong programming skills in Python and C/C++
Proven track record of building performance testing infrastructure or benchmarking platforms from scratch
Experience with ML frameworks (PyTorch, TensorFlow, ONNX Runtime, vLLM, TensorRT-LLM, etc.)
Proficiency with profiling and debugging tools for GPU workloads
Strong analytical skills with the ability to design experiments, analyze results, and communicate findings clearly
Experience with CI/CD systems and test automation frameworks

Preferred

Experience with AMD GPUs (MI200/MI300 series) and the ROCm ecosystem
Knowledge of compiler optimization techniques and their impact on performance
Experience with distributed inference and multi-GPU workloads
Familiarity with ML model quantization, pruning, and other optimization techniques
Background in high-performance computing or systems-level optimization
Experience with infrastructure-as-code (Kubernetes, Docker, Terraform)
Contributions to open-source ML or systems projects

Benefits

Equity
Company bonus opportunities
Medical, dental, and vision benefits
Retirement savings plan
Supplemental wellness benefits

Company

Lemurian Labs

Any workload. Any hardware. Any scale.

H1B Sponsorship

Lemurian Labs has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Charts: Distribution of Different Job Fields Receiving Sponsorship; Trends of Total Sponsorships (2025: 1)

Funding

Current Stage: Early Stage
Total Funding: $43.14M
Key Investors: Oval Park Capital, Silicon Catalyst, ventureLAB
2025-12-03: Series A · $28M
2024-10-09: Convertible Note · $6M
2023-09-08: Seed · $9M

Leadership Team

Jay Dawani
Co-Founder & CEO
Company data provided by Crunchbase