Lemurian Labs
Senior ML Performance Engineer
Lemurian Labs is on a mission to bring the power of AI to everyone while ensuring sustainability. The Senior ML Performance Engineer will architect and lead the Performance Testing Platform, focusing on measuring and optimizing the performance of large language models on modern GPU architectures.
Artificial Intelligence (AI), Cloud Management, Infrastructure, Machine Learning
Responsibilities
Design and build a comprehensive performance testing platform for evaluating LLM inference workloads across GPU clusters
Define and implement the benchmarking methodology, metrics, and test suites that measure latency, throughput, memory utilization, power consumption, and model accuracy
Establish baseline performance for unoptimized models (Llama 3.x 70B, DeepSeek, etc.) and validate post-optimization improvements
Develop automated testing pipelines for continuous performance validation across compiler releases and model updates
Investigate performance bottlenecks using profiling tools (ROCm profilers, GPU traces, system-level monitoring) and work with the compiler team to drive optimizations
Create dashboards and reporting that provide clear visibility into performance trends, regressions, and wins
Collaborate cross-functionally with compiler engineers, ML engineers, and DevOps to ensure performance testing is integrated into our development workflow
Document best practices for performance testing and optimization of ML workloads on GPU hardware
Qualifications
Required
7+ years of experience in performance engineering, benchmarking, or systems engineering roles
Deep understanding of ML inference workloads, particularly transformer-based models and LLMs
Hands-on experience with GPU programming and optimization (CUDA, ROCm, or similar)
Strong programming skills in Python and C/C++
Proven track record of building performance testing infrastructure or benchmarking platforms from scratch
Experience with ML frameworks (PyTorch, TensorFlow, ONNX Runtime, vLLM, TensorRT-LLM, etc.)
Proficiency with profiling and debugging tools for GPU workloads
Strong analytical skills with the ability to design experiments, analyze results, and communicate findings clearly
Experience with CI/CD systems and test automation frameworks
Preferred
Experience with AMD GPUs (MI200/MI300 series) and the ROCm ecosystem
Knowledge of compiler optimization techniques and their impact on performance
Experience with distributed inference and multi-GPU workloads
Familiarity with ML model quantization, pruning, and other optimization techniques
Background in high-performance computing or systems-level optimization
Experience with infrastructure-as-code and container tooling (Terraform, Docker, Kubernetes)
Contributions to open-source ML or systems projects
Benefits
Equity
Company bonus opportunities
Medical
Dental
Vision
Retirement savings plan
Supplemental wellness benefits
Company
Lemurian Labs
Any workload. Any hardware. Any scale.
H1B Sponsorship
Lemurian Labs has a track record of offering H1B sponsorship. Please note that this does not guarantee sponsorship for this specific role.
According to US Department of Labor data, the company had 1 H1B sponsorship in 2025.
Funding
Current Stage: Early Stage
Total Funding: $43.14M
Key Investors: Oval Park Capital, Silicon Catalyst, ventureLAB
2025-12-03: Series A · $28M
2024-10-09: Convertible Note · $6M
2023-09-08: Seed · $9M
Company data provided by Crunchbase.