Lemurian Labs · 2 weeks ago

Senior ML Performance Engineer

United States

Full-time

Remote

Senior Level

7+ years exp

Lemurian Labs is on a mission to bring the power of AI to everyone while ensuring sustainability. The Senior ML Performance Engineer will architect and lead the Performance Testing Platform, focusing on measuring and optimizing the performance of large language models on modern GPU architectures.

Artificial Intelligence (AI), Cloud Management, Infrastructure, Machine Learning

H1B Sponsor Likely

Responsibilities

Design and build a comprehensive performance testing platform for evaluating LLM inference workloads across GPU clusters

Define and implement the benchmarking methodology, metrics, and test suites that measure latency, throughput, memory utilization, power consumption, and model accuracy

Establish baseline performance for unoptimized models (Llama 3.2 70B, DeepSeek, etc.) and validate post-optimization improvements

Develop automated testing pipelines for continuous performance validation across compiler releases and model updates

Investigate performance bottlenecks using profiling tools (ROCm profilers, GPU traces, system-level monitoring) and work with the compiler team to drive optimizations

Create dashboards and reporting that provide clear visibility into performance trends, regressions, and wins

Collaborate cross-functionally with compiler engineers, ML engineers, and DevOps to ensure performance testing is integrated into our development workflow

Document best practices for performance testing and optimization of ML workloads on GPU hardware

Qualifications

Performance engineering, GPU programming, ML inference workloads, Benchmarking platforms, Python, C/C++, ML frameworks, Profiling tools, CI/CD systems, Analytical skills, Passion for sustainability, Collaboration, Communication, Attention to detail, Self-driven

Required

7+ years of experience in performance engineering, benchmarking, or systems engineering roles

Deep understanding of ML inference workloads, particularly transformer-based models and LLMs

Hands-on experience with GPU programming and optimization (CUDA, ROCm, or similar)

Strong programming skills in Python and C/C++

Proven track record of building performance testing infrastructure or benchmarking platforms from scratch

Experience with ML frameworks (PyTorch, TensorFlow, ONNX Runtime, vLLM, TensorRT-LLM, etc.)

Proficiency with profiling and debugging tools for GPU workloads

Strong analytical skills with the ability to design experiments, analyze results, and communicate findings clearly

Experience with CI/CD systems and test automation frameworks

Preferred

Experience with AMD GPUs (MI200/MI300 series) and the ROCm ecosystem

Knowledge of compiler optimization techniques and their impact on performance

Experience with distributed inference and multi-GPU workloads

Familiarity with ML model quantization, pruning, and other optimization techniques

Background in high-performance computing or systems-level optimization

Experience with infrastructure-as-code (Kubernetes, Docker, Terraform)

Contributions to open-source ML or systems projects

Benefits

Equity

Company bonus opportunities

Medical, dental, and vision benefits

Retirement savings plan

Supplemental wellness benefits

Company

Lemurian Labs

Any workload. Any hardware. Any scale.

Founded in 2018

Santa Clara, California, USA

11-50 employees

https://www.lemurianlabs.com

H1B Sponsorship

Lemurian Labs has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)

Distribution of Different Job Fields Receiving Sponsorship

Trends of Total Sponsorships

2025 (1)

Funding

Current Stage

Early Stage

Total Funding

$43.14M

Key Investors

Oval Park Capital, Silicon Catalyst, ventureLAB

2025-12-03 · Series A · $28M

2024-10-09 · Convertible Note · $6M

2023-09-08 · Seed · $9M

Leadership Team

Jay Dawani

Co-Founder & CEO

Recent News

alleywatch.com

The Weekly Notable Startup Funding Report: 12/8/25

2025-12-09

Pulse 2.0

Lemurian Labs: $28 Million Series A Raised To Accelerate Hardware Agnostic AI

2025-12-09

SuperbCrew

Lemurian Labs Raises $28M In Series A Funding Round

2025-12-08

Company data provided by Crunchbase