ML Performance Engineer jobs in United States

Gridmatic · 2 months ago

ML Performance Engineer

Gridmatic Inc. is a high-growth startup focused on accelerating the clean energy transition through expertise in data and machine learning. The ML Performance Engineer will build and optimize the infrastructure of the ML platform, enhance the efficiency of machine learning models, and mentor junior engineers within a collaborative team environment.

Artificial Intelligence (AI) · Clean Energy · Energy
H1B Sponsor Likely

Responsibilities

Own a significant piece of our ML platform while rapidly building and iterating scalable, robust distributed infrastructure for ML training, inference, and evaluation on large-scale time-series and weather datasets
Optimize throughput and cost by supporting model training and deployment across multiple clusters and clouds
Improve the efficiency of machine learning models and other workloads by optimizing latency, throughput, and memory consumption. This involves pushing the boundaries of current hardware capabilities through techniques like GPU performance engineering
Help define the long-term vision for Gridmatic’s ML platform
Play a key role in mentoring junior engineers and interns, contributing to a collaborative, innovative, and growth-oriented team culture

Qualifications

Machine Learning · Distributed Systems · GPU Performance Engineering · Deep Learning · PyTorch · Data Storage Infrastructure · Kubernetes · Terraform · Performance Bottlenecks · Time-Series Forecasting

Required

3+ years of experience in engineering with a commitment to technical excellence
Deep understanding of codebases and ability to write readable, scalable code
Experience in researching and implementing deep learning models
Experience in distributed training and inference of large models on GPU clusters, utilizing core libraries and frameworks such as PyTorch, PyTorch Lightning, and Ray
Comfortable with large-scale data storage infrastructure and formats, e.g. Zarr, SQL, and feature stores
Self-starter with a strong sense of independence and ownership, capable of engineering large, robust systems from initial design to productionization
Mission-driven individual enthusiastic about working toward a renewable grid and the intersection of ML and energy

Preferred

End-to-end proficiency in building, maintaining, and debugging cluster infrastructure, utilizing Kubernetes and Terraform
Expertise in identifying performance bottlenecks and designing and writing high-performance code for large-scale ML workloads
Experience with at least one of: torch.profiler, TorchDynamo, TorchInductor, Triton, or other deep learning compiler stacks
Knowledge of cluster communication libraries such as NCCL or Gloo
Experience working with any of the following: weather data, energy systems, time-series forecasting, electricity markets, or financial trading

Company

Gridmatic

Gridmatic is an AI-enabled power marketer that accelerates the ascent of clean energy.

H1B Sponsorship

Gridmatic has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor.)
Trends of Total Sponsorships
2025 (1)
2024 (1)

Funding

Current Stage: Growth Stage
Total Funding: $46M
2023-01-01 · Undisclosed · $40M
2021-08-01 · Undisclosed · $6M

Leadership Team

Matt Wytock
Founder
Company data provided by Crunchbase