Gridmatic · 2 months ago
ML Performance Engineer
Gridmatic Inc. is a high-growth startup focused on accelerating the clean energy transition through expertise in data and machine learning. The ML Performance Engineer will build and optimize the infrastructure of the ML platform, enhance the efficiency of machine learning models, and mentor junior engineers within a collaborative team environment.
Artificial Intelligence (AI) · Clean Energy · Energy
Responsibilities
Own a significant piece of our ML platform, rapidly building and iterating on scalable, robust distributed infrastructure for ML training, inference, and evaluation on large-scale time-series and weather datasets
Optimize throughput and cost by supporting model training and deployment across multiple clusters and clouds
Improve the efficiency of machine learning models and other workloads by optimizing latency, throughput, and memory consumption. This involves pushing the boundaries of current hardware capabilities through techniques like GPU performance engineering
Help define the long-term vision for Gridmatic’s ML platform
Play a key role in mentoring junior engineers and interns, contributing to a collaborative, innovative, and growth-oriented team culture
Qualifications
Required
3+ years of experience in engineering with a commitment to technical excellence
Deep understanding of codebases and ability to write readable, scalable code
Experience in researching and implementing deep learning models
Experience in distributed training and inference of large models on GPU clusters, utilizing core libraries and frameworks such as PyTorch, PyTorch Lightning, and Ray
Comfortable with large-scale data storage infrastructure and formats, e.g., Zarr, SQL, and feature stores
Self-starter with a strong sense of independence and ownership, capable of engineering large, robust systems from initial design to productionization
Mission-driven individual enthusiastic about working toward a renewable grid and the intersection of ML and energy
Preferred
End-to-end proficiency in building, maintaining, and debugging cluster infrastructure using Kubernetes and Terraform
Expertise in identifying performance bottlenecks and designing and writing high-performance code for large-scale ML workloads
Experience with at least one of: torch.profiler, TorchDynamo, TorchInductor, Triton, or other deep learning compiler stacks
Knowledge of collective communication libraries for clusters, such as NCCL or Gloo
Experience working with any of the following: weather data, energy systems, time-series forecasting, electricity markets, or financial trading
Company
Gridmatic
Gridmatic is an AI-enabled power marketer that accelerates the ascent of clean energy.
H1B Sponsorship
Gridmatic has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Total Sponsorships by Year
2025: 1
2024: 1
Funding
Current Stage: Growth Stage
Total Funding: $46M
2023-01-01 · Undisclosed · $40M
2021-08-01 · Undisclosed · $6M
Recent News
Renewable Energy Magazine · 2025-07-25
Company data provided by Crunchbase