
Baseten · 8 hours ago

Engineering Manager - Model Performance

Baseten is a rapidly growing company that powers mission-critical inference for dynamic AI companies. They are seeking an Engineering Manager focused on ML performance and inference to lead a team of engineers while remaining hands-on with technology.

Artificial Intelligence (AI) · Developer Tools · Machine Learning · Software · Software Engineering
H1B Sponsor Likely

Responsibilities

Lead, mentor, and manage a team of engineers focused on developing and optimizing ML model inference and performance
Oversee technical strategy and architecture decisions, driving improvements across our engineering organization
Collaborate with cross-functional teams to ensure seamless integration and scalability of ML models in production environments
Dive into the codebase of frameworks like TensorRT, PyTorch, CUDA, and others to identify and solve complex performance bottlenecks
Drive the development and deployment of large-scale optimization techniques for various ML models, especially large language models (LLMs)
Own the full lifecycle of projects from inception through delivery, including planning, execution, and resource management
Foster a collaborative, inclusive team environment that encourages continuous learning and growth

Qualifications

ML model performance optimization · Programming languages · Containerization (Docker) · Orchestration systems (Kubernetes) · Technical leadership · Production-level AI/ML solutions · Large language models (LLMs) · Team management · Collaboration · Project management

Required

Bachelor's, Master's, or Ph.D. in Computer Science, Engineering, or a related field
5+ years of professional experience in software engineering, with at least 2 years in a technical leadership role
Proven experience managing and mentoring teams of engineers
Expertise in one or more programming languages, such as Python, C++, or Go
In-depth understanding of ML model performance optimization, especially using libraries such as PyTorch, TensorRT, and CUDA
Strong knowledge of containerization (Docker) and orchestration systems (Kubernetes)
Experience with production-level AI/ML solutions, including scaling and deploying large models
Ability to balance hands-on technical work with team leadership and project management

Preferred

Experience enhancing the performance of large language models (LLMs) or similar AI systems
Familiarity with LLM optimization techniques such as quantization, speculative decoding, or continuous batching
Deep knowledge of GPU architecture and performance tuning
Previous experience in a high-growth startup environment

Benefits

Competitive compensation, including meaningful equity
100% coverage of medical, dental, and vision insurance for employees and dependents
Generous PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!)
Paid parental leave
Company-facilitated 401(k)
Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities

Company

Baseten

Baseten is an AI infrastructure company that integrates machine learning into business operations, production, and processes.

H1B Sponsorship

Baseten has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data provided by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart not shown)
Trends of Total Sponsorships
2025 (6)
2024 (8)
2023 (1)
2020 (1)

Funding

Current Stage: Late Stage
Total Funding: $285M
Key Investors: Bond, Greylock
2025-09-05 · Series D · $150M
2025-02-19 · Series C · $75M
2024-03-04 · Series B · $40M

Leadership Team

Aaron Relph
Design
Company data provided by Crunchbase