
Baseten · 3 days ago

Software Engineer - Model Performance

Baseten powers mission-critical inference for leading AI companies and is seeking a Software Engineer focused on ML performance. The role involves advancing AI applications by implementing and optimizing techniques for ML model inference, particularly in large language models.

AI Infrastructure, Artificial Intelligence (AI), Developer Tools, Machine Learning, Software, Software Engineering
H1B Sponsor Likely

Responsibilities

Implement, refine, and productionize cutting-edge techniques (quantization, speculative decoding, KV cache reuse, chunked prefill, and LoRA) for ML model inference and infrastructure
Dive deep into the underlying codebases of TensorRT, PyTorch, TensorRT-LLM, vLLM, SGLang, CUDA, and other libraries to debug ML performance issues
Apply and scale optimization techniques across a wide range of ML models, particularly large language models
Collaborate with a diverse team to design and implement innovative solutions
Own projects from idea to production

Qualifications

ML performance optimization, Python, PyTorch, C++, CUDA, LLM optimization techniques, Software engineering principles, Docker, Kubernetes, GPU architecture

Required

Bachelor's, Master's, or Ph.D. degree in Computer Science, Engineering, Mathematics, or related field
Experience with one or more general-purpose programming languages, such as Python or C++
Familiarity with LLM optimization techniques (e.g., quantization, speculative decoding, continuous batching)
Strong familiarity with ML libraries, especially PyTorch, TensorRT, or TensorRT-LLM
Demonstrated interest and experience in LLMs
Deep understanding of GPU architecture

Preferred

Proficiency in enhancing the performance of software systems, particularly in the context of large language models (LLMs)
Experience with CUDA or similar technologies
Deep understanding of software engineering principles and a proven track record of developing and deploying AI/ML inference solutions
Experience with Docker and Kubernetes

Benefits

Competitive compensation, including meaningful equity
100% coverage of medical, dental, and vision insurance for employees and dependents
Generous PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!)
Paid parental leave
Company-facilitated 401(k)
Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities

Company

Baseten

Baseten is an AI infrastructure company that integrates machine learning into business operations, production, and processes.

H1B Sponsorship

Baseten has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data from the US Department of Labor.)
Trends of Total Sponsorships
2025 (6)
2024 (8)
2023 (1)
2020 (1)

Funding

Current Stage
Late Stage
Total Funding
$285M
Key Investors
Bond, Greylock
2025-09-05 · Series D · $150M
2025-02-19 · Series C · $75M
2024-03-04 · Series B · $40M

Leadership Team

Aaron Relph
Design
Company data provided by Crunchbase