Together AI · 10 hours ago
Machine Learning Operations Lead
Together AI is building an advanced AI Inference & Model Shaping Platform. They are seeking an exceptional MLOps Engineering Lead to ensure excellence of their ML API offerings and to optimize operations for availability and reliability across services.
Artificial Intelligence (AI) · Generative AI · Internet · IT Infrastructure · Open Source
Responsibilities
Own availability and performance SLAs for production inference and fine-tuning services across serverless and dedicated deployments
Own & improve testing, deployment, configuration management, and monitoring practices for multi-cluster ML infrastructure – partnering closely with Infra SREs
Build self-serve tooling and automation to reduce operational toil and enable internal users (MLOps, customer experience) and self-serve offerings
Define and enforce configuration best practices for inference engines (vLLM, tvLLM, Pulsar) to prevent runtime issues
Lead incident response, conduct postmortems, and drive reliability improvements
Hire, mentor, and grow an MLOps engineering team
Partner with infrastructure and ML engineering teams to improve system reliability and cost efficiency
Qualifications
Required
5+ years operating production ML inference or training systems at scale
2+ years leading engineering teams, with experience building teams from scratch
Deep expertise with Kubernetes, multi-cluster orchestration, and ML serving frameworks
Strong track record owning production SLAs (e.g. availability, TTFT, TPS)
Experience with LLM inference serving systems (vLLM, TRT-LLM, or similar)
Ability to influence cross-functional teams and make deployment/architecture decisions
Preferred
Experience building internal developer platforms or self-serve tooling
Background in cost optimization for GPU infrastructure
Contributions to open-source ML infrastructure projects
Benefits
Health insurance
Equity
Other competitive benefits
Company
Together AI
Together AI is a cloud-based platform for building open-source generative AI and the infrastructure for developing AI models.
H1B Sponsorship
Together AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart; the highlighted segment represents job fields similar to this job)
Trends of Total Sponsorships
2025 (17)
2024 (6)
2023 (3)
Funding
Current Stage: Growth Stage
Total Funding: $533.5M
Key Investors: Salesforce Ventures, Lux Capital
2025-02-20 · Series B · $305M
2024-03-13 · Series A · $106M
2023-11-29 · Series A · $102.5M
Company data provided by Crunchbase