Together AI
Senior Backend Engineer, Inference Platform
Together AI is building the Inference Platform that brings the most advanced generative AI models to the world. The role involves optimizing latency and collaborating with researchers to bring new model architectures into production, while also contributing to the open source community.
AI Infrastructure · Artificial Intelligence (AI) · Generative AI · Internet · IT Infrastructure · Open Source
Responsibilities
Build and optimize global and local request routing, ensuring low-latency load balancing across data centers and model engine pods
Develop auto-scaling systems to dynamically allocate resources and meet strict SLOs across dozens of data centers
Design systems for multi-tenant traffic shaping, tuning both resource allocation and request handling, including smart rate limiting and regulation, to ensure fairness and a consistent experience across all users
Engineer trade-offs between latency and throughput to serve diverse workloads efficiently
Optimize prefix caching to reduce model compute and speed up responses
Collaborate with ML researchers to bring new model architectures into production at scale
Continuously profile and analyze system-level performance to identify bottlenecks and implement optimizations
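As an illustration of the multi-tenant rate limiting mentioned above, a per-tenant token bucket is one common approach. This is a minimal sketch under stated assumptions, not Together AI's implementation; the `TokenBucket` name and the `rate`/`capacity` parameters are illustrative:

```python
import time

class TokenBucket:
    """Per-tenant token bucket: refills at `rate` tokens/sec, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# One bucket per tenant keeps a single noisy tenant from starving the others.
buckets = {"tenant-a": TokenBucket(rate=5.0, capacity=10.0)}
```

In a real serving platform the bucket state would typically live in a shared store rather than process memory, and the cost could be weighted by tokens generated rather than request count.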
Qualifications
Required
5+ years of demonstrated experience building large-scale, fault-tolerant, distributed systems and API microservices
Strong background in designing, analyzing, and improving efficiency, scalability, and stability of complex systems
Excellent understanding of low-level OS concepts: multi-threading, memory management, networking, and storage performance
Expert-level programming in one or more of: Rust, Go, Python, or TypeScript
Bachelor's or Master's degree in Computer Science, Computer Engineering, or related field, or equivalent practical experience
Preferred
Knowledge of modern LLMs and generative models, and how they are served in production, is a plus
Experience working with the open source ecosystem around inference is highly valuable; familiarity with SGLang, vLLM, or NVIDIA Dynamo will be especially handy
Experience with Kubernetes or container orchestration is a strong plus
Familiarity with GPU software stacks (CUDA, Triton, NCCL) and HPC technologies (InfiniBand, NVLink, MPI) is a plus
Benefits
Competitive compensation
Startup equity
Health insurance
Other competitive benefits
Company
Together AI
Together AI is a cloud-based platform for building open-source generative AI and the infrastructure for developing AI models.
H1B Sponsorship
Together AI has a track record of offering H1B sponsorship. Note that this does not guarantee sponsorship for this specific role; additional information is provided below for reference. (Data powered by the US Department of Labor)
[Chart: distribution of job fields receiving sponsorship]
Total sponsorships by year: 2025 (19) · 2024 (6) · 2023 (3)
Funding
Current stage: Growth Stage
Total funding: $533.5M
Key investors: Salesforce Ventures, Lux Capital
2025-02-20 · Series B · $305M
2024-03-13 · Series A · $106M
2023-11-29 · Series A · $102.5M
Company data provided by Crunchbase