Inference Engineering Manager jobs in United States

Perplexity · 10 hours ago

Inference Engineering Manager

Perplexity is seeking an Inference Engineering Manager to lead their AI Inference team, focusing on building and scaling infrastructure for their products and APIs. The role involves owning the technical direction of inference systems and leading a team of engineers to enhance AI capabilities.

Artificial Intelligence (AI), Chatbot, Machine Learning, Natural Language Processing, Search Engine
H1B Sponsor Likely

Responsibilities

Lead and grow a high-performing team of AI inference engineers
Develop APIs for AI inference used by both internal and external customers
Architect and scale our inference infrastructure for reliability and efficiency
Benchmark and eliminate bottlenecks throughout our inference stack
Drive large sparse/MoE model inference at rack scale, including sharding strategies for massive models
Push the frontier by building inference systems that support sparse attention, disaggregated prefill/decode serving, etc.
Improve the reliability and observability of our systems and lead incident response
Own technical decisions around batching, throughput, latency, and GPU utilization
Partner with ML research teams on model optimization and deployment
Recruit, mentor, and develop engineering talent
Establish team processes, engineering standards, and operational excellence

Qualifications

Python, PyTorch, ML systems, Technical leadership, Inference optimizations, Kubernetes, Team building, Cross-functional collaboration, Technical communication

Required

5+ years of engineering experience with 2+ years in a technical leadership or management role
Deep experience with ML systems and inference frameworks (PyTorch, TensorFlow, ONNX, TensorRT, vLLM)
Strong understanding of LLM architecture: Multi-Head Attention, Multi/Grouped-Query Attention, and common layers
Experience with inference optimizations: batching, quantization, kernel fusion, FlashAttention
Familiarity with GPU characteristics, roofline models, and performance analysis
Experience deploying reliable, distributed, real-time systems at scale
Track record of building and leading high-performing engineering teams
Experience with parallelism strategies: tensor parallelism, pipeline parallelism, expert parallelism
Strong technical communication and cross-functional collaboration skills

Preferred

Experience with CUDA, Triton, or custom kernel development
Background in training infrastructure and RL workloads
Experience with Kubernetes and container orchestration at scale
Published work or contributions to inference optimization research

Company

Perplexity

Perplexity is an AI-powered answer engine designed to provide accurate, real-time responses to user queries.

H1B Sponsorship

Perplexity has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data from the US Department of Labor)
Trends of Total Sponsorships
2025 (12)
2024 (7)
2023 (2)

Funding

Current Stage: Late Stage
Total Funding: $1.48B
Key Investors: Cristiano Ronaldo, NuVentures, Accel
2025-12-05: Undisclosed
2025-09-10: Series Unknown · $200M
2025-08-15: Secondary Market

Leadership Team

Aravind Srinivas
Cofounder, President, CEO
Denis Yarats
Co-Founder & CTO
Company data provided by Crunchbase