Perplexity
Inference Engineering Manager
Perplexity is seeking an Inference Engineering Manager to lead their AI Inference team, responsible for building and scaling the infrastructure that powers their products and APIs. The role involves owning the technical direction of inference systems and leading a team of engineers to develop reliable and efficient AI capabilities.
Artificial Intelligence (AI) · Chatbot · Machine Learning · Natural Language Processing · Search Engine
Responsibilities
Lead and grow a high-performing team of AI inference engineers
Develop APIs for AI inference used by both internal and external customers
Architect and scale our inference infrastructure for reliability and efficiency
Benchmark and eliminate bottlenecks throughout our inference stack
Drive large sparse/MoE model inference at rack scale, including sharding strategies for massive models
Push the frontier by building inference systems that support sparse attention, disaggregated prefill/decode serving, and similar techniques
Improve the reliability and observability of our systems and lead incident response
Own technical decisions around batching, throughput, latency, and GPU utilization
Partner with ML research teams on model optimization and deployment
Recruit, mentor, and develop engineering talent
Establish team processes, engineering standards, and operational excellence
Qualifications
Required
5+ years of engineering experience with 2+ years in a technical leadership or management role
Deep experience with ML systems and inference frameworks (PyTorch, TensorFlow, ONNX, TensorRT, vLLM)
Strong understanding of LLM architecture: Multi-Head Attention, Multi-Query and Grouped-Query Attention, and common layers
Experience with inference optimizations: batching, quantization, kernel fusion, FlashAttention
Familiarity with GPU characteristics, roofline models, and performance analysis
Experience deploying reliable, distributed, real-time systems at scale
Track record of building and leading high-performing engineering teams
Experience with parallelism strategies: tensor parallelism, pipeline parallelism, expert parallelism
Strong technical communication and cross-functional collaboration skills
Preferred
Experience with CUDA, Triton, or custom kernel development
Background in training infrastructure and RL workloads
Experience with Kubernetes and container orchestration at scale
Published work or contributions to inference optimization research
Company
Perplexity
Perplexity is an AI-powered answer engine designed to provide accurate, real-time responses to user queries.
H-1B Sponsorship
Perplexity has a track record of offering H-1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. The data below is provided for reference (source: US Department of Labor).
Total Sponsorships by Year
2025: 12
2024: 7
2023: 2
Funding
Current Stage: Late Stage
Total Funding: $1.48B
Key Investors: Cristiano Ronaldo, NuVentures, Accel
2025-12-05: Undisclosed
2025-09-10: Series Unknown · $200M
2025-08-15: Secondary Market
Company data provided by Crunchbase