Perplexity · 16 hours ago

Inference Engineering Manager

San Francisco

Full-time

Onsite

Senior Level

$300K/yr - $385K/yr

5+ years exp

Perplexity is seeking an Inference Engineering Manager to lead their AI Inference team, responsible for building and scaling the infrastructure that powers their products and APIs. The role involves owning the technical direction of inference systems and leading a team of engineers to develop reliable and efficient AI capabilities.

Artificial Intelligence (AI) · Chatbot · Machine Learning · Natural Language Processing · Search Engine

H1B Sponsor Likely

Responsibilities

Lead and grow a high-performing team of AI inference engineers

Develop APIs for AI inference used by both internal and external customers

Architect and scale our inference infrastructure for reliability and efficiency

Benchmark and eliminate bottlenecks throughout our inference stack

Drive large sparse/MoE model inference at rack scale, including sharding strategies for massive models

Push the frontier by building inference systems that support sparse attention, disaggregated prefill/decode serving, and related techniques

Improve the reliability and observability of our systems and lead incident response

Own technical decisions around batching, throughput, latency, and GPU utilization

Partner with ML research teams on model optimization and deployment

Recruit, mentor, and develop engineering talent

Establish team processes, engineering standards, and operational excellence

Qualifications

ML systems · Inference frameworks · Technical leadership · LLM architecture · Inference optimizations · Distributed systems · Parallelism strategies · GPU characteristics · Performance analysis · Kubernetes · Container orchestration · Technical communication · Cross-functional collaboration

Required

5+ years of engineering experience with 2+ years in a technical leadership or management role

Deep experience with ML systems and inference frameworks (PyTorch, TensorFlow, ONNX, TensorRT, vLLM)

Strong understanding of LLM architecture: Multi-Head Attention, Multi/Grouped-Query Attention, and common layers

Experience with inference optimizations: batching, quantization, kernel fusion, FlashAttention

Familiarity with GPU characteristics, roofline models, and performance analysis

Experience deploying reliable, distributed, real-time systems at scale

Track record of building and leading high-performing engineering teams

Experience with parallelism strategies: tensor parallelism, pipeline parallelism, expert parallelism

Strong technical communication and cross-functional collaboration skills

Preferred

Experience with CUDA, Triton, or custom kernel development

Background in training infrastructure and RL workloads

Experience with Kubernetes and container orchestration at scale

Published work or contributions to inference optimization research

Company

Perplexity

Perplexity is an AI-powered answer engine designed to provide accurate, real-time responses to user queries.

Founded in 2022

San Francisco, California, USA

201-500 employees

https://www.perplexity.ai

H1B Sponsorship

Perplexity has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by the US Department of Labor)

Distribution of Different Job Fields Receiving Sponsorship (chart not included)

Trends of Total Sponsorships

2025 (12)

2024 (7)

2023 (2)

Funding

Current Stage

Late Stage

Total Funding

$1.48B

Key Investors

Cristiano Ronaldo · NuVentures · Accel

2025-12-05 · Undisclosed

2025-09-10 · Series Unknown · $200M

2025-08-15 · Secondary Market

Leadership Team

Aravind Srinivas

Co-Founder, President & CEO

Denis Yarats

Co-Founder & CTO

Recent News

Decrypt

Wikipedia Reveals Multiple Deals with AI Giants to Use Its Content

2026-01-17

TechRadar.com

Microsoft, Meta, and Amazon are paying up for ‘enterprise’ access to Wikipedia

2026-01-17

Media Nama

Wikipedia Licenses AI Access to Microsoft, Meta and Others Amid Rising Costs and Falling Traffic

2026-01-17

Company data provided by Crunchbase