AI Production Engineer jobs in United States

Meta · 1 day ago

AI Production Engineer

Meta builds technologies that help people connect, find communities, and grow businesses. As an AI Production Engineer on the AI Transformation team, you will develop and scale production-grade AI systems that enhance executive productivity, focusing on writing high-quality code, designing resilient systems, and building automation for AI reliability.
Computer Software
Comp. & Benefits

Responsibilities

Design and implement production-grade AI/ML systems for executive productivity, including LLMs, RAG systems, agents, inference pipelines, and MLOps infrastructure
Write and review code, develop documentation and capacity plans, and debug the hardest live problems on complex AI systems serving executive leadership
Build automation, self-healing systems, and CI/CD pipelines to minimize manual intervention and operational toil
Own AI infrastructure—training, inference, data pipelines, and GPU fleet management—across cloud platforms (AWS, Azure, GCP) and Kubernetes
Set technical direction, lead design reviews, mentor engineers, and advise leadership on AI technology trends and trade-offs
Share an on-call rotation (~1 week per quarter) and serve as an escalation contact for critical AI system incidents
Champion reliability by design—building resilience into systems from the start with circuit breakers, fallbacks, and graceful degradation
Travel globally up to 20% of the year to engage with executive partners and scale business opportunities

Qualifications

AI/ML systems, Kubernetes, Cloud platforms, Linux/Unix, Coding (Python, Go, C++, Java, Rust), Infrastructure applications, Observability tools, Capacity planning, Technical leadership, Mentoring

Required

7+ years of experience in Linux/Unix and network fundamentals
7+ years of coding experience in an industry-standard language (e.g., Python, Go, C++, Java, Rust)
Experience with Internet service architecture and capacity planning, including responding to urgent capacity needs
Knowledge of common web technologies and Internet service architectures (CDN, load balancing, distributed systems)
Experience configuring and running infrastructure-level applications such as Kubernetes, Terraform, and cloud platforms (AWS, Azure, GCP)
Experience building and productionizing AI/ML systems, including LLMs, RAG architectures, inference optimization, and MLOps
Proven track record of leading complex technical initiatives and mentoring other engineers

Preferred

Experience with GPU infrastructure, ML accelerators, and model serving at scale
Familiarity with observability tools (Prometheus, Grafana, Datadog) and database/caching technologies (MySQL, Redis, Memcached)

Benefits

Bonus
Equity
Benefits

Company

Meta's mission is to build the future of human connection and the technology that makes it possible.

Funding

Current Stage: Late Stage

Leadership Team

Kathryn Glickman
Director, CEO Communications
Christine Lu
CTO Business Engineering NA
Company data provided by Crunchbase