The Walt Disney Company · 5 days ago
Lead Software Engineer - AI Operations and Tooling
The Walt Disney Company is a global leader in media and entertainment technology, seeking a Lead Engineer to establish and guide its AI Operations and Tooling practice. This role focuses on enabling AI-specific operations and improving the reliability and efficiency of AI applications across major cloud platforms.
Amusement Park and Arcade, Animation, Consumer Goods, Digital Media, E-Commerce, Media and Entertainment, Multi-level Marketing, Performing Arts, Resorts
Responsibilities
Define frameworks for AI-specific operations: hallucination/quality testing, evaluation pipelines, and continuous validation
Establish reference patterns for scaling LLM services, prompt orchestration, and multi-agent workloads
Build automation for safe rollout, monitoring, and incident response
Implement end-to-end observability: latency, drift, failure modes, hallucination rates, and GPU/compute utilization
Drive cost optimization and efficiency across AI cloud usage (AWS, Azure, GCP)
Define SLOs, dashboards, and runbooks for AI/LLM production systems
Embed compliance, safety checks, and prompt-injection defenses into operational frameworks
Partner with security and governance teams to ensure enterprise-grade auditability and policy enforcement
Mentor engineers in DevOps, infra, and AI operations
Drive adoption of best practices for AI reliability, test automation, and incident management
Collaborate across AI Core, Data Foundations, Security, and Product teams to ensure operational safety and scale
Qualifications
Required
Bachelor's degree in Computer Science, Engineering, or related technical field (Master's preferred), or equivalent experience
7+ years of experience in software engineering, DevOps, or infrastructure, with at least 2 years in a lead role
Expert in at least one foundational language (Python, Java, or Go) with production-grade system experience
Hands-on experience with cloud-native infrastructure (AWS preferred; Azure/GCP a plus) and modern orchestration platforms
Proven experience with observability stacks (Datadog, Prometheus, Grafana) and incident response automation
Familiarity with AI/LLM APIs (OpenAI, Anthropic, Bedrock, Azure AI Foundry) and orchestration frameworks (LangChain, LangGraph)
Strong knowledge of operational AI testing (A/B evaluation, regression, red-teaming) and guardrail enforcement
Demonstrated ability to optimize cloud/GPU usage and manage costs at scale
Excellent communication skills and proven ability to lead design reviews, mentor engineers, and influence cross-functional teams
Preferred
Experience with AI-focused evaluation frameworks (LangSmith, PromptLayer, etc.)
Prior work in AI operations, SRE, or ML platform DevOps roles
Knowledge of multi-agent orchestration patterns and operational reliability for AI systems
Strong background in test automation and continuous validation for distributed systems
Skilled at incident review (RCA) and driving operational excellence across large-scale environments
Benefits
A bonus and/or long-term incentive units may be provided as part of the compensation package
Full range of medical, financial, and/or other benefits
Company
The Walt Disney Company
The Walt Disney Company started as a cartoon studio and has since expanded into sports coverage and television shows.
H1B Sponsorship
The Walt Disney Company has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships
2025 (83)
2024 (63)
2023 (96)
2022 (130)
2021 (30)
2020 (40)
Funding
Current Stage: Public Company
Total Funding: $11B
Key Investors: Citibank
2020-04-13 · Post-IPO Debt · $5B
2020-03-20 · Post-IPO Debt · $6B
1978-01-06 · IPO
Company data provided by Crunchbase