
Microsoft · 8 hours ago

Principal AI Operations Engineer

Microsoft is a leading technology company committed to making the world a safer place through innovative security solutions. The Principal AI Operations Engineer role involves defining the technical direction and operational standards for the AI Operations group, focusing on ensuring reliability, managing production health, and driving operational excellence.

Agentic AI, Application Performance Management, Artificial Intelligence (AI), Business Development, DevOps, Information Services, Information Technology, Management Information Systems, Network Security, Software
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Define the operational vision, standards, and roadmap for the platform; establish SLOs, error budgets, and reliability targets
Drive technical direction for the AI Operations group: architecture for deployments, pipelines, branch health, and production reliability
Own CI/CD pipeline architecture: Azure DevOps/GitHub Actions pipelines, build optimization, artifact management, and deployment automation
Manage Kubernetes infrastructure: AKS cluster operations, Helm chart management, node pool configuration, GPU resource allocation, and autoscaling (KEDA)
Drive production deployments: canary/ring rollouts, safe deployment practices, rollback procedures, and release coordination with the Platform team
Establish and operate first-level on-call: incident response procedures, escalation paths, runbooks, and post-incident reviews
Build and maintain observability infrastructure: Prometheus, Grafana, OpenTelemetry collectors, alerting rules, and dashboard curation
Manage infrastructure as code: Bicep templates for Azure resources, Helm charts for Kubernetes deployments, and environment parity
Ensure branch health and code quality gates: PR validation pipelines, automated testing, security scanning, and merge policies
Debug and diagnose production issues: analyze logs (Kusto/ADX), traces, and metrics to identify root causes and drive resolution
Collaborate with the Platform team on operational readiness: review service designs for operability, define deployment requirements, and validate runbooks
Drive reliability improvements: capacity planning, performance optimization, chaos engineering, and disaster recovery testing
Guide and mentor operations engineers; establish effective operational practices and a culture of continuous improvement
Embody our culture and values

Qualifications

Kubernetes, CI/CD pipelines, DevOps, Azure DevOps, Infrastructure as Code, Observability tooling, Cloud platforms, C, C++, C#, Java, JavaScript, Python

Required

Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
6+ years technical engineering experience in DevOps, SRE, or platform operations
6+ years driving complex operational initiatives across teams; demonstrated success leading without authority
4+ years hands-on experience with Kubernetes in production environments
3+ years building and maintaining CI/CD pipelines at scale
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role

Preferred

Experience with Kubernetes: cluster operations, Helm, troubleshooting, autoscaling, and production management
Proficiency with CI/CD platforms: Azure DevOps, GitHub Actions, or similar pipeline tooling
Experience with cloud platforms (Azure preferred): AKS, networking, identity management, and resource provisioning
Infrastructure as Code: Bicep, Terraform, or Helm chart development
Observability tooling: Prometheus, Grafana, OpenTelemetry, and log analytics (Kusto/KQL)

Company

Microsoft

Microsoft is a software corporation that develops, manufactures, licenses, supports, and sells a range of software products and services.

H1B Sponsorship

Microsoft has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships
2025 (9192)
2024 (9343)
2023 (7677)
2022 (11403)
2021 (7210)
2020 (7852)

Funding

Current Stage
Public Company
Total Funding
$1M
Key Investors
Technology Venture Investors
2022-12-09 Post-IPO Equity
1986-03-13 IPO
1981-09-01 Series Unknown · $1M

Leadership Team

Satya Nadella
Chairman and CEO
Vukani Mngxati
Chief Executive Officer - Microsoft South Africa
Company data provided by Crunchbase