Scale AI · 1 day ago

AI Infrastructure Engineer, Core Infrastructure

Seattle, WA

Full-time

Onsite

Mid, Senior Level

$179K/yr - $310K/yr

4+ years exp

Scale AI is a company focused on developing reliable AI systems for critical decisions. They are seeking an AI Infrastructure Engineer to design and build foundational systems that power ML infrastructure, optimizing workloads across various compute environments and enhancing reliability and cost efficiency.

AI Infrastructure · Artificial Intelligence (AI) · Data Collection and Labeling · Generative AI · Image Recognition · Machine Learning

H1B Sponsor Likely

Responsibilities

Design and maintain fault-tolerant, cost-efficient systems that manage compute allocation, scheduling, and autoscaling across clusters and clouds

Build common abstractions and APIs that unify job submission, telemetry, and observability across serving and training workloads

Develop systems for usage metering, cost attribution, and quota management, enabling transparency and control over compute budgets

Improve reliability and efficiency of large-scale GPU workloads through better scheduling, bin-packing, preemption, and resource sharing

Partner with ML engineers and API teams to identify bottlenecks and define long-term architectural standards

Lead projects end-to-end — from requirements gathering and design to rollout and monitoring — in a cross-functional environment

Qualifications

Distributed systems · Python · Kubernetes · Infrastructure as Code · Cost efficiency · Workload management · Observability practices · Go · Rust · Containers · Telemetry · Soft skills

Required

4+ years of experience building large-scale backend or distributed systems

Strong programming skills in Python, Go, or Rust, and familiarity with modern cloud-native architecture

Experience with containers and orchestration tools (Kubernetes, Docker) and Infrastructure as Code (Terraform)

Familiarity with schedulers or workload management systems (e.g., Kubernetes controllers, Slurm, Ray, internal job queues)

Understanding of observability and reliability practices (metrics, tracing, alerting, SLOs)

A track record of improving system efficiency, reliability, or developer velocity in production environments

Preferred

Experience with multi-tenant compute platforms or internal PaaS

Knowledge of GPU scheduling, cost modeling, or hybrid cloud orchestration

Familiarity with LLM or ML training workloads, though deep ML expertise is not required

Benefits

Comprehensive health, dental and vision coverage

Retirement benefits

A learning and development stipend

Generous PTO

A commuter stipend

Company

Scale AI

Scale’s mission is to develop reliable AI systems for the world’s most important decisions.

Founded in 2016

San Francisco, California, USA

501-1000 employees

https://scale.com

H1B Sponsorship

Scale AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data powered by the US Department of Labor)

Distribution of Different Job Fields Receiving Sponsorship (chart not reproduced; the highlighted segment represents the job field similar to this job)

Trends of Total Sponsorships

2025 (82)

2024 (54)

2023 (29)

2022 (17)

2021 (10)

2020 (10)

Funding

Current Stage

Late Stage

Total Funding

$15.9B

Key Investors

Meta · Accel · Tiger Global Management

2025-06-10 · Corporate Round · $14.3B

2025-06-04 · Series Unknown

2024-05-21 · Series F · $1B

Leadership Team

Jason Droege

Interim Chief Executive Officer

Dennis Cinelli

Chief Financial Officer

Recent News

Crunchbase News

Global Venture Funding In 2025 Surged As Startup Deals And Valuations Set All-Time Records

2026-01-07

Benzinga.com

Former Meta Scientist Says Mark Zuckerberg's New AI Chief Is 'Young' And 'Inexperienced'—Warns 'Lot Of People' Who Haven't Yet Left Meta 'Will Leave'

2026-01-05

Crunchbase News

Crunchbase Predicts: Why Top VCs Expect More Venture Dollars, Bigger Rounds And Fewer Winners In 2026

2026-01-05

Company data provided by Crunchbase