Software Engineer - Data Infra Reliability jobs in United States

Luma AI · 1 day ago

Software Engineer - Data Infra Reliability

Luma AI is on a mission to build multimodal AI systems that expand human imagination and capabilities. They are seeking a Data Reliability Engineer who will be responsible for the resilience, automation, and scalability of their petabyte-scale data infrastructure, ensuring high availability for research jobs through innovative automation and reliability practices.

Artificial Intelligence (AI) · Generative AI · Video · Video Editing
H1B Sponsor Likely

Responsibilities

Automate Everything: Apply Infrastructure-as-Code (IaC) principles using Terraform to provision, manage, and scale our data infrastructure
Harden Data Pipelines: Build reliability and fault tolerance into our core data ingestion and processing workflows, ensuring high availability for research jobs
Scale Kubernetes & Ray: Operate and optimize large-scale Kubernetes clusters and Ray deployments to handle bursty, high-throughput workloads
Define Reliability: Establish Service Level Objectives (SLOs) and observability standards (Prometheus/Grafana) for our data platforms
Debug & Heal: Serve as the first line of defense for complex infrastructure failures, diagnosing root causes in distributed storage and compute systems

Qualifications

Infrastructure-as-Code · Kubernetes · Python · Data Reliability Engineering · Terraform · SRE/DevOps · High-throughput storage · Automation · Networking

Required

Deep SRE/DevOps Proficiency: You live and breathe Linux, networking, and automation
Infrastructure-as-Code Native: You have extensive experience with Terraform, Ansible, or similar tools to manage complex cloud environments (AWS/GCP)
Kubernetes Expert: You have managed Kubernetes in production and understand its internals, not just how to deploy containers
Python Proficiency: You can write high-quality Python code for automation, tooling, and infrastructure management
Data-Minded: You understand the specific challenges of stateful data systems and high-throughput storage (S3/Object Store)

Preferred

Experience managing GPU clusters or AI/ML workloads
Background in both Software Engineering and Operations (DevOps)
Experience with high-performance networking (InfiniBand/RDMA)

Company

Luma AI

Luma AI develops tools that let users generate photorealistic images and videos from text, image, or video prompts.

H1B Sponsorship

Luma AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data provided by the US Department of Labor)
Trends of Total Sponsorships
2025 (10)
2024 (3)

Funding

Current Stage: Growth Stage
Total Funding: $1.06B
Key Investors: HUMAIN, Andreessen Horowitz, Amplify Partners
2025-11-19 · Series C · $900M
2024-12-06 · Series B · $90M
2024-01-09 · Series B · $43M

Leadership Team

Amit Jain
Co-Founder
Company data provided by Crunchbase