Kumo · 5 months ago
Software Engineer Lead - Cloud Infrastructure
Kumo is building the infrastructure layer for the next generation of enterprise AI, focusing on transforming data into predictive intelligence. The Lead Infrastructure Engineer will own the architecture and reliability of Kumo’s AI platform, leading critical systems design and mentoring engineers while staying hands-on with coding and operating production services.
Artificial Intelligence (AI) · Big Data · Business Intelligence · Machine Learning · SaaS
Responsibilities
Set the technical vision and roadmap for Kumo’s multi-tenant infrastructure across AWS, Azure, and GCP, balancing scalability, reliability, cost, and security
Lead architecture and design for critical systems: Kubernetes-based multi-tenancy, real-time inference clusters, training pipelines, and CI/CD for large ML workloads
Hands-on implementation: build and evolve IaC, GitOps flows, cluster autoscaling, and automation that reduce toil and accelerate developer productivity
Define and drive SLOs, SLIs, and capacity planning; lead incident response, postmortems, and systemic remediation
Own cost optimization at scale — from resource scheduling to spot/commit strategies and cross-cloud lifecycle management
Mentor and grow engineers: set standards for architecture reviews, design docs, code quality, and operational excellence
Hire and help scale the team — participate in recruiting, interviewing, and onboarding top-tier infrastructure talent
Qualifications
Required
5-8+ years building and operating production cloud-native infrastructure; proven track record leading infrastructure initiatives end-to-end
Deep, practical experience with Kubernetes at scale (multi-tenant environments, cluster federation, or large fleet operations)
Strong multi-cloud operational experience (designing and running services across AWS/Azure/GCP) and cloud cost management
Demonstrated systems design skills for distributed systems, including making architectural trade-offs; comfortable shipping code (Python, Go, or similar) in a high-velocity environment and reviewing complex PRs
Proficiency in Go, Python, Rust, or similar languages for automation tooling
Excellent communicator: able to influence across engineering, ML science, product, and leadership — and to write clear design docs and trade-off analyses
Preferred
Experience building infrastructure for ML/AI platforms or relational foundation models
Background with Spark or large-scale data processing platforms (managed or self-hosted)
Familiarity with Kubernetes operators, controllers, CRDs, or service mesh patterns
Expertise with Infrastructure-as-Code (Terraform/Pulumi) and GitOps (ArgoCD, Flux, Argo Workflows) in production
Experience with tenant isolation, zero-trust identity models, and cloud security/compliance frameworks
Prior experience building and scaling an infrastructure team (e.g., hiring, mentoring, org design)
Company
Kumo
Kumo is an AI model company that provides a platform for anyone to train and run state-of-the-art AI models on their relational data.
H1B Sponsorship
Kumo has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (8)
2024 (3)
2023 (3)
2022 (2)
Funding
Current Stage
Growth Stage
Total Funding
$36.5M
Key Investors
Sequoia Capital
2022-09-27 · Series B · $18M
2022-04-07 · Series A · $18.5M
Company data provided by Crunchbase