Staff Software Engineer, ML Platform jobs in United States

Cake · 2 weeks ago

Staff Software Engineer, ML Platform

Cake is on a mission to make cutting-edge AI accessible to enterprise teams. As a Staff Software Engineer, you will play a critical leadership role in building and operating the infrastructure that powers Cake’s AI platform, with a focus on the ML platform foundations used by both internal teams and customers.

Artificial Intelligence (AI), Generative AI
H1B Sponsor Likely

Responsibilities

Build Enterprise-Scale Infrastructure
Leverage infrastructure-as-code to manage complex cloud environments supporting critical ML and AI initiatives
Design Kubernetes-native systems, including controllers/operators where appropriate
Improve platform networking, security, and observability
Sustain Platform Health and Performance
Own critical systems in production, including reliability, scalability, security, and cost efficiency
Identify and proactively address technical debt, operational risk, and platform bottlenecks
“Learn by doing”: quickly ramp up on a complex tech stack (Terraform, Kubernetes, Istio, Crossplane, Go, TypeScript)
Enable Teams and Customers to Move Faster
Create abstractions and tooling that make it easier for teams and customers to deploy, run, and scale AI/ML workloads
Collaborate directly with customers to understand their ML infrastructure challenges and translate them into platform improvements
Balance speed and rigor, shipping quickly while maintaining a high bar for quality and safety
Lead Through Influence
Act as a technical leader and mentor across the engineering organization
Write clear documentation and design proposals that align stakeholders and drive decisions
Partner closely with product and leadership to shape platform direction and priorities

Qualifications

Kubernetes, Infrastructure-as-code, Cloud experience, MLOps platforms, Programming in Go, Terraform, CI/CD practices, Communication, Customer-oriented mindset, Collaborative

Required

10+ years of engineering experience, with significant time spent on infrastructure, platform, or distributed systems
Deep hands-on experience with Kubernetes in production environments
Strong cloud experience across AWS, GCP, and/or Azure
Proven track record of building and operating secure, scalable MLOps platforms
Deep understanding of infrastructure-as-code (e.g., Terraform, Pulumi, CDK)
Strong programming skills in at least one backend language (Go preferred; TypeScript also welcome)
Experience diagnosing and debugging complex production issues
Familiarity with modern CI/CD, test-driven development, and DevSecOps practices
Comfortable owning large, ambiguous problems from inception to production
Excellent communicator, able to clearly explain complex systems to both technical and non-technical audiences
Experience working directly with customers and incorporating feedback into technical decisions
Ability to operate autonomously while keeping stakeholders informed and aligned
Customer-first and product-oriented
Curious, adaptable, and eager to learn new systems and domains
Collaborative, respectful, and willing to lean into hard conversations
Energized by fast-paced environments and meaningful responsibility

Preferred

Bonus: experience building Kubernetes operators and/or working with service meshes (e.g., Istio)

Benefits

Competitive cash compensation alongside above-market equity upside
Top-tier fully covered medical, dental, and vision insurance
Life insurance
401k program
Unlimited PTO
Monthly half day
Citi Bike membership
Monthly wellness stipend
Office equipment stipend, including reimbursement for approved disability-related accommodations
Investment in employee learning and growth opportunities

Company

Cake

Cake provides AI Project Infrastructure, delivering a faster and more cost-effective path for businesses to adopt cutting-edge AI/ML technologies.

H1B Sponsorship

Cake has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data provided by the US Department of Labor).
[Charts: distribution of job fields receiving sponsorship; trends of total sponsorships (2021: 1)]

Funding

Current Stage
Early Stage
Total Funding
$13M
Key Investors
Gradient, Alumni Ventures, Primary Venture Partners
2024-12-04: Seed, $10M
2024-03-28: Seed
2022-02-01: Pre-Seed, $3M
Company data provided by Crunchbase