Lambda · 4 weeks ago

Forward Deployed Engineer (Site Reliability / Infrastructure)

San Francisco, CA

Full-time

Onsite

Senior Level

$240K/yr - $425K/yr

6+ years exp

Lambda, The Superintelligence Cloud, is a leader in AI cloud infrastructure serving tens of thousands of customers. Lambda is seeking a Forward Deployed Engineer to embed directly with a strategic customer, serving as the technical bridge between Lambda and the customer's team while delivering impactful solutions and optimizing infrastructure for AI/ML workloads.

AI Infrastructure · Artificial Intelligence (AI) · Cloud Computing · GPU · Machine Learning

H1B Sponsor Likely

Responsibilities

Embed on-site with a named strategic customer, becoming an extension of their team

Act as the primary technical liaison between Lambda and the customer organization

Navigate ambiguous requirements to identify root problems and define clear technical solutions

Drive alignment across internal Lambda teams and customer stakeholders

Scope, sequence, and build full-stack solutions that deliver measurable business value

Design and implement infrastructure optimizations for AI/ML workloads at scale

Debug complex distributed systems issues across the infrastructure stack

Ship iteratively and learn fast, adjusting approach based on customer feedback and results

Identify reusable patterns from customer engagements that can scale across Lambda's customer base

Surface field intelligence that influences Lambda's product roadmap

Document and share learnings to elevate the capabilities of the broader team

Represent Lambda with executive presence in high-stakes customer interactions

Qualifications

Kubernetes · Go · Python · AI/ML workload management · Linux systems · GitOps · Observability tools · CI/CD pipelines · Executive presence · Bias for action · Communication skills · Team collaboration

Required

6+ years of experience in an SRE, software engineering, or similar role, with deep knowledge of running Linux clusters and systems

Strong programming skills in Go and Python; experience with GitOps (e.g., ArgoCD), Helm, and Kubernetes operators

Proven experience operating Kubernetes clusters in production environments (on-prem, EKS, GKE, or similar)

Hands-on experience with AI/ML workload management tools (Volcano, Kubeflow, or similar)

Ability to work either independently with limited direction or as part of a team

Familiarity with observability tools such as Prometheus, Grafana, and Fluent Bit, as well as CI/CD pipelines

Proven experience provisioning Kubernetes using tools such as kubeadm, Cluster API, or similar

Excellent communication skills with the ability to translate technical complexity for diverse audiences

Executive presence and ability to represent Lambda in customer-facing situations

Comfort operating in ambiguous environments with competing priorities

Strong bias for action and shipping iteratively

Preferred

Deep Kubernetes expertise: CRDs, CSI, CNI, and experience writing Kubernetes operators

Exposure to HPC clusters, AI/ML workloads, or large-scale GPU clusters

Hybrid or multi-cloud Kubernetes environment experience

Contributions to CNCF projects or Kubernetes SIGs

Benefits

Health, dental, and vision coverage for you and your dependents

Wellness and commuter stipends for select roles

401(k) plan with 2% company match (USA employees)

Flexible paid time off plan that we all actually use

Company

Lambda

Lambda is a cloud-based platform that provides high-performance GPU hardware and cloud infrastructure for AI model training and inference.

Founded in 2012

San Jose, California, USA

501-1000 employees

https://lambda.ai

H1B Sponsorship

Lambda has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)

Chart: Distribution of Different Job Fields Receiving Sponsorship (legend: represents job field similar to this job)

Trends of Total Sponsorships

2025 (16)

2024 (1)

2023 (3)

2022 (2)

2021 (2)

2020 (3)

Funding

Current Stage

Late Stage

Total Funding

$3.19B

Key Investors

TWG Global · JP Morgan · Macquarie Group

2025-11-18 · Series E · $1.5B

2025-08-19 · Debt Financing · $275M

2025-02-19 · Series D · $480M

Leadership Team

Stephen Balaban

Co-founder, CEO

Michael Balaban

Co-founder, CTO

Recent News

Crunchbase News

North American Startup Funding Soared 46% In 2025, Driven By AI Boom

2026-01-08

PitchBook

10 of the biggest winners from 2025’s AI boom

2025-12-25

Crowdfund Insider

AI Adoption Trends: 2025 Saw Emergence of 1000+ Agentic AI Offerings

2025-12-22

Company data provided by Crunchbase