
CoreWeave · 1 day ago

Senior Production Engineer

CoreWeave is The Essential Cloud for AI™, delivering a platform of technology and tools for innovators. The Senior Production Engineer will take ownership of critical tooling to ensure reliability and operational excellence in CoreWeave's cloud services, leading projects and collaborating with service owners.

AI Infrastructure · Artificial Intelligence (AI) · Cloud Computing · Cloud Infrastructure · Information Technology · Machine Learning
No H1B · U.S. Citizen Only

Responsibilities

Take hands-on ownership of critical systems and frameworks, driving their architecture, implementation, and long-term evolution
Lead end-to-end delivery of engineering projects that improve availability, scalability, operational automation, and failure recovery
Build and maintain observability, alerting, automated remediation, and resilience testing for the systems you support
Participate in incident response as a subject-matter expert; drive deep root-cause investigations and implement lasting fixes
Improve runbooks, sources of truth, deployment workflows, and operational tooling to harden production readiness
Eliminate single points of failure and reduce operational toil through automation, refactors, and system redesigns
Ship production code regularly in Python, Go, or similar languages, and participate in on-call rotations
Maintain and mature long-term projects and frameworks owned by the team, ensuring they remain reliable, well-instrumented, and easy to operate
Collaborate with platform teams to ensure new features and services integrate cleanly with our reliability best practices and tooling

Qualifications

Distributed systems · Cloud platforms · Kubernetes · Python · Go · Observability stacks · Automation · Debugging · Incident response · Collaboration

Required

7+ years of engineering experience building and operating distributed systems or cloud platforms
Demonstrated ability to debug complex production issues end-to-end, across services, infrastructure layers, and automation
Strong programming or scripting ability (Python, Go, or similar), with experience shipping and operating production services and tools
Deep knowledge of cloud-native technologies and distributed system patterns, particularly Kubernetes
Experience with modern observability stacks: metrics, tracing, structured logs, SLOs/SLIs, and incident lifecycle practices
A track record of successfully delivering hands-on reliability improvements through engineering execution

Preferred

Experience building internal tooling, frameworks, or automation that supports high-availability cloud operations
Familiarity with DR/BCP, service tiering, capacity planning, or chaos engineering
Background operating or building large-scale AI or GPU-accelerated infrastructure
Experience maintaining multi-year ownership of foundational production systems

Benefits

100% employer-paid medical, dental, and vision coverage
Life, short- and long-term disability insurance
401(k) with generous employer match
Flexible PTO and childcare support through Kinside
Catered lunch daily (for office-based employees), weekly massages (NY/NJ)
Company-paid Life Insurance
Voluntary supplemental life insurance
Flexible Spending Account
Health Savings Account
Tuition Reimbursement
Ability to Participate in Employee Stock Purchase Program (ESPP)
Mental Wellness Benefits through Spring Health
Family-Forming support provided by Carrot
Paid Parental Leave
Flexible, full-service childcare support with Kinside
Catered lunch each day in our office and data center locations
A casual work environment
A work culture focused on innovative disruption

Company

CoreWeave

CoreWeave is a cloud-based AI infrastructure company offering GPU cloud services to simplify AI and machine learning workloads.

Funding

Current Stage: Public Company
Total Funding: $23.37B
Key Investors: Jane Street Capital · Stack Capital · Coatue
2025-12-08 · Post-IPO Debt · $2.54B
2025-11-12 · Post-IPO Debt · $1B
2025-08-20 · Post-IPO Secondary

Leadership Team

Michael Intrator · Chief Executive Officer
Nitin Agrawal · Chief Financial Officer
Company data provided by Crunchbase