CoreWeave · 1 day ago

Production Engineer

Bellevue, WA

Full-time

Hybrid

Mid Level

$139K/yr - $204K/yr

4+ years exp

CoreWeave is The Essential Cloud for AI™, providing a platform of technology and tools for innovators. The Production Engineer will be responsible for maintaining the reliability of CoreWeave’s cloud infrastructure, supporting incident response, and contributing to operational improvements.

Artificial Intelligence (AI), Cloud Computing, Cloud Infrastructure, Information Technology, Machine Learning

No H-1B

U.S. Citizen Only

Responsibilities

Assist in incident response efforts by helping identify and resolve service disruptions quickly, working under the guidance of more senior engineers

Help document incidents, assist with root cause analysis (RCA), and support post-incident reviews (PIRs) to identify lessons learned

Contribute to the development and maintenance of incident response playbooks to ensure preparedness for various failure scenarios

Participate in communication efforts during incidents, updating stakeholders and keeping clear records of incident activities

Monitor system performance and health using tools like Prometheus and Grafana, identifying any performance issues or potential incidents

Help implement automation and process improvements to enhance efficiency and reduce manual intervention in incident detection and recovery

Support the development of KPIs and SLAs for incident management and ensure alignment with team goals

Collaborate with engineers across teams to improve platform reliability, resilience, and disaster recovery

Work closely with other engineers to troubleshoot system issues, refine workflows, and support ongoing operational needs

Participate in knowledge-sharing activities, helping improve team processes and learning from senior team members

Take part in training and mentorship opportunities to build technical skills and grow into more advanced responsibilities within the team

Qualifications

Cloud operations, Site reliability engineering, Monitoring tools, Cloud platforms, Incident management, Scripting tools, Communication, Team collaboration, Adaptability

Required

4+ years of experience in cloud operations, site reliability engineering (SRE), or related technical roles

Understanding of cloud platforms (e.g., Kubernetes, AWS, GCP) and basic knowledge of cloud infrastructure

Familiarity with incident management practices and frameworks (e.g., ITIL, SRE best practices)

Experience with monitoring and alerting tools (e.g., Prometheus, Grafana) or willingness to learn

Basic experience with scripting or automation tools (e.g., Python, Bash, Terraform, Ansible)

Strong communication skills, with the ability to explain technical concepts clearly and concisely to both technical and non-technical team members

Ability to work in a fast-paced, high-pressure environment while learning and adapting quickly

Preferred

Exposure to Kubernetes, containerization, and distributed systems

Familiarity with change management processes and post-incident analysis

Experience with automated systems or self-healing infrastructure is a plus

A desire to learn and grow in the areas of cloud operations, reliability engineering, and incident management

Benefits

Medical, dental, and vision insurance - 100% paid for by CoreWeave

Company-paid Life Insurance

Voluntary supplemental life insurance

Short- and long-term disability insurance

Flexible Spending Account

Health Savings Account

Tuition Reimbursement

Ability to Participate in Employee Stock Purchase Program (ESPP)

Mental Wellness Benefits through Spring Health

Family-Forming support provided by Carrot

Paid Parental Leave

Flexible, full-service childcare support with Kinside

401(k) with a generous employer match

Flexible PTO

Catered lunch each day in our office and data center locations

A casual work environment

A work culture focused on innovative disruption

Company

CoreWeave

CoreWeave is a cloud-based AI infrastructure company offering GPU cloud services to simplify AI and machine learning workloads.

Founded in 2017

Livingston, New Jersey, USA

1001-5000 employees

https://www.coreweave.com

Funding

Current Stage

Public Company

Total Funding

$23.37B

Key Investors

Jane Street Capital, Stack Capital, Coatue

2025-12-08 · Post-IPO Debt · $2.54B

2025-11-12 · Post-IPO Debt · $1B

2025-08-20 · Post-IPO Secondary

Leadership Team

Michael Intrator

Chief Executive Officer

Nitin Agrawal

Chief Financial Officer

Recent News

Benzinga.com

Brad Gerstner Bets On This Stock To Benefit From Nvidia's Rubin Platform: 'A Really Interesting Opportunity'

2026-01-08

thefly.com

Mixed options sentiment in CoreWeave Inc with shares up 0.4%

2026-01-08

Crunchbase News

North American Startup Funding Soared 46% In 2025, Driven By AI Boom

2026-01-08

Company data provided by Crunchbase