Software Engineer, Observability jobs in United States

CoreWeave · 7 hours ago

Software Engineer, Observability

CoreWeave is The Essential Cloud for AI™, delivering a platform that enables innovators to build and scale AI with confidence. The Software Engineer in Observability will be responsible for building, maintaining, and optimizing systems that support GPU-dense clusters and telemetry, enhancing the observability stack for AI workloads.
AI Infrastructure, Artificial Intelligence (AI), Cloud Computing, Cloud Infrastructure, Information Technology, Machine Learning
No H1B · U.S. Citizen Only

Responsibilities

Design, build, and maintain logging, tracing, and/or metrics platforms by writing production-quality code in languages like Go and Python, with guidance from senior engineers, contributing to the reliability and performance of our observability stack
Develop and refine monitoring and alerting to enhance system reliability, reduce mean time to detect, and improve incident response
Assist engineers across CoreWeave in developing effective usage patterns for observability systems, helping teams instrument services, tune dashboards, and set actionable alerts
Manage production and pre-production clusters, including deployments and configuration, and build tools that enable development teams to follow best practices
Participate in the team’s on-call rotation to support critical production systems, learning from incidents and contributing to long-term reliability improvements

Qualifications

Go, Python, Kubernetes, Observability systems, Containerization, Microservices architectures, Terraform, Analytical skills, Problem-solving skills, Communication skills

Required

2+ years of experience in Software Engineering, Site Reliability Engineering, DevOps, or a related field
Proficiency in at least one programming or scripting language (e.g., Python, Go)
Experience working with Kubernetes, containerization, and microservices architectures
Experience participating in on-call rotations, including triaging and appropriately escalating production issues
Experience using observability systems at scale (e.g., metrics, logging, tracing) to understand and debug complex distributed systems
Strong problem-solving, analytical, and communication skills, with the ability to work effectively with other engineering teams

Preferred

Experience running a production observability database or tool (e.g., ClickHouse, Elastic, Loki, VictoriaMetrics, Prometheus, Thanos, OpenTelemetry, Grafana)
Familiarity with infrastructure-as-code tools like Terraform
Exposure to modern testing frameworks and progressive deployment strategies (e.g., canary, blue–green)
Hands-on experience using data-streaming systems (e.g., Kafka, Kafka Connect) for observability pipelines
Experience with modern AI platforms and workloads (e.g., large-scale training and inference, GPU-based infrastructure, MLOps tooling) is a plus

Benefits

Medical, dental, and vision insurance - 100% paid for by CoreWeave
Company-paid Life Insurance
Voluntary supplemental life insurance
Short- and long-term disability insurance
Flexible Spending Account
Health Savings Account
Tuition Reimbursement
Ability to Participate in Employee Stock Purchase Program (ESPP)
Mental Wellness Benefits through Spring Health
Family-Forming support provided by Carrot
Paid Parental Leave
Flexible, full-service childcare support with Kinside
401(k) with a generous employer match
Flexible PTO
Catered lunch each day in our office and data center locations
A casual work environment
A work culture focused on innovative disruption

Company

CoreWeave

CoreWeave is a cloud-based AI infrastructure company offering GPU cloud services to simplify AI and machine learning workloads.

Funding

Current Stage: Public Company
Total Funding: $26.87B
Key Investors: NVIDIA, Goldman Sachs, JP Morgan Chase, Morgan Stanley, MUFG Union Bank, Jane Street Capital
2026-01-26 · Post-IPO Equity · $2B
2025-12-08 · Post-IPO Debt · $2.54B
2025-11-12 · Post-IPO Debt · $2.5B

Leadership Team

Michael Intrator, Chief Executive Officer
Brannin McBee, Founder & CDO
Company data provided by Crunchbase