Infrastructure Engineer (Observability) jobs in United States

Voltage Park · 1 month ago

Infrastructure Engineer (Observability)

Voltage Park is seeking an Infrastructure Engineer focused on observability to join its Infrastructure Engineering team. The role involves designing and operating observability platforms that give internal teams and external customers actionable insights, supporting reliability and transparency at scale.

AI Infrastructure · Cloud Computing · Machine Learning
No H1B

Responsibilities

Design, build, and maintain observability platforms spanning metrics, logs, traces, and events
Create dashboards and alerting for internal stakeholders (InfraOps, Engineering, Customer Success) and scoped visibility for external customers
Ingest and correlate telemetry from GPUs, CPUs, networking (Ethernet & InfiniBand), containers, APIs, and BMC/Redfish
Implement noise-resistant alerting pipelines that improve detection and reduce operational load
Collaborate with infrastructure, platform, and customer-facing teams to embed observability into workflows
Contribute to broader infrastructure engineering projects beyond observability

Qualifications

Infrastructure engineering · Monitoring systems · Observability · Python · Go · Bash · Container observability · Streaming telemetry pipelines · Communication skills

Required

8+ years in infrastructure engineering, SRE, or observability roles
Strong experience with monitoring systems (Prometheus, Grafana, ELK, VictoriaMetrics, or similar)
Proficiency in Python, Go, or Bash for automation and data integration
Familiarity with container/Kubernetes observability
Understanding of streaming telemetry pipelines (Kafka, OTEL, Promtail, or equivalent)
Strong written and verbal communication skills

Preferred

Experience with GPU observability, particularly NVIDIA DCGM
Designing multi-tenant observability solutions with RBAC and scoped queries
Prior work with correlation engines for RCA, forecasting, or predictive alerting
Broader exposure to infrastructure domains (networking, storage, provisioning)

Company

Voltage Park

Voltage Park provides infrastructure for machine learning.

Funding

Current Stage: Growth Stage
Total Funding: $500M
2026-01-21 · Acquired
2023-10-30 · Undisclosed · $500M

Leadership Team

Eric Park
Chief Executive Officer
Mike Xia
Chief Product Officer
Company data provided by Crunchbase