Voltage Park · 1 month ago
Infrastructure Engineer (Observability)
Voltage Park is seeking an Infrastructure Engineer with a focus on Observability to join their Infrastructure Engineering team. The role involves designing and operating observability platforms to provide actionable insights for internal teams and external customers, ensuring reliability and transparency at scale.
AI Infrastructure · Cloud Computing · Machine Learning
Responsibilities
Design, build, and maintain observability platforms spanning metrics, logs, traces, and events
Create dashboards and alerting for internal stakeholders (InfraOps, Engineering, Customer Success) and scoped visibility for external customers
Ingest and correlate telemetry from GPUs, CPUs, networking (Ethernet & InfiniBand), containers, APIs, and BMC/Redfish
Implement noise-resistant alerting pipelines that improve detection and reduce operational load
Collaborate with infrastructure, platform, and customer-facing teams to embed observability into workflows
Contribute to broader infrastructure engineering projects beyond observability
Qualifications
Required
8+ years in infrastructure engineering, SRE, or observability roles
Strong experience with monitoring systems (Prometheus, Grafana, ELK, VictoriaMetrics, or similar)
Proficiency in Python, Go, or Bash for automation and data integration
Familiarity with container/Kubernetes observability
Understanding of streaming telemetry pipelines (Kafka, OTEL, Promtail, or equivalent)
Strong written and verbal communication skills
Preferred
Experience with GPU observability, particularly NVIDIA DCGM
Designing multi-tenant observability solutions with RBAC and scoped queries
Prior work with correlation engines for RCA, forecasting, or predictive alerting
Broader exposure to infrastructure domains (networking, storage, provisioning)
Company
Voltage Park
Voltage Park provides infrastructure for machine learning.
Funding
Current Stage: Growth Stage
Total Funding: $500M
2026-01-21: Acquired
2023-10-30: Undisclosed · $500M
Company data provided by Crunchbase