NVIDIA · 1 day ago
Engineering Manager, Observability Platform
NVIDIA is a leader in groundbreaking developments in Artificial Intelligence and High-Performance Computing. They are seeking an Engineering Manager to lead the team responsible for building and operating NVIDIA’s global observability platform, focusing on metrics, logs, traces, and events crucial for debugging services.
AI Infrastructure · Artificial Intelligence (AI) · Consumer Electronics · Foundational AI · GPU · Hardware · Software · Virtual Reality
Responsibilities
Leading a team of engineers who design and build the core services, pipelines, and storage layers behind NVIDIA’s observability platform
Creating a clear technical direction for the team and supporting work that emphasizes simplicity, performance, and maintainability
Defining the architecture for distributed ingestion services, time-series storage, log and trace pipelines, query paths, and multi-region data flows
Partnering with platform, infrastructure, and application teams to define data models, instrumentation patterns, APIs, and integration standards
Strengthening engineering practices through better tooling, automated tests, schema management, API versioning, documentation, and safe rollout processes
Helping engineers solve distributed-systems issues including ingestion load, indexing pressure, compaction behavior, query fan-out, and replication patterns
Driving predictable execution through clear priorities, collaborative planning, and strong alignment across teams
Representing the observability platform across NVIDIA, gathering feedback, and evolving the system to support future AI workloads
Qualifications
Required
Bachelor's or Master's degree in Computer Science or a related technical field (or equivalent experience)
8+ years overall building distributed systems, with a focus on observability and monitoring systems, and 3+ years managing or leading engineers
Experience with modern observability stacks such as Prometheus, Thanos, Mimir, Loki, OpenSearch, Jaeger, Tempo, or OpenTelemetry, or equivalent tools
Strong foundations in distributed systems concepts including replication, sharding, durability, consensus, and performance tuning
Hands-on experience designing or scaling ingestion pipelines, time-series engines, trace backends, or log indexing systems, especially in high-cardinality environments
Ability to read and review Go or Python code and support engineers through technical decision-making
Clear architectural thinking with a focus on stable APIs, predictable performance, and long-term evolution
Experience mentoring engineers, improving technical judgment, and contributing to a healthy and inclusive engineering culture
Strong communication skills and the ability to explain complex challenges with clarity
Preferred
Experience building or contributing to an observability or telemetry platform used at significant scale
Contributions to open-source projects such as OpenTelemetry, Prometheus, Loki, Thanos, Tempo, Jaeger, ClickHouse, Mimir, or Elasticsearch
Experience with high-throughput systems like Kafka, Flink, Spark, or large-scale data collectors
Deep knowledge of cardinality management, query performance, storage design, or retention optimization
Experience designing multi-region architectures with a focus on consistency, availability, and data locality
Benefits
Equity
Benefits
Company
NVIDIA
NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI.
H1B Sponsorship
NVIDIA has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart not shown)
Trends of Total Sponsorships
2025 (1877)
2024 (1355)
2023 (976)
2022 (835)
2021 (601)
2020 (529)
Funding
Current Stage: Public Company
Total Funding: $4.09B
Key Investors: ARPA-E, ARK Investment Management, SoftBank Vision Fund
2023-05-09 · Grant · $5M
2022-08-09 · Post-IPO Equity · $65M
2021-02-18 · Post-IPO Equity
Recent News
Business Insider
2026-01-09
Company data provided by Crunchbase