NVIDIA · 1 day ago
Staff Site Reliability Engineer
NVIDIA is a leader in groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization. They are seeking a Staff Site Reliability Engineer to lead technical strategies for large-scale SRE initiatives, focusing on improving reliability and developer productivity across enterprise systems.
Artificial Intelligence (AI) · Consumer Electronics · GPU · Hardware · Software · Virtual Reality
Responsibilities
Lead the technical strategy and roadmap for large-scale, cross-functional SRE initiatives that improve reliability, scalability, and developer productivity across enterprise systems
Design and build resilient distributed systems that power NVIDIA’s next-generation AI-driven enterprise products and services
Drive automation and observability improvements, using metrics and analytics to enhance performance, reliability, and efficiency
Collaborate across Cloud, Platform, Security, and AI/ML teams to implement modern SRE components that ensure high availability and secure operations
Analyze and troubleshoot complex systems, championing best practices in system design, incident management, and postmortem analysis
Mentor and influence engineers across teams, fostering technical excellence and a culture of reliability engineering
Qualifications
Required
10+ years of experience in Site Reliability Engineering, Platform Engineering, or Cloud Architect roles
BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent experience
Strong proficiency in programming languages such as Python, TypeScript, JavaScript, or Go, with a focus on automation and infrastructure-as-code
Experience with infrastructure-as-code tools such as AWS CDK, AWS CloudFormation, Terraform, or Crossplane
Solid understanding of OpenTelemetry or other observability implementations at scale
Deep expertise in systems architecture, networking, Kubernetes, and public cloud services (AWS, Azure, or GCP)
Outstanding problem-solving, communication, and teamwork skills, with the ability to influence across technical and interpersonal boundaries
Preferred
Passion for and experience with Public Cloud or large-scale automation systems
Demonstrated ability to drive technical strategy and deliver measurable reliability outcomes in complex environments
A strong sense of ownership, curiosity, and innovation—you thrive in ambiguity and turn challenges into opportunities
Benefits
Equity
Benefits
Company
NVIDIA
NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI.
H1B Sponsorship
NVIDIA has a track record of offering H1B sponsorships. Please note that this does not
guarantee sponsorship for this specific role. Additional information is provided below for your
reference. (Data powered by US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship
Trends of Total Sponsorships
2025 (1877)
2024 (1355)
2023 (976)
2022 (835)
2021 (601)
2020 (529)
Funding
Current Stage: Public Company
Total Funding: $4.09B
Key Investors: ARPA-E, ARK Investment Management, SoftBank Vision Fund
2023-05-09 · Grant · $5M
2022-08-09 · Post-IPO Equity · $65M
2021-02-18 · Post-IPO Equity
Company data provided by Crunchbase