Staff Site Reliability Engineer jobs in United States

NVIDIA · 1 day ago

Staff Site Reliability Engineer

NVIDIA is a leader in groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization. They are seeking a Staff Site Reliability Engineer to lead technical strategies for large-scale SRE initiatives, focusing on improving reliability and developer productivity across enterprise systems.

Artificial Intelligence (AI), Consumer Electronics, GPU, Hardware, Software, Virtual Reality
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Lead the technical strategy and roadmap for large-scale, cross-functional SRE initiatives that improve reliability, scalability, and developer productivity across enterprise systems
Design and build resilient distributed systems that power NVIDIA’s next-generation AI-driven enterprise products and services
Drive automation and observability improvements, using metrics and analytics to enhance performance, reliability, and efficiency
Collaborate across Cloud, Platform, Security, and AI/ML teams to implement modern SRE components that ensure high availability and secure operations
Analyze and troubleshoot complex systems, championing best practices in system design, incident management, and postmortem analysis
Mentor and influence engineers across teams, fostering technical excellence and a culture of reliability engineering

Qualifications

Site Reliability Engineering, Infrastructure-as-code, Kubernetes, Cloud services, Programming languages, Systems architecture, Observability, Problem-solving, Communication, Teamwork

Required

10+ years of experience in Site Reliability Engineering, Platform Engineering, or Cloud Architect roles
BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent experience
Strong proficiency in programming languages such as Python, TypeScript, JavaScript, or Go, with a focus on automation and infrastructure-as-code
Experience with infrastructure-as-code tools such as AWS CDK, AWS CloudFormation, Terraform, or Crossplane
Solid understanding of OpenTelemetry or other observability implementations at scale
Deep expertise in systems architecture, networking, Kubernetes, and public cloud services (AWS, Azure, or GCP)
Outstanding problem-solving, communication, and teamwork skills, with the ability to influence across technical and interpersonal boundaries

Preferred

Passion for and experience with Public Cloud or large-scale automation systems
Demonstrated ability to drive technical strategy and deliver measurable reliability outcomes in complex environments
A strong sense of ownership, curiosity, and innovation—you thrive in ambiguity and turn challenges into opportunities

Benefits

Equity
Benefits

Company

NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI.

H1B Sponsorship

NVIDIA has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (1877)
2024 (1355)
2023 (976)
2022 (835)
2021 (601)
2020 (529)

Funding

Current Stage
Public Company
Total Funding
$4.09B
Key Investors
ARPA-E, ARK Investment Management, SoftBank Vision Fund
2023-05-09 · Grant · $5M
2022-08-09 · Post-IPO Equity · $65M
2021-02-18 · Post-IPO Equity

Leadership Team

Jensen Huang, Founder and CEO
Michael Kagan, Chief Technology Officer
Company data provided by Crunchbase