NVIDIA

Site Reliability Engineer, HPC and LSF

Santa Clara, CA

Full-time

Onsite

Entry, Mid Level

$124K/yr - $196K/yr

2+ years exp

NVIDIA has been a leader in computer graphics and accelerated computing for over 25 years, now focusing on AI to shape the future of computing. As a Site Reliability Engineer, you will lead the design and implementation of high-performance compute clusters, ensuring their reliability and efficiency while improving engineering productivity through automation.

AI Infrastructure, Artificial Intelligence (AI), Consumer Electronics, Foundational AI, GPU, Hardware, Software, Virtual Reality

Growth Opportunities

H1B Sponsor Likely

Responsibilities

Troubleshoot incoming support requests in a large-scale HPC environment

Contribute enhancements to existing deployment automation, configuration management, observability, and operational monitoring, and automate day-to-day operations

Ensure compute servers are running the correct operating system and configuration

Troubleshoot complex issues: perform comprehensive troubleshooting from bare metal to the application level, ensuring system reliability and efficiency

Collaborate with specialist teams to drive issues to closure

Collaborate with domain experts to improve how our chip development process utilizes our infrastructure

Directly contribute to the overall quality and faster time to market of our next-generation chips

Qualifications

CentOS/RHEL Linux, Container technologies, Python, Cluster configuration management, UNIX scripting, Job scheduler administration, FlexLM license management, High-Speed Networking, Perl, Problem-solving, Communication skills

Required

Proficient in administering CentOS/RHEL Linux distributions

Understanding of container technologies like Docker

Proficiency in Python and UNIX scripting languages such as Bash

Excellent problem-solving skills, with the ability to analyze complex systems, identify bottlenecks, and implement scalable solutions

Excellent communication and teamwork skills, with the ability to work effectively with diverse teams and individuals

BS in Computer Science or a similar degree (or equivalent experience), with 2+ years of relevant post-degree experience

Solid understanding of cluster configuration management tools such as Ansible

Preferred

Understanding of key Linux technologies such as NFS, automounter, LDAP, DNS, and TCP/IP networking on Red Hat-based Linux distributions

Familiarity with job scheduler administration (e.g., IBM Spectrum LSF or Slurm) and experience building and operating large-scale compute infrastructure

Knowledge of the FlexLM license management system

Proficiency in Perl for maintaining legacy automation scripts

Familiarity with high-speed networking (InfiniBand, RDMA, RoCE, etc.) and fast, distributed storage systems (Lustre, GPFS, etc.)

Benefits

Equity

Benefits

Company

NVIDIA

Glassdoor rating: 4.6

NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI.

Founded in 1993

Santa Clara, California, USA

10,001+ employees

https://www.nvidia.com

H1B Sponsorship

NVIDIA has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. The information below is provided for reference (data from the US Department of Labor).

Trends of Total Sponsorships

2025 (1877)

2024 (1355)

2023 (976)

2022 (835)

2021 (601)

2020 (529)

Funding

Current Stage

Public Company

Total Funding

$4.09B

Key Investors

ARPA-E, ARK Investment Management, SoftBank Vision Fund

2023-05-09 · Grant · $5M

2022-08-09 · Post-IPO Equity · $65M

2021-02-18 · Post-IPO Equity

Leadership Team

Jensen Huang

Founder and CEO

Michael Kagan

Chief Technology Officer

Recent News

The Motley Fool

Is Archer Aviation's Deal With Nvidia a Game Changer?

2026-01-12

PR Newswire UK

Supermicro Announces Intelligent In-Store Retail Solutions in Collaboration with a Broad Range of Industry Partners

2026-01-12

The Motley Fool

Should You Forget Nvidia and Buy These 2 Artificial Intelligence (AI) Stocks Right Now?

2026-01-12

Company data provided by Crunchbase