NVIDIA · 3 hours ago
Senior DevOps and Automation Engineer, Fabric Networking - GPU
Responsibilities
Develop automated tools to efficiently deploy, provision, and maintain extensive GPU clusters interconnected via NVLink and InfiniBand.
Implement modern DevOps tools to automate software updates, perform maintenance tasks, and monitor cluster availability, ensuring seamless operations.
Take ownership of daily cluster failures and issues, troubleshooting them promptly to maintain optimal cluster availability and performance.
Manage the rollout and rollback of cluster software and firmware updates, ensuring smooth transitions and minimal disruptions.
Collaborate effectively with dynamic Engineering and Product Teams across multiple time zones to align cluster operations with evolving project requirements.
Qualifications
Required
BS or MS in Computer Science, Computer Engineering, Electrical Engineering, or a related field, or equivalent experience.
5+ years of hands-on experience deploying and administering clusters, servers, switches, and related infrastructure.
Automation expert with hands-on skills in Ansible, Python, and shell scripting.
Deep understanding of operating systems, computer networks, and high-performance applications.
Proven ability to work effectively with developers and test engineers across different teams and time zones.
Proficient with Linux fundamentals.
Preferred
Familiarity with resource scheduling managers, preferably Slurm.
Hands-on experience with GPU-focused hardware and software, such as DGX systems and compute clusters.
Proficiency in designing and implementing a robust metrics collection and alerting infrastructure.
Benefits
Equity
Benefits
Company
NVIDIA
NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI.
H1B Sponsorship
NVIDIA has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships
2023 (735)
2022 (892)
2021 (696)
2020 (534)
Funding
Current Stage: Public Company
Total Funding: $4.09B
Key Investors: ARPA-E, ARK Investment Management, SoftBank Vision Fund
2023-05-09 · Grant · $5M
2022-08-09 · Post-IPO Equity · $65M
2021-02-18 · Post-IPO Equity
Company data provided by Crunchbase