Senior DevOps and Automation Engineer, Fabric Networking - GPU @ NVIDIA | Jobright.ai
Senior DevOps and Automation Engineer, Fabric Networking - GPU jobs in United States
Be an early applicant: fewer than 25 applicants

NVIDIA · 3 hours ago

Senior DevOps and Automation Engineer, Fabric Networking - GPU

Artificial Intelligence (AI) · GPU
Growth Opportunities
H1B Sponsor Likely
Hiring Manager
Bella Yanovsky

Responsibilities

Develop automated tools to efficiently deploy, provision, and maintain large-scale GPU clusters interconnected via NVLink and InfiniBand.
Implement modern DevOps tools to automate software updates, maintenance tasks, and cluster-availability monitoring, ensuring seamless operations.
Take ownership of daily cluster failures and issues, troubleshooting them promptly to maintain optimal cluster availability and performance.
Manage the rollout and rollback of cluster software and firmware updates, ensuring smooth transitions and minimal disruptions.
Collaborate effectively with dynamic Engineering and Product Teams across multiple time zones to align cluster operations with evolving project requirements.

Qualifications

Ansible, Python, Shell Scripting, Cluster management, Linux fundamentals, Computer networks, High-performance applications, Resource scheduling managers, DGX systems, Compute Clusters

Required

BS or MS in Computer Science, Computer Engineering, Electrical Engineering, or a related field, or equivalent experience.
5+ years of hands-on experience deploying and administering clusters, servers, switches, and related infrastructure.
Automation expert with hands-on skills in Ansible, Python, and shell scripting.
Deep understanding of operating systems, computer networks, and high-performance applications.
Proven ability to work effectively with developers and test engineers across different teams and time zones.
Proficient with Linux fundamentals.

Preferred

Familiarity with resource scheduling managers, preferably Slurm.
Hands-on experience with GPU-focused hardware and software, such as DGX systems and compute clusters.
Proficiency in designing and implementing robust metrics-collection and alerting infrastructure.

Benefits

Equity
Benefits

Company

NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI.

H1B Sponsorship

NVIDIA has a track record of offering H1B sponsorship. Note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).
Total sponsorships by year
2023 (735)
2022 (892)
2021 (696)
2020 (534)

Funding

Current Stage
Public Company
Total Funding
$4.09B
Key Investors
ARPA-E, ARK Investment Management, SoftBank Vision Fund
2023-05-09 · Grant · $5M
2022-08-09 · Post-IPO Equity · $65M
2021-02-18 · Post-IPO Equity

Leadership Team

Jensen Huang
CEO and Founder
Chris Malachowsky
Co-Founder, SVP

Recent News

No support or updates for Windows 11 on machines not meeting minimum hardware requirements, says Microsoft | CIO
Company data provided by Crunchbase.