Senior DGX Cloud Software Engineer- Infrastructure Automation and Distributed Systems @ NVIDIA | Jobright.ai
Location: Oregon, United States
57 applicants
NVIDIA · 12 hours ago

Artificial Intelligence (AI), GPU
Growth Opportunities
H1B Sponsor Likely
Hiring Manager
Bella Yanovsky

Responsibilities

Design, build, and run the in-scope cloud infrastructure services to meet business goals, performing integrations, migrations, bring-ups, updates, and decommissions as necessary.
Participate in defining our internal-facing service level objectives (SLOs) and error budgets as part of our overall observability strategy.
Eliminate toil, or automate it where the return on building and maintaining the automation justifies the effort.
Practice sustainable, blameless incident prevention and incident response as a member of an on-call rotation.
Consult with peer teams and advise them on systems design best practices.

Qualification

Cloud infrastructure services, Infrastructure automation, Distributed systems design, Python, Linux, Go, Perl, Ruby, Storage, Containers, Kubernetes, OpenStack, Docker, Slurm, NVIDIA Collective Communication Library, Sense of ownership

Required

BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics) or equivalent experience.
5+ years of relevant experience.
A track record showing a good balance between initiating your own projects, convincing others to collaborate with you, and collaborating well on projects initiated by others.
Experience with infrastructure automation and distributed systems design, including developing tools for running large-scale private or public cloud systems in production.
Experience with one or more of the following: Python, Go, or C++.
In-depth knowledge of one or more of the following: Linux, Slurm, Kubernetes, networking, storage, or containers.

Preferred

Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
Experience working with or developing systems associated with bare metal as a service (BMaaS), for example vending BMaaS, running Slurm on containers, or vending Kubernetes clusters.
Experience working with or developing multi-cloud infrastructure services.
Experience teaching reliability practices (e.g., SRE) or general cloud systems best practices to peers or to other companies (e.g., CRE).
Experience running private or public cloud systems based on one or more of Kubernetes, OpenStack, Docker, or Slurm.
Experience with NVIDIA Collective Communication Library (NCCL).
Experience working with a centralized security organization to prioritize and mitigate security risks.
Experience weighing build vs. reuse vs. buy decisions.
Prior experience on any specifically named team, or on an ML/AI-focused team, is not required but is a nice to have.

Benefits

Equity and benefits

Company

NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI.

H1B Sponsorship

NVIDIA has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data powered by the US Department of Labor).
Figure: Distribution of Different Job Fields Receiving Sponsorship
Trends of Total Sponsorships
2023 (735)
2022 (892)
2021 (696)
2020 (534)

Funding

Current Stage
Public Company
Total Funding
$4.09B
Key Investors
ARPA-E, ARK Investment Management, SoftBank Vision Fund
2023-05-09: Grant · $5M
2022-08-09: Post-IPO Equity · $65M
2021-02-18: Post-IPO Equity

Leadership Team

Jensen Huang
CEO and Founder
Chris Malachowsky
Co-Founder, SVP
Company data provided by Crunchbase.