Senior Site Reliability Engineer, Omniverse Cloud Platform @ NVIDIA | Jobright.ai
Senior Site Reliability Engineer, Omniverse Cloud Platform jobs in New Jersey, United States
78 applicants
NVIDIA · 16 hours ago

Senior Site Reliability Engineer, Omniverse Cloud Platform

Artificial Intelligence (AI), GPU
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Own, innovate, and build programs, new software, and analytics that drive improvements to the availability, scalability, latency, and efficiency of Omniverse products and services
Handle upgrades and automated rollbacks across all clusters
Maintain Service Level Agreements (SLAs) with measurable benchmarks, working hand in hand with developers of new services to define SLIs and design a stable, secure service
Help guide the Change Advisory Board and RCCA processes
Work with product area leads from technologies across NVIDIA to guide product engineering to build fast, reliable, and durable production systems
Apply standard methodologies and first-principles thinking to Omniverse and other strategic Cloud offerings from NVIDIA

Qualifications

System Design, Unix/Linux Systems, C++, Python, Kubernetes, Incident Management, Large Scale Coordination, HPC, PaaS, SaaS, Monitoring Stacks, OpenTelemetry, Observability Stacks, Machine Learning, Model Training

Required

Bachelor's degree in Computer Science or a related field, or equivalent experience
8+ years of demonstrated competency in system design, complexity analysis, software design in Unix/Linux systems, performance, and application issues
8+ years of validated experience authoring and debugging software written in C++ and Python
Deep hands-on experience with Kubernetes-based cloud environments
Proven experience in incident management and large-scale incident coordination
Experience working with partners across multiple teams
Background in HPC, model training operations, or related experience

Preferred

Expertise with multiple cloud service providers (CSPs)
Experience with monitoring stacks, OpenTelemetry, and sophisticated observability stacks
Background with PaaS and SaaS offerings
Experience supporting highly available, large-scale environments and their reliability
Experience in Machine Learning and Model Training

Benefits

Equity and benefits

Company

NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI.

H1B Sponsorship

NVIDIA has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2023 (735)
2022 (892)
2021 (696)
2020 (534)

Funding

Current Stage
Public Company
Total Funding
$4.09B
Key Investors
ARPA-E, ARK Investment Management, SoftBank Vision Fund
2023-05-09 · Grant · $5M
2022-08-09 · Post-IPO Equity · $65M
2021-02-18 · Post-IPO Equity

Leadership Team

Jensen Huang
CEO and Founder
Chris Malachowsky
Co-Founder, SVP
Company data provided by Crunchbase