Service Reliability Operations Administrator @ NVIDIA | Jobright.ai
Service Reliability Operations Administrator jobs in California, United States

NVIDIA · 7 hours ago

Service Reliability Operations Administrator

Artificial Intelligence (AI), GPU
Growth Opportunities
H1B Sponsor Likely

Responsibilities

The team will provide its services 24/7 in a follow-the-sun model that spans continents.
You will report directly to a manager in the United States.
Each team member will need to work either a Saturday or Sunday each week. The hours worked may include an early or late start (10 hours per day x 4 days per week) to ensure that the combined US and India teams provide 24/7 coverage.
The heart of Mission Control will be monitoring and running growing production compute and storage environments.
Every Mission Control team member will use alerts and alarms to help prevent issues and incidents when possible. You may also work with the developer community to develop and implement predictive support or diagnostic routines.
Perform systems administration, network administration, and security incident monitoring tasks to drive our actions.
Mission Control team members will work with developers to learn how the service works, then translate that understanding into runbooks which the entire team will use. As new features and functionality are added, you will also update and evolve the runbooks as needed.
Help discover incidents and issues, including initiating the incident management procedure.
Bring in subject matter authorities or service owners as needed to resolve issues. Feedback will help us continually improve our service.
Your interpersonal skills will help keep the team engaged through resolution and ensure our clients believe we value their time and effort.
May perform other tasks that will help us provide extraordinary service levels for our customers.

Qualifications


Systems Administration, DevOps, Monitoring Tools, Shell Scripting, Automation, DNS, DHCP, IP Tables, Python, Virtual Machines, Cloud Services, Application Containers, Container Orchestration, Git, Ansible

Required

5+ years of experience administering open system servers in a Production environment.
2+ years of experience working in demanding Internet, Cloud, or Telecommunications environments in a Systems Administration, DevOps, SRE, or NOC role.
B.S. in relevant disciplines or equivalent experience.
Expertise using monitoring tools and problem ticketing systems.
Strong problem-solving, analytical, and troubleshooting abilities.
Strong server administration experience, including shell scripting, automation, DNS, DHCP, storage concepts, basic networking, and IP Tables. RHCE or equivalent level of knowledge.
Prior experience running virtual machines under open source or commercial hypervisors.
Experience operating services running on public or private clouds.
Knowledge and understanding of application containers and container orchestration systems.
Basic understanding of Git.
Experience performing system administration tasks using Ansible, along with prior experience analyzing system and network performance using monitoring alerts, data, and graphs.

Preferred

Experience scripting in Python preferred, but not required.

Benefits

Equity
Benefits

Company

NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI.

H1B Sponsorship

NVIDIA has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2023 (735)
2022 (892)
2021 (696)
2020 (534)

Funding

Current Stage
Public Company
Total Funding
$4.09B
Key Investors
ARPA-E, ARK Investment Management, SoftBank Vision Fund
2023-05-09 · Grant · $5M
2022-08-09 · Post-IPO Equity · $65M
2021-02-18 · Post-IPO Equity · amount not listed

Leadership Team

Jensen Huang
CEO and Founder
Chris Malachowsky
Co-Founder, SVP
Company data provided by Crunchbase