Senior Storage Production Engineer - DGX Cloud · United States

NVIDIA

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization. The Senior Storage Production Engineer will design, implement, and maintain large-scale storage clusters, ensuring high availability and data integrity while optimizing storage performance for AI/ML workloads.

Artificial Intelligence (AI) · Consumer Electronics · GPU · Hardware · Software · Virtual Reality
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Design, implement, and support large-scale storage clusters, ensuring scalability, high availability, and data integrity
Develop and maintain storage monitoring, logging, and alerting systems to ensure proactive detection and resolution of performance issues
Work with AI/ML workloads to optimize storage architectures for low-latency access, efficient caching, and high-throughput performance
Improve the lifecycle of storage services, from inception and design through deployment, operation, and continuous optimization. Support storage services before they go live through activities such as system design consulting, automation framework development, capacity management, and launch reviews
Maintain production storage infrastructure by monitoring availability, latency, and system health, leveraging predictive analytics and AI-driven automation
Optimize storage efficiency through compression, deduplication, tiering strategies, and intelligent workload placement
Scale storage systems sustainably using AI/ML-driven automation, policy-based tiering, and dynamic data migration techniques
Ensure data security and compliance by implementing encryption, access controls, and auditing mechanisms for storage systems
Practice sustainable incident response and blameless root cause analysis. Be part of an on-call rotation to support storage and production systems

Qualifications

Distributed storage solutions · Storage networking protocols · Storage automation · Performance tuning · Linux-based systems · Infrastructure configuration management · Observability tools · Work ethic · Teamwork · Communication skills

Required

BS degree in Computer Science, Storage Systems, or a related technical field (or equivalent experience), with 8+ years of practical experience
Experience with distributed and high-performance storage solutions, including clustered and parallel file systems, distributed object storage, and enterprise-grade storage systems
Solid understanding of block, file, and object storage technologies, including their scalability, reliability, and performance characteristics, as well as standard practices for operating them
Experience with storage networking protocols such as NFS, SMB, iSCSI, S3, Fibre Channel, RDMA, and NVMe over Fabrics
Expertise in algorithms, data structures, complexity analysis, software design, and automating maintenance of large-scale Linux-based storage systems
Experience with one or more of the following languages for storage automation, monitoring, and performance tuning: C/C++, Java, Python, Go, Node.js, or Bash
Hands-on experience with infrastructure configuration management tools like Ansible, Chef, Puppet, and Terraform for automating storage deployments
Experience with observability and tracing tools like InfluxDB, Prometheus, Grafana, and the Elastic stack for monitoring storage system health
Excellent written and oral communication skills
A strong work ethic
A deep sense of teamwork
A love of producing quality work and a commitment to finishing your tasks every day

Preferred

Deep understanding of large-scale distributed storage architectures, replication strategies, and erasure coding techniques
Experience in capacity planning, performance tuning, and troubleshooting high-throughput storage systems
Experience with Git, code review, pipelines, and CI/CD for managing infrastructure as code
Experience in analyzing and improving distributed storage system performance at scale
Strong debugging skills with a systematic problem-solving approach to identify complex storage issues
Proven understanding of network protocols, architectures, and troubleshooting techniques, especially as they relate to storage performance, stability, and availability
Experience using or operating private and public cloud storage solutions based on Kubernetes, OpenStack, or hybrid cloud architectures
Ability to design and implement automated storage migration, backup, and disaster recovery strategies
Thrive in collaborative environments and enjoy working with various teams to optimize storage performance
Flexible in adapting to different working styles and emerging storage technologies

Benefits

Equity
Benefits

Company

NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI.

H1B Sponsorship

NVIDIA has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).
Total Sponsorships by Year
2025 (1877)
2024 (1355)
2023 (976)
2022 (835)
2021 (601)
2020 (529)

Funding

Current Stage
Public Company
Total Funding
$4.09B
Key Investors
ARPA-E · ARK Investment Management · SoftBank Vision Fund
2023-05-09 · Grant · $5M
2022-08-09 · Post-IPO Equity · $65M
2021-02-18 · Post-IPO Equity

Leadership Team

Jensen Huang
Founder and CEO

Michael Kagan
Chief Technology Officer
Company data provided by Crunchbase