Sustainable Talent · 1 day ago

Senior Infrastructure Engineer

Santa Clara, CA

Full-time

Onsite

Senior Level

$80/hr - $100/hr

5+ years exp

Sustainable Talent is seeking a Senior Infrastructure Engineer to support NVIDIA's IPP Cloud Infrastructure Team. The role involves managing and optimizing operations in cloud environments, leading the deployment and configuration of infrastructure, and collaborating with multi-functional teams to improve deployment processes.

Consulting · Human Resources · Information Technology

Growth Opportunities

Responsibilities

Collaborate with the Infrastructure Team to manage and optimize operations within our Infrastructure and Cloud environments, with a strong focus on large-scale system configurations and automation

Lead the deployment, configuration, and troubleshooting of data center and cloud-based infrastructures, ensuring efficient operations for NVIDIA's latest hardware and technologies

Design and implement automated solutions for product onboarding into our hosted and private cloud environments, utilizing robust scripting techniques

Work closely with engineers, architects, and product managers to strategize and execute product launches, enhancing deployment processes

Tackle complex challenges related to multi-site deployments of NVIDIA products, applying innovative problem-solving skills

Partner with multi-functional teams, including system engineering, software engineering, and operations, to deliver reliable and scalable platforms from concept to production

Focus on managing systems at scale, writing code for simultaneous configuration of multiple servers, and improving deployment efficiency, including API integrations for automation

Qualifications

DevOps practices · Linux systems · Automation · Configuration management · Cloud architecture · Bash scripting · Python scripting · Planning skills · Hardware configuration · Communication skills · Problem-solving skills

Required

Bachelor's or Master's Degree in Computer Science, Software Engineering, or a related field, or equivalent practical experience

5+ years of relevant experience, with a strong emphasis on DevOps practices

3+ years of experience with Linux systems and scripting (Bash, Python)

Solid background in managing large-scale infrastructure operations with an emphasis on automation and configuration management

Proven ability to quickly adapt to and implement new technologies, including system-level operations and tools

Strong understanding of embedded systems, orchestration, data centers, and cloud architecture, along with excellent communication and planning skills

Experience in product engineering, debugging, and hardware configuration, with a focus on system-level operations

Proven experience in configuring systems at scale, focusing on automation and efficiency

Familiarity with tools for managing remote server configurations, including BMC/IPMI systems

Preferred

Experience in large-scale QA environments and product bring-ups

Familiarity with operations support, bug tracking, and ticket management

Background in supporting GPUs, embedded device development, and CUDA applications

Knowledge of converged and hyper-converged infrastructure

Experience with configuration management tools (e.g., Puppet, Chef) for hardware setups

Strong expertise in system configuration protocols (e.g., IPMI/BMC, Redfish)

Knowledge of CI/CD tools like Jenkins for automating deployment pipelines

Experience working with APIs for system communication and automation

Strong hardware knowledge, particularly in configuring hardware components (e.g., BIOS, CPU) in large-scale environments

Experience configuring BIOS settings remotely in large hardware deployments

Ideal candidates may have experience at companies like Dell, IBM, or HP, or at organizations that produce servers or operate on-premises cloud solutions