Sustainable Talent · 1 day ago
Senior Infrastructure Engineer
Sustainable Talent is hiring a Senior Infrastructure Engineer to support NVIDIA's IPP Cloud Infrastructure Team. The role involves managing and optimizing operations within cloud environments, leading the deployment and configuration of infrastructure, and collaborating with multi-functional teams to enhance deployment processes.
Consulting · Human Resources · Information Technology
Responsibilities
Collaborate with the Infrastructure Team to manage and optimize operations within our Infrastructure and Cloud environments, with a strong focus on large-scale system configurations and automation
Lead the deployment, configuration, and troubleshooting of data center and cloud-based infrastructures, ensuring efficient operations for NVIDIA's latest hardware and technologies
Design and implement automated solutions for product onboarding into our hosted and private cloud environments, utilizing robust scripting techniques
Work closely with engineers, architects, and product managers to strategize and execute product launches, enhancing deployment processes
Tackle complex challenges related to multi-site deployments of NVIDIA products, applying innovative problem-solving skills
Partner with multi-functional teams, including system engineering, software engineering, and operations, to deliver reliable and scalable platforms from concept to production
Focus on managing systems at scale, writing code for simultaneous configuration of multiple servers, and improving deployment efficiency, including API integrations for automation
Qualifications
Required
Bachelor's or Master's Degree in Computer Science, Software Engineering, or a related field, or equivalent practical experience
5+ years of relevant experience, with a strong emphasis on DevOps practices
3+ years of experience with Linux systems and scripting (Bash, Python)
Solid background in managing large-scale infrastructure operations with an emphasis on automation and configuration management
Proven ability to quickly adapt to and implement new technologies, including system-level operations and tools
Strong understanding of embedded systems, orchestration, data centers, and cloud architecture, along with excellent communication and planning skills
Experience in product engineering, debugging, and hardware configuration, with a focus on system-level operations
Proven experience in configuring systems at scale, focusing on automation and efficiency
Familiarity with tools for managing remote server configurations, including BMC/IPMI systems
Preferred
Experience in large-scale QA environments and product bring-ups
Familiarity with operations support, bug tracking, and ticket management
Background in supporting GPUs, embedded device development, and CUDA applications
Knowledge of converged and hyper-converged infrastructure
Experience with configuration management tools (e.g., Puppet, Chef) for hardware setups
Strong expertise in out-of-band system management interfaces (e.g., IPMI/BMC, Redfish)
Knowledge of CI/CD tools like Jenkins for automating deployment pipelines
Experience working with APIs for system communication and automation
Strong hardware knowledge, particularly in configuring hardware components (e.g., BIOS, CPU) in large-scale environments
Experience configuring BIOS settings remotely in large hardware deployments
Ideal candidates may have experience at companies like Dell, IBM, or HP, or at organizations that produce servers or operate on-premises cloud solutions
Benefits
Full benefits
PTO
Amazing company culture
Company
Sustainable Talent
Sustainable Talent provides staffing, consulting and outsourcing services.