ideaHelix · 11 hours ago
Data Center Operations Engineer - Onsite - Santa Fe, New Mexico
ideaHelix is seeking a Data Center Operations Engineer responsible for supporting, maintaining, and deploying critical data center infrastructure. The role requires hands-on expertise in Linux-based systems and GPU server deployments, focusing on ensuring reliable and scalable service delivery.
Consulting, Information Technology, Product Design
Responsibilities
Provide hands-on operational support for all data center projects, deployments, and repair activities
Participate in an on-call rotation and provide on-site or remote support during maintenance windows and incidents
Troubleshoot and resolve operational issues related to Linux servers, GPU platforms, networking, and storage infrastructure
Support customer and internal deployments, ensuring timely and successful bring-up of GPU servers and clusters
Perform InfiniBand fabric bring-up, switch configuration, subnet management, and troubleshooting
Conduct daily health checks of Linux systems and infrastructure components, proactively identifying and mitigating risks
Install, configure, test, and maintain server hardware (rack and stack, labeling, HDDs, memory, CPUs, RAID batteries, NICs, etc.)
Install, configure, and troubleshoot networking equipment including routers, switches, and terminal servers for out-of-band management
Review and validate equipment deployments against approved design documentation and standards
Support data center builds, refreshes, migrations, and expansions while adhering to quality and safety standards
Coordinate with vendors and onsite staff for hardware delivery, diagnostics, replacement, and warranty services
Utilize monitoring and alerting frameworks to identify issues, escalate appropriately, and ensure timely service restoration
Maintain accurate documentation of operational procedures, system configurations, and runbooks
Follow established incident management and escalation procedures, and meet service-level agreements (SLAs)
Collaborate with global teams across time zones to support operational initiatives and continuous improvement efforts
Contribute to process improvement initiatives and ensure adherence to documented policies, processes, and procedures
Qualifications
Required
Bachelor's degree in Computer Science, Engineering, Information Technology, or equivalent practical experience
Strong hands-on experience in Linux environments, including system administration, troubleshooting, and performance validation
Proficiency with Linux command-line tools and shell scripting (Bash or equivalent)
Experience with cluster bring-up, driver installation, and system-level configuration
Hands-on experience setting up and validating GPU servers in clustered environments
Experience with end-to-end GPU testing in InfiniBand-based clusters
Working knowledge of InfiniBand networking, including switch configuration and subnet management
Solid understanding of networking fundamentals, including the OSI model and TCP/IP protocol suite (IP, ARP, ICMP, TCP, UDP, SMTP, FTP, TFTP)
Experience installing, configuring, and troubleshooting routers, switches, and terminal servers
Familiarity with fiber and copper cabling, including IP and SAN deployments
Experience managing incident tickets, maintaining acceptable ticket loads, and meeting SLAs
Strong organizational skills with meticulous attention to detail in data center environments
Ability to follow and enforce documented escalation procedures and operational policies
Strong verbal and written communication skills, with the ability to collaborate effectively with cross-functional and global teams
Preferred
Experience supporting HPC, AI, or large-scale GPU environments
Exposure to data center monitoring
Experience documenting operational processes and maintaining technical runbooks
Familiarity with large-scale data center buildouts or refresh programs
Company
ideaHelix
ideaHelix specializes in product development, design, testing, and strategic consulting services.
H1B Sponsorship
ideaHelix has a track record of offering H1B sponsorships. Note that this does not
guarantee sponsorship for this specific role. Additional information is provided below
for reference. (Data from the US Department of Labor)
Trends of Total Sponsorships
2025 (11)
2024 (9)
2023 (11)
2022 (8)
2021 (16)
2020 (21)
Funding
Current Stage
Growth Stage
Company data provided by Crunchbase