Cadence · 1 day ago

Data Center Operations Engineer

Santa Fe, NM

Full-time

Onsite

Mid Level

Cadence hires and develops leaders and innovators in technology. The Data Center Operations Engineer supports, maintains, and deploys critical data center infrastructure, with a focus on Linux-based systems and GPU server deployments, and collaborates with global teams to ensure reliable service delivery.

Aerospace · Electronic Design Automation (EDA) · Hardware · Mobile · Semiconductor · Software

Growth Opportunities

H1B Sponsor Likely

Responsibilities

Provide hands-on operational support for all data center projects, deployments, and repair activities

Participate in an on-call rotation and provide on-site or remote support during maintenance windows and incidents

Troubleshoot and resolve operational issues related to Linux servers, GPU platforms, networking, and storage infrastructure

Support customer and internal deployments, ensuring timely and successful bring-up of GPU servers and clusters

Perform InfiniBand fabric bring-up, switch configuration, subnet management, and troubleshooting

Conduct daily health checks of Linux systems and infrastructure components, proactively identifying and mitigating risks

Install, configure, test, and maintain server hardware (rack and stack, labeling, HDDs, memory, CPUs, RAID batteries, NICs, etc.)

Install, configure, and troubleshoot networking equipment including routers, switches, and terminal servers for out-of-band management

Review and validate equipment deployments against approved design documentation and standards

Support data center builds, refreshes, migrations, and expansions while adhering to quality and safety standards

Coordinate with vendors and onsite staff for hardware delivery, diagnostics, replacement, and warranty services

Utilize monitoring and alerting frameworks to identify issues, escalate appropriately, and ensure timely service restoration

Maintain accurate documentation of operational procedures, system configurations, and runbooks

Follow established incident management, escalation procedures, and service-level agreements (SLAs)

Collaborate with global teams across time zones to support operational initiatives and continuous improvement efforts

Contribute to process improvement initiatives and ensure adherence to documented policies, processes, and procedures

Qualifications

Linux administration · GPU server deployment · InfiniBand networking · Shell scripting · Cluster bring-up · Networking fundamentals · Router · Switch configuration · Organizational skills · Communication skills

Required

Bachelor's degree in Computer Science, Engineering, Information Technology, or equivalent practical experience

Strong hands-on experience in Linux environments, including system administration, troubleshooting, and performance validation

Proficiency with Linux command-line tools and shell scripting (Bash or equivalent)

Experience with cluster bring-up, driver installation, and system-level configuration

Hands-on experience setting up and validating GPU servers in clustered environments

Experience with end-to-end GPU testing in InfiniBand-based clusters

Working knowledge of InfiniBand networking, including switch configuration and subnet management

Solid understanding of networking fundamentals, including the OSI model and TCP/IP protocol suite (IP, ARP, ICMP, TCP, UDP, SMTP, FTP, TFTP)

Experience installing, configuring, and troubleshooting routers, switches, and terminal servers

Familiarity with fiber and copper cabling, including IP and SAN deployments

Experience managing incident tickets, maintaining acceptable ticket loads, and meeting SLAs

Strong organizational skills with meticulous attention to detail in data center environments

Ability to follow and enforce documented escalation procedures and operational policies

Strong verbal and written communication skills, with the ability to collaborate effectively with cross-functional and global teams

Preferred

Experience supporting HPC, AI, or large-scale GPU environments

Exposure to data center monitoring

Experience documenting operational processes and maintaining technical runbooks

Familiarity with large-scale data center buildouts or refresh programs

Company

Cadence

Glassdoor: 4.3

Cadence is a market leader in AI and digital twins, pioneering the application of computational software to accelerate innovation in the engineering design of silicon to systems.

Founded in 1988

San Jose, California, USA

10,001+ employees

https://www.cadence.com

H1B Sponsorship

Cadence has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)

Trends of Total Sponsorships

2025 (306)

2024 (221)

2023 (282)

2022 (330)

2021 (233)

2020 (209)

Funding

Current Stage

Public Company

Total Funding

unknown

1998-02-20: IPO

Leadership Team

Paul Cunningham

Senior Vice President and General Manager

Tom Beckley

Senior Vice President - Strategic Technology Programs

Recent News

Business Wire

Cadence Unveils Tensilica HiFi iQ DSP Purpose-Built for Next-Generation Voice AI and Audio Applications

2026-01-22

eeNews Europe

Cadence and partners line up for pre-validated chiplets

2026-01-17

GlobeNewswire

Die-to-Die IP Market Size to Surpass USD 3.72 Billion by 2033 | Research by SNS Insider

2026-01-16

Company data provided by Crunchbase