Nebius

Data Center Site Manager

New Jersey, United States

Full-time

Onsite

Senior Level, Lead/Staff

$90K/yr - $140K/yr

10+ years of experience

Nebius is leading a new era in cloud computing to serve the global AI economy. The Data Center Site Manager will own end-to-end reliability, safety, capacity, and performance for one of the flagship U.S. sites, leading a multi-disciplinary operations team to ensure world-class availability and cost efficiency.

AI Infrastructure, Cloud Infrastructure, GPU, IaaS, PaaS

Responsibilities

Own the site 24/7: deliver continuous availability across power, cooling, structured cabling, network, security, and DCIM—meeting or beating global SLAs

Build and lead the team: hire, mentor, and develop managers/technicians; run staffing models, shift coverage, and on‑call rotations that scale

Be the incident commander: lead major events end‑to‑end—triage, communications, executive briefings, RCA, and durable corrective actions

Drive reliability engineering: implement RCM, predictive maintenance, QA/QC, 5S, and Lean/continuous improvement to cut MTTR and raise MTBF

Deliver capacity on time: plan and execute expansions/retrofits; commission MEP systems with Design/Construction; achieve flawless change control (MOP/SOP/EOP)

Scale tooling & automation: mature DCIM/BMS/EPMS, monitoring/alerting, work management (Jira/ServiceNow), knowledge base (Confluence), and light scripting/SQL for telemetry and workflow automation

Run a metrics‑first operation: publish dashboards and KPIs (availability, PUE, MTBF/MTTR, work compliance, safety) and use them to drive decisions

Partner across functions: work with Cloud/Compute, Network, Security, and Capacity Planning to optimize performance, cost, and resiliency across the fleet

Manage vendors & colos: own contracts, SLAs, and execution for rack deliveries, PDUs, fiber/copper, and lifecycle PMs; validate colo topology and compliance

Raise the safety bar: enforce a zero‑injury EHS culture; conduct drills/audits for life safety, physical security, and data protection

Forecast and budget: build data‑backed plans for power, spares, headcount, and projects; track OpEx/CapEx with rigor

Qualifications

Data Center Management, Incident Management, Reliability Engineering, Team Leadership, Electrical Engineering, Mechanical Engineering, HVAC Systems, Data Analysis, Vendor Management, Change Management, Safety Management, Budgeting, Lean/Six Sigma, SQL, Jira, Confluence, ServiceNow, Communication Skills

Required

Associate's degree or trade certification in Electrical/Mechanical/Industrial Engineering (or equivalent experience)

10+ years in electrical/mechanical/HVAC/controls within industrial/commercial settings, with 5+ years specifically in data center or mission-critical facilities

Team leadership experience in 24/7 sites (managing leads/techs, vendors, and on-call operations)

Deep, hands-on knowledge of UPS/generators/switchgear, chillers/CRAC/CRAH, fire detection/suppression, BMS/EPMS/DCIM, and structured cabling (copper & fiber)

Proven strength in incident management, RCA/Corrective Actions, change management, and vendor/contract oversight

Data-driven mindset with the ability to forecast resources and make analytics-backed decisions (Excel; SQL/scripting a plus)

Excellent written/verbal communication with comfort presenting to executives and guiding field teams during live events

Ability to travel up to ~30% and support after-hours escalations when needed

Preferred

Bachelor's degree in Electrical/Mechanical/Industrial Engineering, Engineering Management, or Reliability Engineering

Hyperscale/colo experience with reliability-centered maintenance, predictive analytics, and Lean/Six Sigma practices

Familiarity with Linux fundamentals, network equipment installation/troubleshooting, and fiber optics testing

Experience with Jira, Confluence, ServiceNow (or similar); strong SOP/MOP/EOP authorship

Certifications such as CDCP, DCM, PMP, OSHA-30, ITIL, or Uptime-aligned credentials

Benefits

Health insurance: 100% company-paid medical, dental, and vision coverage for employees and families.

401(k) plan: up to 4% company match with immediate vesting.

Parental leave: 20 weeks paid for primary caregivers, 12 weeks for secondary caregivers.

Remote work reimbursement: up to $85/month for mobile and internet.

Disability & life insurance: company-paid short-term disability, long-term disability, and life insurance coverage.

Company

Nebius

The Nebius AI Cloud brings powerful full-stack infrastructure for AI developers and practitioners across startups, enterprises and science institutes to build and deploy generative AI applications and rapidly deliver scientific breakthroughs by training and running ML models within a secure, high-performance, and cost-optimized cloud environment.

Founded in 2022

Amsterdam, Noord-Holland, NLD

501-1000 employees

https://nebius.com/

Funding

Current Stage

Late Stage

Total Funding

$1.04B

2025-06-04 · Debt Financing · $1B

2025-05-15 · Grant · $45M

2024-12-02 · Seed

Leadership Team

Evan Helda

Head of Physical AI

Vinita Ananth

Sr. Director of Product

Recent News

Business Wire

Nebius Debuts the Robotics & Physical AI Awards and Summit to Support Next-Generation Startups With $1.5 Million in Compute Credits

2025-12-10

GeekWire

Tech Moves: Expedia names first AI chief; Textio founder joins Microsoft; T-Mobile exec departs

2025-12-02

WebProNews

Applied Digital’s $5B AI Lease: North Dakota’s Compute Boom

2025-10-25

Company data provided by Crunchbase