Fleet Data Centers
Director, Critical Facilities Systems
Fleet Data Centers designs, builds, and operates mega-scale data center campuses to meet the growing demand for new cloud and AI infrastructure. The Director of Critical Facilities Systems will oversee operational command-and-control functions and digital systems, ensuring reliable and scalable operations while leading the Critical Facilities Operations Center (CFOC) and the Network Operations Center (NOC).
Telecommunications
Responsibilities
Safety, security, and availability are the most important things we do. Help Fleet deliver near-perfect execution on these dimensions by building programs that are measurable, enforceable, and continuously improving
Own the 24/7 CFOC staffing model, training, qualification, and shift-lead structure; build a culture of calm, disciplined execution
Monitor mission-critical facility telemetry (BMS/EPMS/SCADA, DCIM, alarms, trends) and provide first-line triage, ticket creation, and dispatch/escalation to site teams
Maintain and continuously improve response playbooks, escalation paths, and communications protocols (including incident bridges and executive/customer notifications as applicable)
Capture high-quality incident timelines and evidence (telemetry snapshots, alarms, trends, logs) and provide an initial technical hypothesis to accelerate root cause analysis
Own alarm strategy governance: thresholds, suppression, correlation, tuning, and reduction of nuisance/false alarms in partnership with engineering and site leaders
Ensure operational readiness of monitoring for new sites and expansions (point lists, alarming, dashboards, runbooks, contacts, and handoff to steady-state operations)
Own the 24/7 NOC staffing, tooling, and procedures to monitor and triage connectivity issues for Fleet and customers
Receive, assess, and route network incidents and service requests; coordinate with internal network engineering, carriers, and vendors to drive rapid restoration
Establish customer-facing communications standards for network incidents (status updates, ETAs, post-incident summaries) in partnership with Customer teams
Maintain a disciplined process for outage tracking, incident documentation, and recurring-issue elimination through problem management
Ensure network monitoring coverage and accuracy (device inventory, alerting, dashboards, and escalation contacts) and support new site/phase turn-ups
Lead the team responsible for day-to-day administration, reliability, and lifecycle management of Fleet’s operational systems: DCIM/BMS/EPMS/SCADA, CMMS, ticketing/ITSM, and supporting reporting/analytics tools
Own user access governance, role-based permissions, auditability, and change control for operational tools (in alignment with Fleet’s security posture and IT controls)
Establish data standards and quality controls for asset registries, naming conventions, location hierarchy, alarm taxonomy, work order data, and ticket categorization to enable consistent reporting across sites
Manage vendor relationships, support contracts, SLAs, and roadmaps; translate operational needs into prioritized requirements and drive delivery with partners
Own system upgrades, patches, and enhancements (including testing, release management, training, and communications) to avoid downtime and user disruption
Drive integrations and automation between systems (e.g., alarms-to-tickets, CMMS-to-asset registry, dashboards/BI) to reduce manual work and increase response quality
Define and report KPIs for operations center performance and tool health (e.g., MTTA/MTTR, dispatch time, alarm volume and quality, ticket cycle times, tool uptime, and network SLOs)
Partner with site leaders and engineering to drive post-incident reviews, corrective actions, and recurring-issue reduction; ensure actions are tracked to closure
Identify systemic process or tooling gaps and build business cases for improvement, automation, and reliability enhancements
Support audits and compliance needs by ensuring operational data, logs, and evidence are retained, accessible, and consistent
Provide triage and support to site teams during events, be their eyes and ears, and own timely and accurate communications
Qualifications
Required
10+ years of experience in mission-critical operations (data centers or similar critical infrastructure), including operations center / command center / NOC leadership
5+ years of people leadership experience, including building or scaling 24/7 shift-based teams (staffing, training, performance management, and accountability)
Strong working knowledge of critical facilities operations and telemetry, including BMS/EPMS/SCADA alarming and trends; ability to translate data into sound operational decisions
Working knowledge of network operations concepts (monitoring, triage, escalation, carrier/vendor coordination, and customer communications)
Hands-on experience owning and administering operational platforms such as DCIM/BMS, CMMS, and ticketing/ITSM systems; strong discipline in change control and data governance
Demonstrated incident management and root cause analysis skills; calm, clear-eyed execution in high-stakes, time-sensitive events
Strong cross-functional leadership and communication skills; able to align stakeholders across Operations, IT, Network Engineering, Security, Construction/Commissioning, and Customer teams
Willingness and ability to travel to Fleet sites as needed
Benefits
100% employer-covered medical, dental, and vision insurance
401(k) program
Standard paid holidays
Unlimited PTO
Company
Fleet Data Centers
Current Stage: Growth Stage
Company data provided by Crunchbase