Fleet Data Centers · 16 hours ago
Sr. Systems Engineer – Mechanical
Fleet Data Centers designs, builds, and operates mega-scale data center campuses and is seeking a Senior Systems Engineer – Mechanical to oversee the design validation and optimization of data center cooling systems. The role involves ensuring the integrity of thermal systems and collaborating with cross-functional teams to improve operational efficiency and uptime.
Data Center · Data Management · IT Infrastructure
Responsibilities
Develop and maintain a deep understanding of Fleet data center cooling topology, including:
Air-side systems: fan walls, CRAHs/CRACs, air handlers, ducting, containment, filters
Liquid-side systems: chillers, dry coolers, pumps, CDUs, heat exchangers, headers/manifolds, valve trains
Rack-level solutions: liquid-cooled cold plates, rear-door heat exchangers, in-rack manifolds, hybrid air/liquid configurations
Determine the air-to-liquid mix needed to support a given rack layout, accounting for:
Rack SKUs and thermal design power per rack/cluster
Liquid-cooled vs. air-cooled SKUs and their specific inlet temperature, flow, and ΔT requirements
Aisle-level and room-level constraints (supply/return temperatures, pressure, containment)
Ensure that for each deployment:
The selected cooling topology supports the planned rack densities and layouts
Air and liquid paths are balanced to avoid local under-supply or over-supply conditions
Design assumptions are documented and traceable back to rack SKUs and IT deployment plans
Understand the air and liquid cooling requirements for each rack SKU, including:
Inlet temperature and humidity ranges
Liquid flow, pressure, and temperature ranges for cold plates and rear-door heat exchangers
Allowable gradients across the rack and between front/rear or supply/return
Maintain a structured mapping between rack SKUs and required cooling configuration, including:
Airflow requirements per rack and per aisle
Liquid flow per rack, per manifold, and per loop
Any special constraints (e.g., high ΔT, mixed air/liquid in same aisle, hot aisle / cold aisle rules)
Ensure all cooling-related specifications and quantities (fan wall modules, CRAHs/CRACs, CDUs, pumps, valves, manifolds, piping sizes, coil sizes) are:
Accurate and complete
Captured in standardized BOMs and drawings
Provided to capacity planners and procurement with enough detail to plan and procure infrastructure
Perform CFD analysis at room and aisle level to:
Validate that planned rack placement does not create hot spots
Confirm that airflow patterns, pressure profiles, and temperature distributions are within allowable limits
Identify and mitigate cooling stranding, where cooling capacity exists but cannot be effectively delivered to IT load because of placement or topology
Use CFD and thermal modeling tools to:
Evaluate different rack arrangements and containment strategies
Test sensitivity to changes in IT load, fan speeds, supply temperatures, and air-to-liquid mix
Quantify margin to thresholds (e.g., maximum rack inlet temperature, maximum component temperatures)
Translate CFD results into actionable design rules, placement constraints, and deployment guidelines for capacity planners and operations
Optimize cooling for each aisle based on:
Actual and forecasted IT load distribution
Air-to-liquid split for the racks in that aisle
Containment strategy (cold aisle, hot aisle, full containment, partial containment)
Recommend fan wall octet configurations (and other fan wall module configurations) per deployment to:
Meet airflow and pressure requirements for current and planned density
Maintain redundancy and margin for failure and maintenance scenarios
Minimize fan energy use while preserving required thermal headroom
Work with operations to tune setpoints (supply temperature, fan speeds, differential pressure, chilled water temperatures, etc.) in a way that:
Supports uptime SLAs
Minimizes cooling stranding and unnecessary over-provisioning
Maintains or improves site PUE
Conduct failure mode simulations and analyses for mechanical systems, including at minimum:
CRAC/CRAH outage scenarios (single unit or multiple simultaneous failures)
Dry cooler outage or degraded performance scenarios
Pump failures, valve failures, and partial loss of liquid loops
Loss of containment or unplanned bypass conditions
For each scenario, evaluate:
Transient and steady-state temperature excursions at the rack and component level
Time-to-threshold (how long before violating safe temperature limits)
Impact on redundancy, load shedding requirements, and achievable uptime
Use results to:
Recommend design improvements (additional redundancy, loop segmentation, capacity rebalancing)
Define operational responses and MOPs (e.g., load shedding priorities, setpoint changes)
Optimize uptime SLAs while minimizing cooling stranding, especially in mixed air/liquid deployments and high-density aisles
Lead or support infrastructure upgrades and expansion impact analyses for cooling systems, including:
Adding or resizing fan walls, CRAHs/CRACs, dry coolers, chillers, pumps, CDUs, and distribution headers
Increasing liquid cooling fraction as AI-heavy racks grow in share
Changing setpoints or operating modes (e.g., different supply temperatures, economization strategies)
Quantify for proposed changes:
Effect on current and future thermal capacity and headroom
Changes in aisle-level and room-level airflow / liquid flow distribution
Impact on PUE, water usage, and operating costs
Provide mechanical engineering input into MOPs and risk assessments for any cooling system change that could impact live IT load
Partner with capacity planners, rack design teams, site operations, facilities engineering, and procurement to ensure:
Cooling design and capacity assumptions are aligned with rack deployment plans and SLAs
Air-to-liquid decisions are integrated into forecast models and program timelines
Mechanical constraints are visible and respected in planning and operations
Produce and maintain clear design guides, reference one-lines, piping schematics, and airflow diagrams for:
Standard Fleet cooling topologies and variants
Site-specific implementations and exceptions
High-density / AI-specific deployments
Contribute mechanical content to internal standards and playbooks covering:
Cooling topology design rules
CFD analysis methodologies and acceptance criteria
Failure mode simulation procedures and reporting standards
Support design reviews, incident reviews, and vendor evaluations from a mechanical systems standpoint
Qualifications
Required
Bachelor's degree in Mechanical Engineering or a closely related engineering discipline
6+ years of experience in data center mechanical engineering, mission-critical HVAC design, or thermal systems engineering for large industrial or technology facilities
Demonstrated deep understanding of data center cooling topologies, including both air-cooled and liquid-cooled architectures (fan walls, CRAHs/CRACs, chillers, dry coolers, pumps, heat exchangers, CDUs, manifolds, containment systems)
Hands-on experience performing and interpreting CFD analysis for data halls or similar mission-critical environments, with a track record of using CFD results to drive design changes and rack placement decisions
Proven ability to determine appropriate air-to-liquid mix for given rack layouts and densities
Proven ability to assess and optimize thermal performance at rack, aisle, and room levels
Proven ability to identify and remediate hot spots and cooling stranding
Experience designing or analyzing failure modes for cooling systems (e.g., CRAC/CRAH outage, dry cooler/chiller degradation, pump or valve failures) and translating results into design and operational mitigations
Strong analytical and problem-solving skills, with the ability to connect thermal and mechanical design decisions to uptime, SLA performance, and site efficiency (PUE, water usage)
Clear written and verbal communication skills, including the ability to document complex cooling concepts and present analyses to engineering and operations stakeholders
Preferred
Experience in hyperscale or colocation data centers, especially supporting high-density AI/GPU clusters and advanced liquid cooling (direct-to-chip, rear-door heat exchangers, in-rack manifolds)
Proficiency with industry-standard CFD and thermal analysis tools and familiarity with integrating results into DCIM/BMS or capacity planning workflows
Familiarity with data center efficiency metrics (e.g., PUE, WUE) and how cooling design decisions influence them
Experience with DCIM, BMS, and monitoring systems for tracking and optimizing thermal performance in production environments
Knowledge of relevant mechanical and building codes and standards as applied to mission-critical facilities
Prior experience conducting infrastructure upgrade or expansion impact analyses in live data centers, including development of MOPs and risk mitigations
Benefits
100% employer-covered medical, dental, and vision insurance
401(k) program
Standard paid holidays
Unlimited PTO
Company
Fleet Data Centers
Fleet Data Centers is a data infrastructure company that designs, constructs, and operates mega-scale data centers.
Funding
Current Stage
Growth Stage