Fleet Data Centers
Senior Reliability Engineer
Fleet Data Centers designs, builds, and operates mega-scale data center campuses. The Senior Availability / Reliability Engineer leads availability modeling, reliability analysis, and mitigation planning for Fleet’s behind-the-meter (BTM) power solutions, working with engineering, operations, and vendor teams to ensure designs meet target availability and resiliency.
Data Center · Data Management · IT Infrastructure
Responsibilities
Own availability and reliability analysis for BTM power solutions across technologies such as gas reciprocating engines, turbines, fuel cells, and batteries, and across site deployments, using fault trees, reliability block diagrams, and Monte Carlo or scenario modeling as appropriate
Define availability targets and performance assumptions; align with customer requirements and Fleet’s uptime objectives
Identify single points of failure and operational risks; recommend design, controls, procedural, or spares mitigations
Partner with engineering teams to validate redundancy strategies, maintainability, and test/maintenance windows that preserve service availability
Support commissioning readiness by defining test scenarios and success criteria that validate reliability assumptions
Develop Quality Control KPI definitions and reporting for reliability performance (forced outage rate, mean time to repair (MTTR), maintenance compliance) and drive continuous improvement
Run ReliaSoft or IEEE Gold Book calculations to demonstrate facility uptime based on the selected generation and distribution equipment
Lead root cause analysis and corrective action tracking for reliability-impacting events; ensure lessons learned feed back into standards and roadmaps
Collaborate with vendors and operations teams on maintenance strategies, spares/critical parts planning, and reliability-centered maintenance principles
Qualifications
Required
Bachelor's degree in Engineering (Electrical, Mechanical, Industrial, or similar)
7+ years in reliability engineering, availability analysis, quality processes and/or asset performance engineering in mission-critical or industrial environments
Integrity and Ethical Standards: Build trust, ensure fairness, and foster long-term, transparent relationships with suppliers
Effective Communication: Clearly convey expectations and requirements to suppliers and negotiating parties while understanding their needs and concerns; comfortable delivering written and verbal presentations to internal leadership teams
Emotional Intelligence (EQ): Ability to understand the emotions, cultural nuances, and motivations of others, while effectively managing one's own emotions during high-pressure negotiations
Strategic Thinking: Recognize how supplier relationships and negotiations align with the broader organizational goals, while aiming for outcomes that benefit both parties
Critical Thinking Skills: Find innovative solutions and adapt flexibly to unexpected challenges
Analytical Ability: Make data-driven decisions, assess cost structures, and identify potential risks, ensuring informed and strategic outcomes
Influence and Persuasion: Advocate effectively for a position, build consensus, and secure favorable agreements without compromising relationships
Operational Paranoia: Anticipate risks, identify vulnerabilities, and proactively implement mechanisms to prevent and minimize disruptions and safeguard safety, security, availability, and scale
Relationship Management: Cultivate trust, collaboration, and long-term partnerships, while building a broad network that provides valuable benchmarking, industry insights, and alternative sourcing options
Preferred
Experience with generation assets and integration into critical electrical systems
Experience with CMMS data, failure coding, and maintenance program optimization
Familiarity with safety and operating discipline (MOP/SOP/EOP, change management, incident response)
Experience with ISO standards and quality metrics for generation assets
Experience communicating technical risk to executives and customers
Benefits
100% employer-covered medical, dental, and vision insurance
A 401(k) program
Standard paid holidays
Unlimited PTO
Company
Fleet Data Centers
Fleet Data Centers is a data infrastructure company that designs, constructs, and operates mega-scale data centers.
Funding
Current Stage: Growth Stage (company data provided by Crunchbase)