Fluidstack · 1 week ago
Network Engineer, Operations & Reliability
Fluidstack is building the infrastructure for abundant intelligence and is seeking a Network Operations Engineer to serve as a Regional Site Lead for one of its data center campuses. The role combines hands-on Tier 2/3 network operations with site leadership responsibilities, ensuring network reliability and operational excellence.
Cloud Computing, Cloud Storage, Generative AI, GPU, Information Technology, Machine Learning, Private Cloud, Software
Responsibilities
Serve as the primary network operations contact for your assigned datacenter campus
Own network health, respond to incidents escalated from NOC, and ensure fabrics run reliably
Build deep knowledge of your region's network topology, common failure modes, and operational characteristics
Handle network incidents escalated from Tier 1 NOC during your coverage window
Troubleshoot complex issues across physical and logical layers, coordinate with other engineers for follow-the-sun coverage, and drive incidents to resolution
Lead incident response when you're the subject matter expert on the ground
Coordinate hardware break-fix activities with onsite DC Operations technicians
Manage linecard swaps, optic replacements, device troubleshooting, and RMA processes
Ensure physical infrastructure issues are resolved quickly and don't impact production workloads
Provide operational support during new datacenter deployments and expansions in your region
Partner with Deployment teams on turn-up activities, validate production readiness, and ensure smooth handovers from deployment to operations
Be the person who ensures new pods integrate seamlessly into operational workflows
Execute operational runbooks for common failure scenarios and maintenance procedures
Identify gaps in runbooks, document lessons learned, and provide feedback to the Operations pillar lead on runbook improvements
Build the operational knowledge base for your region
Build strong relationships with onsite DC Operations teams, structured cabling vendors, and hardware logistics partners
Serve as the network engineering liaison for your datacenter campus
Communicate clearly about network status, planned maintenance, and operational issues
As the regional team scales, mentor junior operations engineers assigned to your datacenter
Share operational knowledge, provide guidance during incidents, and help build regional operations capacity
Qualifications
Required
Strong Operations Background: 5-8 years in network engineering with significant hands-on operational experience. You've run production networks, responded to incidents at all hours, and debugged complex failures under pressure. You understand the difference between 'working' and 'production-ready.'
Datacenter Fabric Expertise: Deep experience operating modern datacenter networks including EVPN/VXLAN, BGP, Clos topologies, and high-radix switching. You're comfortable troubleshooting Layer 2/3 issues, BGP routing problems, fabric misconfigurations, and physical layer failures.
Incident Response Excellence: Proven ability to lead incident response, perform systematic troubleshooting, and drive issues to resolution. You remain calm during outages, communicate clearly with stakeholders, and know when to escalate versus when to dig deeper. You've been the person others call when things break
Site Leadership Capability: You've been the go-to network person for a site, datacenter, or region before. You understand how to build relationships with onsite teams, coordinate physical infrastructure work, and represent network engineering in a field environment. You know how to get things done in operational settings
Operational Pragmatism: You balance perfection with progress. You can troubleshoot with imperfect information, make pragmatic decisions under time pressure, and prioritize based on business impact. You document as you go and continuously improve operational processes
Hybrid Work Comfort: You're productive working remotely but understand that datacenter operations sometimes require hands-on presence. You're comfortable with flexible schedules that adapt to operational needs: sometimes remote, sometimes onsite for days or weeks during critical periods.
Preferred
AI/HPC Fabric Operations: Experience operating AI/ML or HPC fabrics with RDMA (RoCEv2), lossless Ethernet (PFC, ECN), or high-performance networking. You understand the operational precision required when network performance directly impacts workload completion
Regional/Campus Operations Leadership: You've been a site lead, campus engineer, or regional operations lead before. You know how to coordinate across teams in a specific geographic location while reporting into a centralized organization
Hardware Break-Fix Experience: Hands-on experience coordinating hardware repairs, RMAs, and physical infrastructure work. You understand datacenter logistics, vendor escalation processes, and how to work effectively with onsite technicians
Observability & Monitoring: Familiarity with network monitoring platforms, alerting systems, and telemetry collection. You've used monitoring tools to diagnose issues proactively and tune alerting to reduce noise
Automation Exposure: Basic scripting or automation experience (Python, Ansible) for operational tasks. You may not be writing complex automation but you understand how to leverage tools to improve operational efficiency
Follow-the-Sun Experience: Experience working in distributed operations teams with follow-the-sun coverage models. You understand how to hand off incidents cleanly, communicate operational status across time zones, and coordinate with global teams
Benefits
Retirement or pension plan, in line with local norms.
Health, dental, and vision insurance.
Generous PTO policy, in line with local norms.
Company
Fluidstack
Fluidstack is an AI cloud platform for frontier labs and startups.
H1B Sponsorship
Fluidstack has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (1)
2024 (2)
Funding
Current Stage: Growth Stage
Total Funding: unknown
Key Investors: Seedcamp
2025-06-01: Undisclosed
2024-10-01: Private Equity
2018-02-01: Pre Seed
Company data provided by Crunchbase