Network Engineer, Operations & Repair jobs in United States

Fluidstack · 1 week ago

Network Engineer, Operations & Repair

Fluidstack is building the infrastructure for abundant intelligence, partnering with top AI labs and enterprises. They are seeking a Network Engineer, Operations & Repair to ensure network reliability through incident response and operational excellence, while managing regional operations within a datacenter campus.

Cloud Computing, Cloud Storage, Generative AI, GPU, Information Technology, Machine Learning, Private Cloud, Software
H1B Sponsor Likely

Responsibilities

Serve as the primary network operations contact for a datacenter region
Own network health, respond to incidents escalated from NOC, and ensure fabrics run reliably
Build deep knowledge of your region's network topology, common failure modes, and operational characteristics
Handle network incidents escalated from Tier 1 NOC during your coverage window
Troubleshoot complex issues across physical and logical layers, coordinate with other engineers for follow-the-sun coverage, and drive incidents to resolution
Lead incident response when you're the subject matter expert
Coordinate with onsite hardware repair teams on escalated and assigned incidents
Support the RMA case process and escalations with supplier support teams
Build and support per-region dashboards and multi-region aggregate observability
Manage field testing of repair and other operations processes and automation, providing visibility and feedback to the partners developing the tooling
Provide operational support for datacenter deployments and expansions in your region
Partner with Deployment teams on turn-up activities, validate production readiness, and ensure smooth handovers from deployment to operations
Build and execute operational runbooks for both repair and non-repair activities
Identify gaps in runbooks, document lessons learned, and provide feedback to the Operations lead on runbook improvements
Build relationships with onsite DC Operations teams, structured cabling vendors, and hardware logistics partners
Serve as the network engineering liaison for your datacenter region
Communicate clearly about network status, planned maintenance, and operational issues

Qualifications

Network Engineering, Incident Response, Datacenter Fabric Expertise, SQL, Grafana, Python, Operational Pragmatism, Matrix Leadership, Hybrid Work Comfort, Automation Exposure, Cross-Team Collaboration

Required

5-8 years in network engineering with significant hands-on operational experience
Experience running production networks, responding to incidents at all hours, and debugging complex failures under pressure
Understanding of the difference between 'working' and 'production-ready.'
Basic SQL and dashboard experience with Grafana, Tableau, or similar query/dashboard services
Basic Python 3, using Jupyter notebooks or scripts
Deep experience operating modern datacenter networks including EVPN/VXLAN, BGP, Clos topologies, and high-radix switching
Comfortable troubleshooting Layer 2/3 issues, BGP routing problems, fabric misconfigurations, and physical media failures
Proven ability to lead incident response, perform systematic troubleshooting, and drive issues to resolution
Ability to remain calm during outages, communicate clearly with stakeholders, and know when to escalate versus when to dig deeper
Experience building relationships with onsite teams, coordinating physical infrastructure work, and representing network engineering in a field environment
Ability to balance perfection with progress, troubleshoot with imperfect information, make pragmatic decisions under time pressure, and prioritize based on business impact
Comfortable working remotely, with the understanding that datacenter operations sometimes require hands-on presence
Comfortable with 30-40% travel and flexible schedules that adapt to operational needs

Preferred

Experience operating AI/ML or HPC fabrics with RDMA (RoCEv2), lossless Ethernet (PFC, ECN), or high-performance networking
Experience as a site lead, campus engineer, or regional operations lead
Hands-on experience coordinating hardware repairs, RMAs, and physical infrastructure work
Familiarity with network monitoring platforms, alerting systems, and telemetry collection
Experience with SQL, MySQL, and building operations dashboards
Basic scripting or automation experience (Python, Ansible) for operational tasks
Experience working in distributed operations teams with follow-the-sun coverage models

Benefits

Retirement or pension plan, in line with local norms.
Health, dental, and vision insurance.
Generous PTO policy, in line with local norms.

Company

Fluidstack

Fluidstack is an AI cloud platform for frontier labs and startups.

H1B Sponsorship

Fluidstack has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (1)
2024 (2)

Funding

Current Stage: Growth Stage
Total Funding: unknown
Key Investors: Seedcamp
2025-06-01: Undisclosed
2024-10-01: Private Equity
2018-02-01: Pre Seed

Leadership Team

Gary Wu
CEO, Co-Founder
Company data provided by Crunchbase