Vultr · 1 month ago
Network DevOps Engineer, RDMA Fabric Automation
Vultr is a high-performance cloud infrastructure company that aims to make cloud solutions accessible for enterprises and AI innovators globally. The Network DevOps Engineer will be responsible for automating deployment and operations of large-scale RDMA Ethernet fabrics, building frameworks for network provisioning, and collaborating with engineering teams to optimize performance.
Artificial Intelligence (AI), Cloud Computing, Cloud Infrastructure, Cloud Storage, Web Hosting
Responsibilities
Automate deployment and operations of large-scale RDMA (RoCEv2) Ethernet fabrics across Vultr data centers
Build Ansible and Python-based frameworks to provision, validate, and remediate underlay and overlay networks
Integrate network automation with Vultr’s source-of-truth systems (NetBox, OpsMill) for intent-driven configuration and validation
Develop telemetry ingestion and correlation pipelines (gNMI, Prometheus, Kafka, custom collectors) for real-time network health and performance metrics
Collaborate with platform, orchestration, and product engineering teams to optimize RDMA performance, PFC/ECN behavior, and path symmetry across fabrics
Implement CI/CD workflows for network configuration changes — validation, pre-checks, and rollbacks
Investigate complex network behaviors across layers — flow hashing, congestion domains, ECMP, and overlay interactions
Contribute to the design of next-generation GPU and AI interconnect fabrics, ensuring seamless integration into Vultr’s global network architecture
Qualifications
Required
Solid understanding of modern data center networking: EVPN-VXLAN, BGP, MLAG, QoS, and traffic engineering
Deep familiarity with RoCEv2, RDMA transport tuning, ECN/PFC, and lossless Ethernet design
Strong experience with automation frameworks like Ansible, and languages like Python, Golang, Rust, or PHP
Comfort working with telemetry and monitoring stacks — Prometheus, Grafana, Loki, ELK, or similar
Previous experience integrating with NetBox, Nautobot, OpsMill or similar for topology and configuration source-of-truth
Familiarity with CI/CD systems (GitHub Actions, Jenkins, ArgoCD) for continuous delivery of network automation
Strong Linux networking background, including namespaces, netlink, and system-level debugging
Benefits
100% company-paid insurance premiums for employee medical, dental and vision plans.
401(k) plan that matches 100% up to 4%, with immediate vesting
Professional Development Reimbursement of $2,500 each year
11 Holidays + Paid Time Off Accrual + Rollover Plan
Commitment matters to Vultr! Increased PTO at 3-year and 10-year anniversaries + 1-month paid sabbatical every 5 years + Anniversary Bonus each year
$500 stipend for remote office setup in first year + $400 each following year
Internet reimbursement up to $75 per month
Gym membership reimbursement up to $50 per month
Company paid Wellable subscription
Company
Vultr
Vultr is an AI cloud infrastructure platform offering the latest-generation NVIDIA GPUs and AMD CPUs and GPUs across 32 worldwide regions.
H1B Sponsorship
Vultr has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is shown below for reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships: 2024 (1)
Funding
Current Stage: Growth Stage
Total Funding: $662M
2025-06-23 · Debt Financing · $329M
2024-12-18 · Private Equity · $333M
2014-02-20 · Angel
Company data provided by crunchbase