TensorWave · 1 month ago
Senior Network Engineer
TensorWave is dedicated to building seamless, secure, reliable, and resilient AI infrastructure at scale. The Senior Network Engineer will focus on implementing and operating large-scale, Arista-based RoCEv2 data center networks, playing a critical role in the design and maintenance of next-generation systems.
AI Infrastructure, Artificial Intelligence (AI), Cloud Computing, Cloud Infrastructure, Generative AI, IaaS
Responsibilities
Design, deploy, and operate large-scale RoCEv2 data center networks supporting AI and ML clusters from thousands to 100,000+ GPUs
Own congestion management and performance tuning across RDMA fabrics, including PFC, ECN, and DCQCN, in production environments
Implement and maintain automation, validation, and observability tooling using Python, Ansible, Terraform, and modern DevOps workflows
Ensure high availability and reliability across multi-tenant environments by leading operational excellence, incident response, and continuous improvement
Qualifications
Required
Bachelor of Science in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience
Deep experience with RDMA and RoCEv2 in large-scale production data centers supporting AI or HPC workloads
Strong Arista expertise, including EOS, hardware platforms, and operating high-speed Ethernet fabrics
Proven knowledge of congestion management and performance tuning using PFC, ECN, and DCQCN
Hands-on experience with high-speed optics and cabling in dense environments, including 400G and 800G optics, AEC, AOC, and DAC cables, and structured cabling
Automation and operations mindset, with experience using Python, Ansible, Terraform, Git, and observability tooling in always-on production systems
Benefits
100% paid Medical, Dental, and Vision insurance
Flexible PTO
Paid Holidays
401(k)
Parental Leave
Flexible Spending Account
Short Term Disability Insurance
Life and Voluntary Supplemental Insurance
Mental Health Benefits through Spring Health
Company
TensorWave
TensorWave is an AMD GPU-exclusive cloud that supports training and inference at scale.
Funding
Current Stage: Growth Stage
Total Funding: $146.71M
Key Investors: Nexus Venture Partners, FundNV
2025-05-14 · Series A · $100M
2024-10-08 · Seed · $43M
2024-04-23 · Seed · $0.89M
Company data provided by Crunchbase