Solutions Engineer - AI/HPC Infrastructure jobs in United States

DriveNets · 3 months ago

Solutions Engineer - AI/HPC Infrastructure

DriveNets is a leader in disaggregated high-scale networking solutions for service providers and AI infrastructure. The Solutions Engineer will design, deploy, and optimize DriveNets’ Network Cloud AI Infrastructure solutions, collaborating with internal and customer teams to ensure successful deployment and alignment with customer business needs.

Cloud Data Services, Cloud Infrastructure, Network Hardware, Software
Comp. & Benefits
H1B Sponsor Likely

Responsibilities

Building robust AI/HPC infrastructure for new and existing customers
Technical hands-on role in building and supporting NVIDIA/AMD based platforms
Support operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, training stability, real-time monitoring, logging, and alerting
Administer Linux systems, ranging from powerful GPU-enabled servers to general-purpose compute systems
Design and plan rack layouts and network topologies to support customer requirements
Design and evaluate automation scripts for network operations, configuring server and switch fabrics
Perform NCCL, RCCL, LLM, and RDMA performance benchmarks as part of the design and evaluation process of the deployment
Benchmark the latest GPU compute and NIC solutions from all major compute vendors over the DriveNets networking fabric
Install and configure DriveNets products, ensuring optimal performance and customer satisfaction
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health
Engage in and improve the whole lifecycle of services from inception and design through deployment, operation, and refinement
Provide feedback to internal teams such as opening bugs, documenting workarounds, and suggesting improvements
Introduce new products to DriveNets’ sales and support teams and to DriveNets’ customers
Deliver technical trainings and TOIs for support/sales engineers, partners, and customers
Collaborate on product definition through customer requirement gathering and roadmap planning

Qualifications

AI/HPC clusters, Linux administration, Cloud technologies, Bash, Python, Configuration management, Networking technologies, Monitoring tools, AI/ML frameworks, Technical writing

Required

5+ years of previous experience deploying and administering AI/HPC clusters or general-purpose compute systems
5+ years of hands-on Linux experience (e.g., RHEL, CentOS, Ubuntu) and production infrastructure support (e.g., networking, storage, monitoring, compute, installation, configuration, maintenance, upgrade, retirement)
Proficiency in Cloud, Virtualization, and Container technologies
Deep understanding of operating systems, computer networks, and high-performance applications
Hands-on experience with Bash, Python, and configuration management tools (e.g., Ansible)
Established record of leading technical initiatives and delivering results
Ability to write extensive technical content (white papers, technical briefs, test reports, etc.) for external audiences with a balance of technical accuracy, strategy, and clear messaging
Ability to travel domestically and internationally

Preferred

Familiarity with AI-relevant data center infrastructure and networking technologies such as InfiniBand, RoCEv2, lossless Ethernet technologies (PFC, ECN, etc.), accelerated computing, GPUs, and DPUs
Familiarity with GPU resource scheduling managers (Slurm, Kubernetes, etc.)
Expertise with NCCL/RCCL, setting up GPU environments, tuning these environments, and collecting benchmark results
Familiarity with monitoring tools (e.g., Prometheus, Grafana, ELK Stack) and telemetry protocols (gRPC, gNMI, OTLP, etc.)
Understanding of data center operations fundamentals in networking, cooling, and power
Proven experience with one or more Tier-1 Clouds (AWS, Azure, GCP or OCI) or emerging Neoclouds, and cloud-native architectures and software
Understanding of AI workload requirements and how they interact with other parts of the system, such as networking, storage, and deep learning frameworks
Knowledge of AI/ML frameworks (e.g., TensorFlow, PyTorch) and associated tooling is an advantage

Company

DriveNets

DriveNets is a networking company that provides software-based routing solutions.

H1B Sponsorship

DriveNets has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data Powered by US Department of Labor)
Trends of Total Sponsorships
2025 (3)
2024 (4)
2023 (2)
2022 (4)
2021 (3)
2020 (4)

Funding

Current Stage: Growth Stage
Total Funding: $1.24B
Key Investors: AT&T, D2 Investments, D1 Capital Partners
2025-07-17 · Secondary Market · $650M
2022-08-17 · Series C · $262M
2021-01-27 · Series B · $208M

Leadership Team

Ido Susan · Co-Founder and CEO
Inbar Lasser-Raab · Chief Marketing Officer
Company data provided by Crunchbase