Principal Software Engineer - Networking jobs in United States

San Francisco Compute Company · 2 months ago

Principal Software Engineer - Networking

SF Compute is focused on revolutionizing the commodity compute market by creating a trading venue for compute contracts. The Principal Software Engineer - Networking will design and operate infrastructure for GPU clusters, emphasizing system software and distributed automation to ensure seamless integration of networking and compute at scale.

Information Technology, Internet
H-1B Sponsored

Responsibilities

Design and operate orchestration frameworks to manage tens of thousands of GPUs across Kubernetes, virtualization, and bare metal
Develop automation frameworks for large-scale provisioning, monitoring, and fault tolerance
Build distributed systems that can withstand node or cluster-wide failures
Architect software-defined networking solutions that integrate with underlay switches and support scalable designs
Collaborate with networking specialists to ensure fabric resilience, low latency, and scalability, leveraging routing protocols like BGP where needed
Integrate high-performance distributed storage with compute and networking layers

Qualifications

Distributed systems, Software-defined networking, Automation frameworks, Linux internals, Networking protocols, GPU/HPC clusters, Scripting skills, Go, Rust, High-performance storage, Documentation skills

Required

Strong software engineering background, with experience building fault-tolerant distributed systems
Comfortable with Linux internals, debugging, and performance optimization
Exposure to GPU/HPC clusters
Networking literacy: familiar with eBGP, VXLAN, RoCEv2, and InfiniBand, plus an understanding of how to design software systems that dynamically leverage these fabrics
Strong automation, scripting, and documentation skills

Preferred

Go or Rust experience (3+ years)
Deep knowledge of HPC fabrics (InfiniBand, Ultra Ethernet, RoCEv2)
Experience with high-performance storage (WEKA, VAST, Ceph, etc.)
Prior exposure to global distributed compute operations

Benefits

GENEROUS EQUITY GRANT
VISA SPONSORSHIPS
RETIREMENT MATCHING
MEDICAL, DENTAL & VISION
TIME OFF
PARENTAL LEAVE
DAILY LUNCH
UNLIMITED OFFICE BOOK BUDGET

Company

San Francisco Compute Company

Compute is a commodity. We think people should buy it like one.

Funding

Current Stage
Early Stage
Total Funding
$52M
Key Investors
Altman Capital
2025-11-26: Series A, $40M
2024-07-16: Series Unknown, $12M
Company data provided by Crunchbase