San Francisco Compute Company · 2 months ago
Principal Software Engineer - Networking
SF Compute is focused on revolutionizing the commodity compute market by creating a trading venue for compute contracts. The Principal Software Engineer - Networking will design and operate infrastructure for GPU clusters, emphasizing system software and distributed automation to ensure seamless integration of networking and compute at scale.
Information Technology · Internet
Responsibilities
Design and operate orchestration frameworks to manage tens of thousands of GPUs across Kubernetes, virtualization, and bare metal
Develop automation frameworks for large-scale provisioning, monitoring, and fault tolerance
Build distributed systems that can withstand node or cluster-wide failures
Architect software-defined networking solutions that integrate with underlay switches and support scalable designs
Collaborate with networking specialists to ensure fabric resilience, low latency, and scalability, leveraging routing protocols like BGP where needed
Integrate high-performance distributed storage with compute and networking layers
Qualifications
Required
Strong software engineering background, with experience building fault-tolerant distributed systems
Comfortable with Linux internals, debugging, and performance optimization
Exposure to GPU/HPC clusters
Networking literacy: familiar with eBGP, VXLAN, RoCEv2, and InfiniBand, plus an understanding of how to design software systems that dynamically leverage these fabrics
Strong automation, scripting, and documentation skills
Preferred
Go or Rust experience (3+ years)
Deep knowledge of HPC fabrics (InfiniBand, Ultra Ethernet, RoCEv2)
Experience with high-performance storage (WEKA, VAST, Ceph, etc.)
Prior exposure to global distributed compute operations
Benefits
GENEROUS EQUITY GRANT
VISA SPONSORSHIPS
RETIREMENT MATCHING
MEDICAL, DENTAL & VISION
TIME OFF
PARENTAL LEAVE
DAILY LUNCH
UNLIMITED OFFICE BOOK BUDGET
Company
San Francisco Compute Company
Compute is a commodity. We think people should buy it like one.
Funding
Current Stage: Early Stage
Total Funding: $52M
Key Investors: Altman Capital
2025-11-26 · Series A · $40M
2024-07-16 · Series Unknown · $12M
Company data provided by Crunchbase