San Francisco Compute Company · 2 months ago

Principal Software Engineer - Networking

San Francisco, CA

Full-time

Onsite

Lead/Staff

$175K/yr - $300K/yr

SF Compute is focused on revolutionizing the commodity compute market by creating a trading venue for compute contracts. The Principal Software Engineer - Networking will design and operate infrastructure for GPU clusters, emphasizing system software and distributed automation to ensure seamless integration of networking and compute at scale.

Information Technology · Internet

H1B Sponsored

Responsibilities

Design and operate orchestration frameworks to manage tens of thousands of GPUs across Kubernetes, virtualization, and bare metal

Develop automation frameworks for large-scale provisioning, monitoring, and fault tolerance

Build distributed systems that can withstand node or cluster-wide failures

Architect software-defined networking solutions that integrate with underlay switches and support scalable designs

Collaborate with networking specialists to ensure fabric resilience, low latency, and scalability, leveraging routing protocols like BGP where needed

Integrate high-performance distributed storage with compute and networking layers

Qualifications

Distributed systems · Software-defined networking · Automation frameworks · Linux internals · Networking protocols · GPU/HPC clusters · Scripting skills · Go · Rust · High-performance storage · Documentation skills

Required

Strong software engineering background, with experience building fault-tolerant distributed systems

Comfortable with Linux internals, debugging, and performance optimization

Exposure to GPU/HPC clusters