Zettabyte · 7 hours ago

Senior/Staff Backend Engineer - Distributed System

Palo Alto, CA

Full-time

Hybrid

Senior Level

5+ years exp

Zettabyte is on a mission to make AI compute ubiquitous, seamless, and limitless. They are seeking a Backend Engineer to build systems that orchestrate GPU clusters for AI workloads, creating APIs and resource management systems that directly impact the efficiency of AI infrastructure.

Artificial Intelligence (AI) · Cloud Computing · Software

No H1B

Responsibilities

Design APIs that abstract complex GPU operations into simple developer experiences

Build scheduling algorithms that maximize GPU utilization while ensuring SLA compliance

Develop resource management systems for the full GPU lifecycle: provisioning, allocation, scheduling, and release

Create usage tracking and billing systems for GPU-hours, memory usage, and compute utilization

Implement monitoring for GPU-specific metrics, health checks, and automatic failure recovery

Build multi-tenancy systems with resource isolation, quota management, and fair scheduling

Optimize cold starts for model serving and implement efficient model loading strategies

Collaborate with frontend engineers to expose complex infrastructure through intuitive interfaces

Leverage AI-assisted coding tools (GitHub Copilot, Claude Code, Cursor IDE, etc.) to boost productivity and code quality

Qualifications

Backend engineering · Distributed systems · Go · Python · API design · Resource scheduling · Linux systems · Containerization · GPU management · Performance optimization · Startup mindset