Distributed Compute Engineer jobs in United States

Magic · 1 year ago

Distributed Compute Engineer

Magic is working on frontier-scale code models to build a coworker, not just a copilot. As a Distributed Compute Engineer at Magic, you will be responsible for building the stack and systems that enable large-scale AI model training and inference on GPU clusters.

Artificial Intelligence (AI) · Information Technology · Machine Learning
H1B Sponsored

Responsibilities

Develop and maintain the software stack to support large-scale, highly available AI training and inference infrastructure
Implement and optimize systems for data processing and inference using technologies like Ray, Redis, message queues (Kafka), distributed communication libraries (gRPC, ZeroMQ) and HPC technologies
Orchestrate fine-grained data movement using Rust, C++ and NCCL or UCX
Design and manage high-performance storage and caching solutions to support data-intensive applications
Build with an eye towards fault tolerance, performance and observability
Hack on the internals of deep learning frameworks (PyTorch, JAX) in a distributed setting
Troubleshoot and resolve complex issues across GPU resources, networking, OS, drivers and cloud environments, and automate fault detection and recovery processes

Qualifications

Distributed systems design · Cloud platforms · Data-intensive systems · High-availability systems · Deep learning frameworks

Required

Deep knowledge of distributed systems design and cloud platforms (AWS, GCP, Azure)
Extensive experience designing and operating high-availability, data-intensive systems
Specific experience operating large-scale storage or networking solutions
Experience with the internals or operation of distributed DBMS (ClickHouse, Snowflake, BigQuery, vector DBs), batch and stream processing (Spark, Flink), file/storage systems (RocksDB, Lustre/NFS), and distributed ML systems (DeepSpeed, torch.distributed, Ray, Dask) or HPC workloads
Exceptional problem-solving skills across complex infrastructure up and down the stack

Benefits

Equity compensation
401(k) plan with 6% salary matching
Generous health, dental and vision insurance
Unlimited paid time off
Visa sponsorship and relocation stipend

Company

Magic is an AI coding startup that enables developers to work with AI to find code for building apps.

H1B Sponsorship

Magic has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship]
Trends of Total Sponsorships: 2025 (2)

Funding

Current Stage: Growth Stage
Total Funding: $465.93M
Key Investors: Flat Capital, NFDG Ventures, CapitalG
2025-01-14 · Series Unknown · $0.81M
2024-08-29 · Series Unknown · $320M
2024-02-16 · Series B · $117M
Company data provided by Crunchbase