Magic · 1 year ago
Distributed Compute Engineer
Magic is working on frontier-scale code models to build a coworker, not just a copilot. As a Distributed Compute Engineer at Magic, you will be responsible for building the stack and systems that enable large-scale AI model training and inference on GPU clusters.
Artificial Intelligence (AI) · Information Technology · Machine Learning
Responsibilities
Develop and maintain the software stack to support large-scale, highly available AI training and inference infrastructure
Implement and optimize systems for data processing and inference using technologies like Ray, Redis, message queues (Kafka), distributed communication libraries (gRPC, ZeroMQ) and HPC technologies
Orchestrate fine-grained data movement using Rust, C++ and NCCL or UCX
Design and manage high-performance storage and caching solutions to support data-intensive applications
Build with an eye towards fault-tolerance, performance and observability
Hack on the internals of deep learning frameworks (PyTorch, JAX) in a distributed setting
Troubleshoot and resolve complex issues across GPU resources, networking, OS, drivers, and cloud environments. Automate fault detection and recovery processes
Qualifications
Required
Deep knowledge of distributed systems design and cloud platforms (AWS, GCP, Azure)
Extensive experience designing and operating high-availability, data-intensive systems
Specific experience in operating large-scale storage or networking solutions
Experience with the internals or operation of distributed DBMS (ClickHouse, Snowflake, BigQuery, vector DBs), batch and stream processing (Spark, Flink), file/storage systems (RocksDB, Lustre/NFS), and distributed ML systems (DeepSpeed, torch.distributed, Ray, Dask) or HPC workloads
Exceptional problem-solving skills across complex infrastructure up and down the stack
Benefits
Equity compensation
401(k) plan with 6% salary matching
Generous health, dental and vision insurance
Unlimited paid time off
Visa sponsorship and relocation stipend
Company
Magic
Magic is an AI coding startup that enables developers to work with AI to find code for building apps.
H-1B Sponsorship
Magic has a track record of offering H-1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship]
Trends of Total Sponsorships: 2025 (2)
Funding
Current Stage: Growth Stage
Total Funding: $465.93M
Key Investors: Flat Capital, NFDG Ventures, CapitalG
2025-01-14 · Series Unknown · $0.81M
2024-08-29 · Series Unknown · $320M
2024-02-16 · Series B · $117M
Recent News
2025-11-13 · TechWire Asia
2025-09-12
2024-11-01
Company data provided by Crunchbase