MeshyAI · 1 day ago
Machine Learning System Engineer
MeshyAI is a leading 3D generative AI company headquartered in Silicon Valley, focused on transforming the content creation pipeline for 3D assets. They are seeking Machine Learning Systems Engineers to build end-to-end machine learning systems dedicated to 3D, involving tasks from pretraining to deployment.
Computer Software
Responsibilities
Work within the AI model team to streamline 3D data into high-throughput pipelines and scale training infrastructure to hundreds of GPUs
Train, accelerate, and deploy machine learning models for 3D GenAI
Design and implement reliable, scalable distributed training pipelines and optimize end-to-end training efficiency
Work closely with researchers, software engineers, and artists to integrate AI models into production
Work closely with researchers to build the training infrastructure for our in-house foundational models
Identify bottlenecks and optimize for high-throughput, efficient distributed model training across hundreds to thousands of GPUs
Build and maintain training clusters and job schedulers
Implement and maintain 3D-specific custom operators in Triton or CUDA
Build efficient inference endpoints with complex model pipelines
Optimize models through compilation, fusion, quantization, etc.
Qualifications
Required
Experience in machine learning or high-performance graphics
Solid practical understanding of at least one machine learning framework (e.g. PyTorch, Flax)
Strong ability to write beautiful and maintainable code in Python and/or C++
Ability to learn fast and dive into new concepts or complex codebases
Performance- and efficiency-oriented mindset, with close attention to the smallest details
Strong communication skills for working in a globally distributed team
Preferred
Strong interest in PyTorch internals, with hands-on experience in areas such as torch.compile and the fully_shard (FSDP2) API
Experience building Triton kernels
Experience with large-scale distributed training and familiarity with modern parallelization techniques: DP, TP, CP, PP, zero-redundancy optimizers, etc.
Experience with diffusion models in 3D or video
Experience with full bf16 or partial fp8 training