Mirage · 1 month ago
Member of Technical Staff, Training Data Infrastructure
Mirage is the leading AI short-form video company, building full-stack foundation models and products for video creation. The role involves developing training data infrastructure and optimizing data processing systems for multimodal datasets, directly impacting the company's ability to train models for millions of users.
Artificial Intelligence (AI) · Generative AI · Software
Responsibilities
Build performant pipelines for processing video and multimodal training data at scale
Design distributed systems that scale seamlessly with our rapidly growing video and multimodal datasets
Create efficient data loading systems optimized for GPU training throughput
Implement comprehensive telemetry for video processing and training pipelines
Create foundation data processing systems that intelligently cache and reuse expensive computations across the training pipeline
Build robust data validation and quality measurement systems for video and multimodal content
Design systems for data versioning and reproducing complex multimodal training runs
Develop efficient storage and compute patterns for high-dimensional data and learned representations
Own and improve end-to-end training pipeline performance
Build systems for efficient storage and retrieval of video training data
Build frameworks for systematic data and model quality improvement
Develop infrastructure supporting fast research iteration cycles
Build tools and systems for deep understanding of our training data characteristics
Build infrastructure enabling rapid testing of research hypotheses
Create systems for incorporating user feedback into training workflows
Design measurement frameworks that connect model improvements to user outcomes
Enable systematic experimentation with direct user feedback loops
Qualifications
Required
Bachelor's or Master's degree in Computer Science, Machine Learning, or related field
3+ years of experience in ML infrastructure development or large-scale data engineering
Strong programming skills, particularly in Python and distributed computing frameworks
Expertise in building and optimizing high-throughput data pipelines
Proven experience with video/image data pre-processing and feature engineering
Deep knowledge of machine learning workflows, including model training and data loading systems
Track record in performance optimization and system scaling
Experience with cluster management and distributed computing
Background in MLOps and infrastructure monitoring
Demonstrated ability to build reliable, large-scale data processing systems
Love tackling hard technical problems head-on
Take ownership while knowing when to loop in teammates
Get excited about improving system performance
Want to work directly with researchers and engineers who are equally passionate about building great systems
Benefits
Comprehensive medical, dental, and vision plans
401(k) with employer match
Commuter benefits
Catered lunch multiple days per week
Dinner stipend every night if you're working late and want a bite!
Grubhub subscription
Health & Wellness Perks (Talkspace, Kindbody, One Medical subscription, HealthAdvocate, Teladoc)
Multiple team offsites per year with team events every month
Generous PTO policy
Company
Mirage
Mirage is building foundation models and products that change the future of video.
Funding
Current Stage: Growth Stage
Total Funding: $100M
Key Investors: Index Ventures, Kleiner Perkins
2024-07-09 · Series C · $60M
2023-06-22 · Series B · $25M
2022-04-08 · Series A · $11M
Company data provided by Crunchbase