Aldea · 2 months ago
Research Engineer (Machine Learning)
Aldea is a multi-modal foundational AI company focused on advancing the scaling laws of intelligence. The Research Engineer (Machine Learning) will build and optimize the infrastructure for multi-modal AI research, enabling the team to experiment with next-generation architectures in language and speech domains.
Artificial Intelligence (AI) · Software · Speech Recognition
Responsibilities
Build and maintain distributed training infrastructure that supports researchers across language and speech domains at 1B+ parameter scale
Optimize training and inference performance across the stack, delivering significant speedups through framework optimization, custom kernels, and system-level improvements
Design experiment infrastructure, including automated evaluation pipelines, experiment tracking, and monitoring systems, that enables rapid iteration
Scale infrastructure from single-node to multi-node distributed training and deploy production inference systems for real-time applications
Support researchers with fast turnaround on infrastructure issues and maintain high reliability across all systems
Collaborate with research scientists, data engineers, and leadership to define technical priorities and infrastructure roadmap
Qualifications
Required
Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience
3+ years of experience with PyTorch and distributed training frameworks (DDP, FSDP, DeepSpeed, or similar)
Experience training large-scale deep learning models at 1B+ parameters
Deep understanding of training optimization techniques including mixed precision, gradient checkpointing, and memory management
Proven ability to build production-grade ML infrastructure with high reliability
Track record of delivering significant performance optimizations in ML training or inference systems
Preferred
Experience with custom kernel development (CUDA, Triton) or GPU optimization
Hands-on experience with large-scale pretraining (100B+ tokens, ideally trillion+ scale)
Experience optimizing inference for production: quantization, vLLM, TensorRT, or custom serving engines
Familiarity with speech/audio ML systems and real-time inference constraints
Experience building automated evaluation frameworks and experiment tracking systems
Knowledge of profiling tools and multi-node training across 8-32+ GPUs
Exposure to job orchestration systems (SLURM, Kubernetes, Ray)
Master's or PhD in Computer Science, Machine Learning, or related field
Benefits
Competitive base salary
Performance-based bonus aligned with research and model milestones
Equity participation
Comprehensive health, dental, and vision coverage
Flexible paid time off
Company
Aldea
Aldea builds AI voice and language technology with speech-to-text, text-to-speech, and conversational interfaces.
Funding
Current Stage
Early Stage
Company data provided by Crunchbase