DeepRec.ai · 22 hours ago
Staff Machine Learning Engineer
Responsibilities
Productionize sophisticated machine learning parallelization and verification frameworks for real-world deployment.
Transform novel research in hybrid parallelization and verification into robust, production-ready code.
Qualifications
Required
Ability to thrive in a research-driven environment, adeptly navigating the uncertainties and trade-offs that arise.
Demonstrated experience with production environments utilizing advanced parallelization frameworks (e.g., FSDP, Megatron-LM, DeepSpeed).
Strong theoretical foundation in either deep learning or distributed systems.
Preferred
Proven experience in high-growth startup or scale-up settings.
In-depth knowledge of networking protocols (IP, TCP, UDP, HTTP) and communication backends (NCCL, Gloo, MPI).
Exposure to compiler design and development.
Strong systems programming skills, especially with Rust.