DeepRec.ai · 3 days ago

Senior/Staff Machine Learning Engineer

United States

Full-time

Remote

Senior Level, Lead/Staff

$300K/yr - $400K/yr

Staffing & Recruiting

Hiring Manager

Hayley Killengrey

Responsibilities

Productionize Advanced ML Frameworks: Work closely with researchers to develop, test, and deploy parallelization and verification frameworks optimized for high-performance training and inference.

Convert Research into Production Code: Translate novel hybrid parallelization and verification methods from research concepts into production-grade code ready for real-world applications.

Optimize ML Systems at Scale: Implement and refine frameworks that support highly scalable training (e.g., FSDP, Megatron-LM, DeepSpeed) and production-scale inference (e.g., ONNX Runtime, TensorRT, NVIDIA Triton).

Qualifications

Production ML Parallelization, Deep Learning, Distributed Systems, Networking Protocols, Communication Protocols, Research-Driven Mindset, High-Growth Startup Experience

Required

Ability to operate in a research-heavy environment, making strategic trade-offs and working with ambiguity as you drive high-impact projects to completion.

Proven experience with parallelization frameworks for both training (e.g., FSDP, Megatron-LM, DeepSpeed) and inference (e.g., ONNX Runtime, DeepSpeed-Inference, NVIDIA Triton).

Strong foundation in either deep learning or distributed systems, enabling you to develop and optimize complex ML architectures.

Preferred

Background in fast-paced, high-growth environments, with a demonstrated ability to navigate rapid changes.

Proficiency in core networking protocols (e.g., IP, TCP, UDP, HTTP) and communication backends (e.g., NCCL, Gloo, MPI), which are essential for optimizing distributed ML systems.