Staff Machine Learning Engineer, ML Platform jobs in United States

Reddit, Inc. · 22 hours ago

Staff Machine Learning Engineer, ML Platform

Reddit is a community of communities and one of the internet’s largest sources of information. The Staff ML Infrastructure Engineer will lead the development of a platform for large-scale ML models, focusing on enhancing model lifecycle management and optimizing ML infrastructure.

Content · News · Social Media · Social Network
Comp. & Benefits
H1B Sponsor Likely

Responsibilities

Design end-to-end model lifecycle patterns (MLOps) to boost velocity of development for ML engineers, including data preparation, model management, experiment tracking, and more
Zero-to-one development and support of a graph ML codebase and platform that abstracts away common patterns and enables greater model scalability and iteration
Collaborate with ML engineers on performance tuning, including improving model training time, efficiency, and GPU training costs in a large, distributed ML training environment
Optimize batch data processing within a data warehouse and with tools such as Apache Beam, Apache Spark, Ray Data, and more
Architect pipelines to build and maintain massive graph data structures on the order of billions of nodes and tens of billions of edges

Qualifications

ML infrastructure · MLOps · Cloud technologies · Distributed training frameworks · Python · Graph databases · Graph neural networks · Communication skills · Organizational skills

Required

7+ years of experience in ML infrastructure, including model training and model deployments
Hands-on experience with ML optimization, including memory and GPU profiling
Deep experience with cloud-based technologies for supporting an ML platform, including tools like GCP BigQuery, Google Cloud Storage, infrastructure-as-code (Terraform), and more
Hands-on experience administering and integrating MLOps tools for experiment tracking, model serving, and model registries (e.g., MLflow or Weights & Biases)
Proficiency with the common programming languages and frameworks of ML, such as Python, PyTorch, TensorFlow, etc.
Deep experience working with distributed training frameworks, including Ray and Kubernetes
Strong focus on scalability, reliability, performance, and ease of use. You are an undying advocate for platform users and have a deep intuition for the machine learning development lifecycle
Strong organizational & communication skills

Preferred

Experience working with graph databases (Neo4j, JanusGraph, TigerGraph) is a big plus
Experience working with graph neural networks (GNNs) and associated graph ML frameworks (PyTorch Geometric, Deep Graph Library) is a big plus

Benefits

Comprehensive Healthcare Benefits and Income Replacement Programs
401k with Employer Match
Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
Family Planning Support
Gender-Affirming Care
Mental Health & Coaching Benefits
Flexible Vacation & Paid Volunteer Time Off
Generous Paid Parental Leave

Company

Reddit, Inc.

Reddit is the heart of the internet, where millions of people get together to talk about any topic imaginable.

H1B Sponsorship

Reddit, Inc. has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship. Legend: represents job field similar to this job]
Trends of Total Sponsorships
2025 (99)
2024 (63)
2023 (76)
2022 (70)
2021 (68)
2020 (39)

Funding

Current Stage: Public Company
Total Funding: $1.33B
Key Investors: Fidelity, Vy Capital, Tencent
2024-03-21 · IPO
2021-08-12 · Series F · $410M
2021-02-08 · Series E · $367.95M

Leadership Team

Steve Huffman, CEO
Chris Slowe, CTO
Company data provided by Crunchbase