Senior Backend Engineer, Data Modeling and Ingestion Platform jobs in United States

Udio · 3 weeks ago

Senior Backend Engineer, Data Modeling and Ingestion Platform

Udio is seeking a Senior Backend Engineer to lead the unification of the large datasets that power its generative audio models. The role involves building scalable systems for linking and enriching data and collaborating closely with ML researchers and product teams.

Artificial Intelligence (AI) · Information Technology · Internet

Responsibilities

Build high-throughput bulk ingestion workflows to integrate datasets from multiple external providers
Design and implement scalable entity-resolution solutions, including record linking, deduplication, clustering, and conflict arbitration
Create and refine matching logic, decision rules, and similarity functions to align datasets with high accuracy and strong coverage
Define and track data quality indicators, such as overlap metrics, match precision/recall, duplicate rates, and completeness
Prepare training-ready datasets in formats such as TFRecords, and structure data to meet ML research requirements
Develop processing components using Dataflow (Beam) and manage large analytical workloads in BigQuery
Leverage frameworks like Ray to accelerate large-scale experiments, feature extraction, and research-oriented data preparation
Collaborate with ML researchers to anticipate downstream requirements and evolve linkage strategies as new sources and use cases emerge

Qualifications

Entity resolution · Data unification · Python · BigQuery · Google Dataflow/Apache Beam · Data validation · TFRecords · Clear communication · Collaboration · Problem-solving

Required

Experience working with large, heterogeneous datasets from multiple providers or domains
Strong background in entity resolution, deduplication, data unification, or related large-scale data integration techniques
Proficiency in Python, with an emphasis on efficient, scalable data processing
Experience with BigQuery, Google Dataflow/Apache Beam, or similar batch-processing frameworks
Familiarity with data validation, normalization, reconciliation, and building consistent views across diverse data sources
Ability to craft well-structured matching and decision strategies that balance accuracy, completeness, and computational efficiency
Comfortable iterating quickly on pragmatic solutions, balancing correctness with time-to-delivery
Clear communication skills and the ability to collaborate closely with ML and research teams

Preferred

Knowledge of architecting Google Cloud Platform systems at scale
Experience with distributed compute frameworks such as Ray, Spark, or Flink
Understanding of JAX-based ML pipelines, multihost training setups, or large-scale data preparation for accelerator-backed workflows
Familiarity with TFRecords or other high-volume training data formats
Exposure to ranking, clustering, or statistical similarity modeling
Experience with Go, Next.js, and/or React Native to contribute to full-stack development

Benefits

Highly competitive salary and equity
Quarterly productivity budget
Flexible time off
Fantastic office location in Manhattan
Productivity package, including ChatGPT Plus, Claude Code, and Copilot
Top-notch private health, dental, and vision insurance for you and your dependents
401(k) plan options with employer matching
Concierge medical/primary care through One Medical and Rightway
Mental health support from Spring Health
Personalized life insurance, travel assistance, and many other perks

Company

Udio

Udio is an AI-powered music creation app that offers a platform for instant music creation.

Funding

Current Stage
Early Stage
Total Funding
$10M
2024-04-10 · Seed · $10M
Company data provided by Crunchbase