Machine Learning Engineer — Multilingual Data jobs in United States

Featherless AI · 1 day ago

Machine Learning Engineer — Multilingual Data

Featherless AI is seeking a Machine Learning Engineer to own and scale its multilingual data pipeline. The role involves designing and maintaining large-scale multilingual datasets, developing data pipelines, and ensuring model performance across languages and cultural contexts.

Artificial Intelligence (AI) · Cloud Computing · Database
H1B Sponsor Likely

Responsibilities

Design, build, and maintain large-scale multilingual datasets across high- and low-resource languages
Develop data pipelines for collection, cleaning, normalization, deduplication, and labeling
Implement quality filters using statistical, heuristic, and model-based methods
Work with researchers to define language coverage, benchmarks, and evaluation metrics
Analyze dataset bias, coverage gaps, and failure modes across regions and scripts
Support training, fine-tuning, and distillation workflows with high-quality multilingual data
Continuously iterate on datasets based on model performance and real-world usage

Qualifications

Multilingual datasets · Data pipelines · NLP fundamentals · Python · Statistical methods · Linguistics background · Collaboration

Required

3+ years of experience as an ML Engineer, Applied Scientist, or similar role
Strong experience working with multilingual or non-English datasets
Solid understanding of NLP fundamentals (tokenization, embeddings, language modeling)
Experience building scalable data pipelines (Python, Spark, Ray, or similar)
Familiarity with Unicode, scripts, tokenization challenges, and language-specific quirks
Comfort collaborating with researchers and translating research needs into production systems

Preferred

Experience with low-resource languages or multilingual benchmarks (e.g. FLORES, XTREME)
Exposure to LLM training, fine-tuning, or distillation
Linguistics background or experience working with native language experts
Contributions to open-source datasets or ML tooling
Experience with data quality evaluation at scale

Benefits

Competitive compensation + meaningful equity at Series A stage

Company

Featherless AI

We enable serverless inference via our GPU orchestration and model load-balancing system.

H1B Sponsorship

Featherless AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships: 2025 (1)

Funding

Current Stage: Early Stage
Total Funding: $5M
Key Investors: Airbus Ventures
2025-10-31 · Series A
2025-03-17 · Seed · $5M
Company data provided by Crunchbase