AI Researcher – Multilingual Data jobs in United States

Featherless AI · 17 hours ago

AI Researcher – Multilingual Data

Featherless AI is seeking an AI Researcher focused on multilingual data to build and scale next-generation language models. The role involves designing research strategies for multilingual datasets, collaborating on training pipelines, and publishing high-quality research.

Artificial Intelligence (AI) · Cloud Computing · Database
H1B Sponsor Likely

Responsibilities

Design and execute research on multilingual datasets, including data collection, filtering, deduplication, and quality measurement
Develop strategies for low-resource and long-tail languages (sampling, augmentation, curriculum design)
Research and improve cross-lingual transfer, alignment, and robustness in large language models
Build and maintain evaluation benchmarks for multilingual performance
Collaborate with engineers and researchers on training pipelines and model architecture decisions
Publish research at top venues (e.g., ACL, EMNLP, NeurIPS, ICML, ICLR) and contribute to open-source when appropriate
Translate research insights into practical improvements in production models

Qualifications

NLP research · Multilingual modeling · Large-scale datasets · Python · Transfer learning · Open-source contributions · Data quality metrics · Prototyping

Required

Strong background in NLP / ML research, with a focus on multilingual or cross-lingual modeling
Publication record at respected conferences or journals (ACL, EMNLP, NeurIPS, ICML, ICLR, etc.)
Experience working with large-scale text datasets across multiple languages
Solid understanding of: tokenization and vocabulary design for multilingual models; data quality metrics, filtering, and dataset bias; transfer learning and multilingual representation learning
Comfortable prototyping in Python with modern ML frameworks (PyTorch, JAX, etc.)
Ability to operate independently and ship research in a startup-paced environment

Preferred

Experience with low-resource languages or non-Latin scripts
Open-source contributions in NLP or data tooling
Experience training or evaluating large language models
Familiarity with multilingual benchmarks (e.g., XTREME, FLORES, TyDi QA)

Benefits

Competitive compensation
Meaningful equity at an early stage

Company

Featherless AI

We enable serverless inference via our GPU orchestration and model load-balancing system.

H1B Sponsorship

Featherless AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart not shown)
Trends of Total Sponsorships: 2025 (1)

Funding

Current Stage: Early Stage
Total Funding: $5M
Key Investors: Airbus Ventures
2025-10-31 · Series A
2025-03-17 · Seed · $5M
Company data provided by Crunchbase