Research Scientist, Post-Training jobs in United States

DatologyAI · 5 months ago

Research Scientist, Post-Training

DatologyAI is a company focused on optimizing data for model training. They are seeking a Research Scientist to lead post-training data curation efforts, design algorithms for data improvement, and bridge the gap between pre-training and post-training data curation.

Artificial Intelligence (AI) · Data Center · Data Integration · Database · Information Technology
H1B Sponsor Likely

Responsibilities

Post-training data curation. You’ll conduct research on how to algorithmically curate post-training data: for example, how to generate and refine preference and instruction-following data, how to curate capability- and domain-specific data, and how to make post-training more effective, controllable, and generalizable
Unifying pre-training and post-training data curation. Pushing the bounds on model capabilities requires unifying post-training and pre-training data curation. You will pursue research on end-to-end data curation: how to curate pre-training data to improve the post-trainability of models and how to jointly optimize pre- and post-training data curation, all in service of maximizing the final performance of post-trained models
Transform messy literature into practical improvements. The research literature is vast, rife with ambiguity, and constantly evolving. You will use your skills as a scientist to source, vet, implement, and improve promising ideas from the literature and of your own creation
Conduct science driven by real-world needs. At DatologyAI, we understand that conference reviewers and academic benchmarks don’t always incentivize the most impactful research. Your research will be guided by concrete customer needs and product improvements
Nobody knows how to do your work better than you. We believe that scientists do their best work when they have the autonomy to pursue problems in the manner they prefer, and we will ensure that you are equipped with the context and resources you need to succeed
Science is more than just experiments. We expect our Research Scientists to collaborate closely with engineers, talk to customers, and shape the product vision

Qualifications

Deep learning research · Post-training algorithm development · Data curation · Software engineering · PyTorch · Preference-based tuning · Self-supervision techniques · Distributed data processing · Collaboration skills · Communication skills

Required

3+ years of deep learning research experience
Experience with post-training large vision, language, and multimodal models
Post-training algorithm development, data curation, and/or synthetic data methods for: Preference-based tuning (e.g. DPO, RLVR, RRHF), Alternative supervision & self-supervision techniques such as self-training and chain-of-thought distillation, SFT (e.g. instruction tuning and demonstration fine-tuning)
Post-training tooling development and engineering experience
Strong understanding of the fundamentals of deep learning
Sufficient software engineering + deep learning framework (PyTorch or a willingness to learn PyTorch) skills to conduct large-scale research experiments and build production prototypes
Demonstrated track record of success in deep learning research, whether papers, tools, or other research artifacts

Preferred

Experience with data management and distributed data processing solutions (e.g. Spark, Snowflake, etc.)
Experience building + shipping ML products

Benefits

100% covered health benefits (medical, vision, and dental)
401(k) plan with a generous 4% company match
Unlimited PTO policy
Annual $2,000 wellness stipend
Annual $1,000 learning and development stipend
Daily lunches and snacks are provided in our office
Relocation assistance for employees moving to the Bay Area

Company

DatologyAI

DatologyAI is an AI data curation startup that develops deep learning tools for automatically selecting training data.

H1B Sponsorship

DatologyAI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (4)
2024 (2)

Funding

Current Stage: Early Stage
Total Funding: $57.65M
Key Investors: Felicis, Amplify Partners
2024-05-08 · Series A · $46M
2024-02-22 · Seed · $11.65M

Leadership Team

Ari Morcos
CEO and Co-Founder
Bogdan Gaza
Co-Founder & CTO

Company data provided by crunchbase