Poolside
Member of Engineering (Pre-training / Synthetic Data)
Poolside is a company focused on building a world where AI drives economically valuable work and scientific progress. They are seeking a Member of Engineering to work on their data team, improving the quality of pretraining datasets and generating synthetic data at scale. The role involves collaboration with various teams to define data needs and ensure high-quality datasets for training large models.
AI Infrastructure · Artificial Intelligence (AI) · Developer Platform · Foundational AI · Information Technology · Software
Responsibilities
Follow the latest research on LLMs, and on synthetic data generation in particular. Be familiar with the most relevant open-source datasets and models
Design and implement complex pipelines that generate large amounts of data while maintaining high diversity and making efficient use of available resources
Work closely with other teams, such as Pretraining, Post-training, Evals and Product, to ensure alignment on the quality of the models delivered
Continuously measure and refine the quality of the datasets being generated while validating the final data strategy through quantitative data ablation experiments
Qualifications
Required
Strong machine learning and engineering background
Experience with large language models (LLMs), including: understanding of how LLMs learn, data ablations and scaling laws, post-training techniques, and training reasoning and agentic models
Experience implementing cost-efficient, complex pipelines that generate synthetic datasets at scale while optimizing for data quality, correctness, diversity, etc.
Experience with evals that track model capabilities (general knowledge, reasoning, math, coding, long context, etc.)
Experience building trillion-scale pretraining datasets, and familiarity with concepts such as data curation, deduplication, data mixing, tokenization, curricula, and the impact of data repetition
Excellent programming skills in Python
Strong prompt engineering skills
Experience working with large-scale GPU clusters and distributed data pipelines
Strong obsession with data quality
Preferred
Authorship of scientific papers on topics such as applied deep learning, LLMs, or source code generation
Able to discuss the latest papers freely and go into fine detail
Reasonably opinionated
Benefits
Fully remote work & flexible hours
37 days/year of vacation & holidays
Health insurance allowance for you and dependents
Company-provided equipment
Wellbeing, always-be-learning and home office allowances
Frequent team get-togethers
A diverse, inclusive, people-first culture
Company
Poolside
Poolside is an artificial intelligence platform that offers foundation models and infrastructure for writing software code.
H-1B Sponsorship
Poolside has a track record of offering H-1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).
Distribution of Job Fields Receiving Sponsorship
Trends in Total Sponsorships
2025: 1
Funding
Current Stage: Growth Stage
Total Funding: $626M
Key Investors: Bain Capital Ventures, Redpoint
2024-10-02 · Series B · $500M
2023-08-24 · Series A · $100M
2023-05-14 · Seed · $26M
Company data provided by Crunchbase