Bespoke Labs · 6 hours ago
Senior Data Engineer
Bespoke Labs is a premier, VC-backed AI research lab with an exceptionally talent-dense team of IIT and Ivy League alumni. They are seeking a top-tier Senior/Staff Data Engineer for a high-impact, 2-month sprint to architect and build the complex data curation systems required for advanced AI model training.
Computer Software
Responsibilities
Architect AI-Scale Systems: Design the overarching data architecture and processing topology needed to programmatically curate and shape datasets at TB/PB scale, ensuring low latency and high consistency
Hands-On Development: Write production-grade code (Python/Scala, Spark) to build custom ingestion logic, highly efficient transformation scripts, and performant data validation checks
Complex Data Logic: Implement advanced filtering, deduplication, and quality-scoring algorithms at scale, ensuring the resulting data objects are optimized for LLM/ML consumption
Quality & Performance Tuning: Rigorously test, benchmark, and optimize processing workloads (CPU/memory tuning, partitioning strategies in Spark/Iceberg) to meet aggressive throughput targets
Domain Subject Matter Expert: Act as the ultimate technical authority on distributed systems, data processing, and cloud infrastructure to ensure the training data factory meets enterprise-grade accuracy
Qualifications
Required
Experience: 6+ years of Data Engineering experience
Seniority: Demonstrated Senior/Staff-level ownership of production data platforms
Pedigree: Background at Tier-1 enterprises (FAANG, large SaaS, Fortune 100)
Technical Stack: Deep fluency in Python/Scala, Spark, Kafka, Airflow, and major cloud data warehouses (Snowflake, BigQuery, Redshift)
Company
Bespoke Labs
RL for Agents
Funding
Current Stage
Early Stage
Company data provided by Crunchbase