
Bespoke Labs · 13 hours ago

Data Scientist

Bespoke Labs is a premier, VC-backed AI research lab founded by IIT and Ivy League alumni. The company is seeking a high-impact Senior Data Scientist for a 2-month contract, applying expertise in machine learning and applied statistics to develop algorithms that curate datasets for AI model training.

Computer Software

Responsibilities

Algorithm Design: Design and implement custom statistical models and programmatic logic (e.g., anomaly detection, active learning, similarity scoring) to evaluate data quality, complexity, and redundancy at scale
Hands-on At-Scale Coding: Write scalable PySpark and Python (NumPy/Pandas) code to apply these algorithms across massive datasets, translating experimental logic into reliable, large-scale workflows
Metric Formulation: Develop custom quantitative metrics and heuristic benchmarks to rigorously assess the fidelity and suitability of data subsets for specific AI training objectives
Validation & Iteration: Run high-speed validation cycles, analyzing the output of data-curation algorithms to diagnose skew, bias, or noise, and iteratively refining the logic
High-Level Curation: Apply senior-level domain expertise in predictive modeling and feature engineering to ensure the final training inputs meet the strict standards required for state-of-the-art ML systems

Qualifications

Machine Learning, Applied Statistics, Python, Apache Spark, Feature Engineering, Model Evaluation, Statistical Modeling, SQL, Experimentation, Collaboration

Required

Experience: 6+ years as a Data Scientist or Applied Scientist
Production Background: Proven ownership of models running in production environments
Applied Statistics: Strong background in applied statistics and experimentation frameworks
Languages: Python (NumPy, Pandas, Scikit-learn, PyTorch / TensorFlow) and strong SQL
Big Data: Apache Spark (PySpark or Spark SQL) for large-scale data processing
Methodologies: Feature engineering, model evaluation, statistical modeling, and hypothesis testing

Preferred

Scale: Models trained on TB-scale datasets
Domain Specificity: Experience in high-complexity domains such as recommendations, pricing, fraud / risk, search / ranking, or growth and experimentation
Collaboration: Experience deploying models alongside data engineering pipelines

Company

Bespoke Labs

RL for Agents

Funding

Current Stage
Early Stage
Company data provided by Crunchbase