Roche · 8 hours ago
2026 Summer Intern - ML modeling of DNA sequencing error, Roche Diagnostics
Roche is a global healthcare company dedicated to advancing science and ensuring access to healthcare. The internship focuses on developing machine learning models for DNA sequencing error, providing hands-on experience in computational biology and data science within a collaborative environment.
Biotechnology, Health Care, Health Diagnostics, Oncology, Pharmaceutical, Precision Medicine
Responsibilities
Get familiar with the sequencing Simulation pipeline: architecture, data flow, interfaces, and evaluation metrics
Reproduce baseline runs and document the setup for reproducibility (env, data versions, configs)
Define targets (e.g., per-base/read-level error probabilities) and assemble training/validation datasets
Perform feature engineering, sanity checks, and data quality assessments; establish data splits and leakage controls
Implement baseline models: gradient-boosted decision trees (e.g., XGBoost/LightGBM/CatBoost) and neural network regression for probability vector prediction
Train, tune, and validate models using robust protocols (cross-validation, early stopping, hyperparameter search)
Assess performance with appropriate metrics (e.g., Brier score, log-loss, RMSE; calibration curves and reliability diagrams; ROC/PR if framed as classification)
Analyze model behavior: feature importance, error stratification, ablation studies, and basic uncertainty estimates
Integrate the best-performing model(s) into the Simulation pipeline
Profile and, if needed, improve pipeline efficiency (I/O, batching, parallelization); ensure reproducible workflows (containers, versioning)
Maintain clear experiment logs, notebooks, and code documentation
Share progress updates; prepare a concise final report and presentation
Draft a structured analysis write-up that could serve as the basis for a future publication (post-internship)
Explore feasibility of sequence-aware architectures (e.g., transformer-based models) for error prediction and document findings for future work
Qualifications
Required
Must be pursuing a Master's or PhD degree
Required Majors: Computer Science, Physics, Applied Mathematics/Engineering, Biology, Chemistry (or closely related engineering/science fields)
Working knowledge of Probability, Statistics and Machine Learning fundamentals
Solid understanding of Linear Algebra and Programming Methodology
Proficiency in Python and at least one ML framework (PyTorch or TensorFlow)
Strong data structures and algorithms fundamentals; ability to write clean and efficient code
Comfort with Linux command line and basic shell scripting
Preferred
Biology/Chemistry background is a plus
Excellent communication, collaboration, and interpersonal skills
Complements our culture and the standards that guide our daily behavior & decisions: Integrity, Courage, and Passion
Benefits
Paid holiday time off
Company
Roche
Roche is a pharmaceutical and diagnostics company that offers medicines and diagnostic tests for various medical conditions and diseases.
H1B Sponsorship
Roche has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship
Trends of Total Sponsorships
2025 (12)
2024 (9)
2023 (6)
2022 (2)
2021 (2)
Funding
Current Stage: Public Company
Total Funding: $7.79B
Key Investors: SoftBank, SCALE AI, Novartis
2021-08-04: Post-IPO Equity · $5B
2020-12-07: IPO
2020-05-06: Post-IPO Equity · $0.5M
Company data provided by Crunchbase