2026 Summer Intern - ML modeling of DNA sequencing error, Roche Diagnostics jobs in United States

Roche · 8 hours ago

2026 Summer Intern - ML modeling of DNA sequencing error, Roche Diagnostics

Roche is a global healthcare company dedicated to advancing science and ensuring access to healthcare. The internship focuses on developing machine learning models for DNA sequencing error, providing hands-on experience in computational biology and data science within a collaborative environment.

Biotechnology, Health Care, Health Diagnostics, Oncology, Pharmaceutical, Precision Medicine
Comp. & Benefits
H1B Sponsor Likely

Responsibilities

Get familiar with the sequencing Simulation pipeline: architecture, data flow, interfaces, and evaluation metrics
Reproduce baseline runs and document the setup for reproducibility (env, data versions, configs)
Define targets (e.g., per-base/read-level error probabilities) and assemble training/validation datasets
Perform feature engineering, sanity checks, and data quality assessments; establish data splits and leakage controls
Implement baseline models: gradient-boosted decision trees (e.g., XGBoost/LightGBM/CatBoost) and neural network regression for probability vector prediction
Train, tune, and validate models using robust protocols (cross-validation, early stopping, hyperparameter search)
Assess performance with appropriate metrics (e.g., Brier score, log-loss, RMSE; calibration curves and reliability diagrams; ROC/PR if framed as classification)
Analyze model behavior: feature importance, error stratification, ablation studies, and basic uncertainty estimates
Integrate the best-performing model(s) into the Simulation pipeline
Profile and, if needed, improve pipeline efficiency (I/O, batching, parallelization); ensure reproducible workflows (containers, versioning)
Maintain clear experiment logs, notebooks, and code documentation
Share progress updates; prepare a concise final report and presentation
Draft a structured analysis write-up that could potentially serve as the basis for a future publication (post-internship)
Explore feasibility of sequence-aware architectures (e.g., transformer-based models) for error prediction and document findings for future work
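
The evaluation metrics named above (Brier score, log-loss, and the binned statistics behind a reliability diagram) can be illustrated with a minimal sketch. It assumes the task is framed as binary per-base error prediction; the function names and the toy data are hypothetical and are not taken from the Simulation pipeline.

```python
import math


def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def log_loss(y_true, p_pred, eps=1e-15):
    """Average negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def reliability_bins(y_true, p_pred, n_bins=5):
    """Return (mean predicted prob, observed error rate, count) per non-empty bin."""
    sums = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pred):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        sums[idx][0] += p
        sums[idx][1] += y
        sums[idx][2] += 1
    return [(sp / n, sy / n, n) for sp, sy, n in sums if n > 0]


# Hypothetical toy data: 1 = base was called incorrectly, 0 = called correctly.
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
p_pred = [0.1, 0.2, 0.8, 0.1, 0.6, 0.3, 0.05, 0.9]

print("Brier score:", brier_score(y_true, p_pred))
print("Log-loss:", log_loss(y_true, p_pred))
for mean_p, observed, count in reliability_bins(y_true, p_pred):
    print(f"mean predicted={mean_p:.2f} observed={observed:.2f} n={count}")
```

In a well-calibrated model, the mean predicted probability in each bin is close to the observed error rate in that bin; plotting one against the other gives the reliability diagram.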

Qualifications

Machine Learning fundamentals, Python, ML frameworks (PyTorch/TensorFlow), Probability, Statistics, Linear Algebra, Data structures, Algorithms, Linux command line, Shell scripting, Communication skills, Collaboration skills, Interpersonal skills

Required

Must be pursuing a Master's or PhD Degree
Required Majors: Computer Science, Physics, Applied mathematics/Engineering, Biology, Chemistry (or closely related engineering/science fields)
Working knowledge of Probability, Statistics and Machine Learning fundamentals
Solid understanding of Linear Algebra and Programming Methodology
Proficiency in Python and at least one ML framework (PyTorch or TensorFlow)
Strong data structures and algorithms fundamentals; ability to write clean and efficient code
Comfort with Linux command line and basic shell scripting

Preferred

Biology/Chemistry background is a plus
Excellent communication, collaboration, and interpersonal skills
Complements our culture and the standards that guide our daily behavior & decisions: Integrity, Courage, and Passion

Benefits

Paid holiday time off benefits

Company

Roche is a pharmaceutical and diagnostics company that offers medicines and diagnostic tests for various medical conditions and diseases.

H1B Sponsorship

Roche has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (12)
2024 (9)
2023 (6)
2022 (2)
2021 (2)

Funding

Current Stage
Public Company
Total Funding
$7.79B
Key Investors
SoftBank, SCALE AI, Novartis
2021-08-04 Post-IPO Equity · $5B
2020-12-07 IPO
2020-05-06 Post-IPO Equity · $0.5M

Leadership Team

Alan Hippe
Member of the Executive Board - Group CFO
Christine Bakan
Global Head and Group Vice President, Computational Science and Informatics R&D
Company data provided by crunchbase