Roche · 13 hours ago
2026 Summer Intern - ECD Clinical Insight and Automation AI Modeling
Roche, a leading biotechnology company, is seeking a 2026 Summer Intern to join its Clinical Insight and Automation team. The internship focuses on developing AI-driven methods for synthetic clinical data generation and causal inference, contributing to research on healthcare data applications.
Biotechnology · Health Care · Health Diagnostics · Oncology · Pharmaceutical · Precision Medicine
Responsibilities
Identify and prepare appropriate clinical datasets, define generation targets, and implement reproducible preprocessing pipelines
Develop and compare modern generative modeling approaches for patient-level outcomes and trajectories (e.g., diffusion, transformer, and latent-variable models) conditioned on baseline covariates and study design assumptions; apply unsupervised clustering and topic models to identify clinically meaningful patterns
Incorporate causal inference considerations (confounding control, covariate balance, estimands) and quantify how synthetic controls impact downstream treatment-effect estimation; perform manual "gold standard" labeling to create high-quality training datasets
Design rigorous evaluation protocols for fidelity and utility, including distributional similarity, calibration/uncertainty, fairness and subgroup robustness, and privacy-risk checks, with ablations and sensitivity analyses
Build end-to-end experiment infrastructure (training and evaluation scripts, configuration management, and experiment tracking) to support reproducibility and efficient iteration
Co-prepare a conference-quality manuscript, figures, and supplementary materials, including the paper checklist and (where appropriate) anonymized code/data artifacts consistent with reproducibility and ethics expectations
Communicate progress through regular updates; deliver a final technical report, curated repository, and presentation to cross-functional stakeholders
Develop and execute Python scripts to ingest, clean, and normalize large volumes of unstructured clinical query text and patient-level datasets
Help translate technical AI findings into a "Recommendations Matrix" that suggests specific site training or system improvements for stakeholders
Qualifications
Required
Must be pursuing a PhD (enrolled student)
Required Majors: Computer Science, Artificial Intelligence, Computational Sciences, or a related field with a focus on machine learning systems
Programming proficiency in Python; hands-on experience with PyTorch and scientific computing libraries (NumPy, Pandas)
Experience developing and training deep learning models, with familiarity in modern generative modeling (e.g., diffusion models, VAEs, autoregressive/transformer models)
Strong understanding of statistical machine learning and experimental methodology (ablations, error analysis, and appropriate statistical evaluation)
Foundational understanding of causal inference and counterfactual reasoning (e.g., confounding, estimands, treatment-effect estimation) and how these considerations interact with modeling choices
Demonstrated technical writing skills (e.g., research reports or papers); comfort preparing conference-style manuscripts in LaTeX and presenting results to technical audiences
Commitment to reproducible research and software engineering best practices (version control, documentation, experiment tracking), with the ability to package artifacts for peer review; collaborative communication skills
Preferred
Excellent communication, collaboration, and interpersonal skills
Complements our culture and the standards that guide our daily behavior & decisions: Integrity, Courage, and Passion
Prior publication or strong experience preparing submissions for top-tier ML venues (e.g., NeurIPS/ICML/ICLR), including familiarity with reproducibility expectations (paper checklist, artifact preparation)
Experience working with healthcare/biomedical datasets (EHR, claims, or clinical trial data); familiarity with data standards (OMOP, FHIR) is a plus
Knowledge of synthetic data evaluation and privacy risk assessment (e.g., memorization tests, membership inference, differential privacy)
Familiarity with causal ML topics such as causal representation learning, domain adaptation, and externally controlled trial methodology
Experience with scalable training environments (GPUs, distributed computing) and modern ML tooling (Docker, experiment tracking platforms)
Experience leveraging foundation models (LLMs) or structured prompting to incorporate domain knowledge into ML workflows is beneficial, but not required
Ability to query and extract data from relational databases using SQL and hands-on experience with NLP frameworks and clustering algorithms
Familiarity with clinical trial operations, Electronic Data Capture (EDC) systems, or the regulatory landscape of the pharmaceutical industry is beneficial, but not required
Benefits
Paid holiday time off benefits
Company
Roche
Roche is a pharmaceutical and diagnostics company that offers medicines and diagnostic tests for various medical conditions and diseases.
H1B Sponsorship
Roche has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (12)
2024 (9)
2023 (6)
2022 (2)
2021 (2)
Funding
Current Stage: Public Company
Total Funding: $7.79B
Key Investors: SoftBank, SCALE AI, Novartis
2021-08-04 · Post-IPO Equity · $5B
2020-12-07 · IPO
2020-05-06 · Post-IPO Equity · $0.5M
Company data provided by Crunchbase