2026 Summer Intern - ECD Clinical Insight and Automation AI Modeling jobs in United States

Genentech · 10 hours ago

2026 Summer Intern - ECD Clinical Insight and Automation AI Modeling

Genentech, a leader in biotechnology, is seeking a Summer Intern for its Clinical Insight and Automation team. The role involves contributing to research on generative modeling and causal inference for synthetic clinical data, and collaborating with scientists and engineers to develop novel methodologies and evaluation protocols.

Biotechnology, Life Science, Manufacturing
Comp. & Benefits
H1B Sponsor Likely

Responsibilities

Identify and prepare appropriate clinical datasets, define generation targets, and implement reproducible preprocessing pipelines
Develop and compare modern generative modeling approaches for patient-level outcomes and trajectories (e.g., diffusion, transformer, and latent-variable models) conditioned on baseline covariates and study design assumptions; apply unsupervised clustering/topic models to identify clinically meaningful patterns
Incorporate causal inference considerations (confounding control, covariate balance, estimands) and quantify how synthetic controls impact downstream treatment-effect estimation; perform manual "gold standard" labeling to create high-quality training datasets
Design rigorous evaluation protocols for fidelity and utility, including distributional similarity, calibration/uncertainty, fairness and subgroup robustness, and privacy-risk checks, with ablations and sensitivity analyses
Build end-to-end experiment infrastructure (training and evaluation scripts, configuration management, and experiment tracking) to support reproducibility and efficient iteration
Co-prepare a conference-quality manuscript, figures, and supplementary materials, including the paper checklist and (where appropriate) anonymized code/data artifacts consistent with reproducibility and ethics expectations
Communicate progress through regular updates; deliver a final technical report, curated repository, and presentation to cross-functional stakeholders
Develop and execute Python scripts to ingest, clean, and normalize large volumes of unstructured clinical query text and patient-level datasets
Help translate technical AI findings into a "Recommendations Matrix" that suggests specific site training or system improvements for stakeholders

Qualifications

Python, PyTorch, Deep learning models, Statistical machine learning, Causal inference, Healthcare datasets, SQL, NLP frameworks, Experiment tracking, Clustering algorithms, Reproducible research, Docker, Technical writing, Communication skills

Required

Must be pursuing a PhD (enrolled student)
Required Majors: Computer Sciences, Artificial Intelligence, Computational Sciences, or a related field with a focus on machine learning systems or similar
Programming proficiency in Python; hands-on experience with PyTorch and scientific computing libraries (NumPy, Pandas)
Experience developing and training deep learning models, with familiarity in modern generative modeling (e.g., diffusion models, VAEs, autoregressive/transformer models)
Strong understanding of statistical machine learning and experimental methodology (ablations, error analysis, and appropriate statistical evaluation)
Foundational understanding of causal inference and counterfactual reasoning (e.g., confounding, estimands, treatment-effect estimation) and how these considerations interact with modeling choices
Demonstrated technical writing skills (e.g., research reports or papers); comfort preparing conference-style manuscripts in LaTeX and presenting results to technical audiences
Commitment to reproducible research and software engineering best practices (version control, documentation, experiment tracking), with the ability to package artifacts for peer review; collaborative communication skills

Preferred

Excellent communication, collaboration, and interpersonal skills
Complements our culture and the standards that guide our daily behavior & decisions: Integrity, Courage, and Passion
Prior publication or strong experience preparing submissions for top-tier ML venues (e.g., NeurIPS/ICML/ICLR), including familiarity with reproducibility expectations (paper checklist, artifact preparation)
Experience working with healthcare/biomedical datasets (EHR, claims, or clinical trial data); familiarity with data standards (OMOP, FHIR) is a plus
Knowledge of synthetic data evaluation and privacy risk assessment (e.g., memorization tests, membership inference, differential privacy)
Familiarity with causal ML topics such as causal representation learning, domain adaptation, and externally controlled trial methodology
Experience with scalable training environments (GPUs, distributed computing) and modern ML tooling (Docker, experiment tracking platforms)
Experience leveraging foundation models (LLMs) or structured prompting to incorporate domain knowledge into ML workflows is beneficial, but not required
Ability to query and extract data from relational databases using SQL; hands-on experience with NLP frameworks and clustering algorithms
Familiarity with clinical trial operations, Electronic Data Capture (EDC) systems, or the regulatory landscape of the pharmaceutical industry is beneficial, but not required

Benefits

Paid holiday time off benefits

Company

Genentech

Genentech is a biotechnology research company that specializes in genetic testing and personalized medicines.

H1B Sponsorship

Genentech has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data powered by the US Department of Labor)
Chart: Distribution of Different Job Fields Receiving Sponsorship (the highlighted segment represents job fields similar to this job)
Trends of Total Sponsorships
2025 (167)
2024 (148)
2023 (150)
2022 (178)
2021 (121)
2020 (158)

Funding

Current Stage
Public Company
Total Funding
unknown
2009-03-26: Acquired
1999-07-20: IPO
1976-01-01: Series Unknown

Leadership Team

Ashley Magargee
Chief Executive Officer
Michael Laird
Vice President, Global MSAT Drug Substance Biologics Technology and Commercial Products Support
Company data provided by crunchbase