AstraZeneca · 1 day ago
Data Science / AI Intern – Literature Mining & Graph Modeling
AstraZeneca is seeking Master’s and PhD students for a 10-week internship at its Waltham, MA site. The internship focuses on developing end-to-end pipelines for literature mining and graph modeling, using biomedical NLP and data engineering to support R&D insights.
Biopharma · Biotechnology · Health Care · Medical · Pharmaceutical · Precision Medicine
Responsibilities
Build an end-to-end pipeline turning literature (papers, abstracts, patents) into a standardized knowledge graph with contextualized evidence
Handle source selection, inclusion/exclusion criteria, updates, and data snapshots
Develop NLP for entity recognition, relation extraction, assertion detection, and context tagging (drug, indication, resistance, biomarker, outcome)
Encode domain relations (e.g., Drug–mechanism→Gene/Pathway; Biomarker–modulates→Outcome; ADC–targets→Antigen)
Map entities to controlled vocabularies; manage synonyms, disambiguation, and canonical IDs
Implement edge-level confidence scoring (source quality, claim type, co-occurrence, citations, model certainty) with full evidence provenance
Build graph storage (property graph or RDF) and queryable APIs
Deliver interactive visualization (UI or notebook) with filters, context toggles, and evidence drill-down
Define metrics, run error analyses, and validate with scientific stakeholders
Ensure reproducibility and documentation: version models/data; record architecture, assumptions, benchmarks; provide user guides
Present outcomes to data science, oncology, and translational medicine teams
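For illustration only, the relation-encoding and edge-level confidence items above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not AstraZeneca's method: the weights, the field names, the `score_evidence` and `score_edge` helpers, the noisy-OR combination, and the placeholder entity and source names are all hypothetical and do not come from the posting.

```python
from dataclasses import dataclass, field

# Hypothetical weights for the five signals listed in the posting.
# The values are assumptions chosen only so that they sum to 1.0.
WEIGHTS = {
    "source_quality": 0.3,
    "claim_type": 0.2,
    "co_occurrence": 0.1,
    "citations": 0.1,
    "model_certainty": 0.3,
}


@dataclass
class Evidence:
    """One supporting statement, kept so every edge has provenance."""
    source_id: str   # placeholder document identifier
    sentence: str    # the text span the relation was extracted from
    signals: dict    # signal name -> score in [0, 1]


@dataclass
class Edge:
    """A typed relation such as (ADC) -targets-> (Antigen)."""
    subject: str
    relation: str
    obj: str
    evidence: list = field(default_factory=list)


def score_evidence(ev: Evidence) -> float:
    """Weighted sum of the signals; a missing signal counts as 0."""
    return sum(w * ev.signals.get(name, 0.0) for name, w in WEIGHTS.items())


def score_edge(edge: Edge) -> float:
    """Combine per-evidence scores with a noisy-OR (an assumed choice):
    the edge is unsupported only if every piece of evidence fails."""
    p_unsupported = 1.0
    for ev in edge.evidence:
        p_unsupported *= 1.0 - score_evidence(ev)
    return 1.0 - p_unsupported


# Placeholder example: two pieces of evidence with all signals at 0.5.
half = {name: 0.5 for name in WEIGHTS}
edge = Edge("ADC_X", "targets", "ANTIGEN_Y")
edge.evidence.append(Evidence("doc-1", "placeholder sentence one", half))
edge.evidence.append(Evidence("doc-2", "placeholder sentence two", half))
confidence = score_edge(edge)
```

Keeping the `Evidence` records on the edge, rather than storing only the final score, is what would allow an evidence drill-down view to show why an edge received its confidence.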
Qualifications
Required
Master's and PhD students studying Biology, Computer Science, Chemistry, Physics, Engineering, Biomedical Science, Pharmacology, Data Science, Bioinformatics, or a related discipline
Candidates must have an expected graduation date after August 2026
US Work Authorization is required at time of application
This role will not provide OPT support
NLP and ML: NER, relation extraction, transformers; Python-based workflows
Graph/data modeling: experience with Neo4j, NetworkX, or RDF/SPARQL
Reproducibility: version control, environment management, documentation
Soft skills: problem-solving, communication, collaboration
Tech stack: Python (spaCy, Hugging Face), scikit-learn; PyTorch or TensorFlow
Data & viz: pandas; PySpark or Dask; Plotly/Dash, D3.js, Neo4j Bloom
Dev practices: Git, Conda/Poetry, Docker, experiment tracking
Ability to report onsite to the Waltham, MA site 3-5 days per week
This role will not provide relocation assistance
Preferred
Domain knowledge: genes, pathways, biomarkers, therapeutic modalities (incl. ADCs)
Company
AstraZeneca
AstraZeneca is a pharmaceutical company that discovers, develops, manufactures, and markets prescription medicines. It is a sub-organization of Investor.
Funding
Current Stage: Public Company
Total Funding: $5.26B
2024-07-30 · Post-IPO Debt · $1.51B
2023-02-28 · Post-IPO Debt · $2.25B
2023-02-24 · Post-IPO Debt · $1.5B
Company data provided by Crunchbase