Data Science / AI Intern – Literature Mining & Graph Modeling jobs in United States

AstraZeneca · 16 hours ago

Data Science / AI Intern – Literature Mining & Graph Modeling

AstraZeneca is seeking Master’s and PhD students for a 10-week internship role at their Waltham, MA site. This internship focuses on developing end-to-end pipelines for literature mining and graph modeling to enhance R&D insights through biomedical NLP and data engineering.

Biopharma, Biotechnology, Health Care, Medical, Pharmaceutical, Precision Medicine
Comp. & Benefits
No H1B

Responsibilities

Build an end-to-end pipeline turning literature (papers, abstracts, patents) into a standardized knowledge graph with contextualized evidence
Handle source selection, inclusion/exclusion criteria, updates, and data snapshots
Develop NLP for entity recognition, relation extraction, assertion detection, and context tagging (drug, indication, resistance, biomarker, outcome)
Encode domain relations (e.g., Drug–mechanism→Gene/Pathway; Biomarker–modulates→Outcome; ADC–targets→Antigen)
Map entities to controlled vocabularies; manage synonyms, disambiguation, and canonical IDs
Implement edge-level confidence scoring (source quality, claim type, co-occurrence, citations, model certainty) with full evidence provenance
Build graph storage (property graph or RDF) and queryable APIs
Deliver interactive visualization (UI or notebook) with filters, context toggles, and evidence drill-down
Define metrics, run error analyses, and validate with scientific stakeholders
Ensure reproducibility and documentation: version models/data; record architecture, assumptions, benchmarks; provide user guides
Present outcomes to data science, oncology, and translational medicine teams

Qualifications

NLP, Graph modeling, Python, Machine Learning, Data visualization, Version control, Data management, Problem-solving, Communication, Collaboration

Required

Master's and PhD students studying Biology, Computer Science, Chemistry, Physics, Engineering, Biomedical Science, Pharmacology, Data Science, Bioinformatics, or a related discipline
Candidates must have an expected graduation date after August 2026
US Work Authorization is required at time of application
This role will not provide OPT support
NLP and ML: NER, relation extraction, transformers; Python-based workflows
Graph/data modeling: experience with Neo4j, NetworkX, or RDF/SPARQL
Reproducibility: version control, environment management, documentation
Soft skills: problem-solving, communication, collaboration
Tech stack: Python (spaCy, Hugging Face), scikit-learn; PyTorch or TensorFlow
Data & viz: pandas; PySpark or Dask; Plotly/Dash, D3.js, Neo4j Bloom
Dev practices: Git, Conda/Poetry, Docker, experiment tracking
Ability to work onsite at the Waltham, MA site 3-5 days per week
This role will not provide relocation assistance

Preferred

Domain knowledge: genes, pathways, biomarkers, therapeutic modalities (incl. ADCs)

Company

AstraZeneca

AstraZeneca is a pharmaceutical company that discovers, develops, manufactures, and markets prescription medicines. It is a sub-organization of Investor.

Funding

Current Stage: Public Company
Total Funding: $5.26B
2024-07-30 · Post-IPO Debt · $1.51B
2023-02-28 · Post-IPO Debt · $2.25B
2023-02-24 · Post-IPO Debt · $1.5B

Leadership Team

Pascal Soriot, Executive Director and Chief Executive Officer
Aradhana Sarin, Group CFO and Executive Director
Company data provided by Crunchbase