Formation Bio · 1 day ago
Senior Data Engineer II - Electronic Health Records (EHR)
Formation Bio is a tech- and AI-driven pharma company focused on efficient drug development. They are seeking a Senior Data Engineer to transform Electronic Health Records (EHR) data into structured, analytics-ready assets and to collaborate with the Data Science team to enhance healthcare data usability.
Artificial Intelligence (AI) · Biotechnology · Clinical Trials · Health Care · Medical · Pharmaceutical
Responsibilities
Model and transform raw EHR data into clean, canonical, and analytics-ready datasets using SQL, Python, and clinical standards like FHIR, HL7, or OMOP
Build and manage scalable data pipelines using Dagster for orchestration, dbt for transformation, and Snowflake as the primary compute and storage engine
Collaborate with Data Science and product stakeholders to co-develop cohort logic, derived features, and structured outputs that meet real-world scientific needs
Apply Generative AI techniques within transformation layers—using LLMs for named entity recognition, document summarization, classification, and schema alignment
Write robust, testable, and version-controlled code that adheres to CI/CD and data governance best practices
Implement data validation and observability frameworks to ensure quality, trust, and reproducibility of datasets
Document transformation logic, assumptions, and data lineage in collaboration with metadata and cataloging systems
Contribute to the evolution of the Data Platform by helping define standards, patterns, and best practices around GenAI and platform-scale data engineering
Qualifications
Required
5+ years of experience in data engineering, ideally with at least 2 years working in healthcare or life sciences, including direct exposure to EHR datasets
Experience with ontologies and biomedical schemas (e.g., UMLS, LOINC, ICD-9/10, MeSH)
Experience with and understanding of the modalities found within EHR datasets, including billing claims, lab results, visit notes, and images
Experience in biomedical feature engineering, e.g. variable transformations and derivatives
Fluent in SQL and Python, and you've built and maintained production-grade pipelines that support analytics, science, or operational workflows
Hands-on expertise with modern data infrastructure
Experienced in applying GenAI techniques within pipelines, including prompt engineering, LLM-based entity extraction, and classification/summarization workflows
Value clarity, documentation, and structured thinking—especially when working with complex data like healthcare records
Have a growth mindset and are excited to build bridges between isolated data environments and governed, shared models that power scientific innovation
Preferred
Bonus: You've worked in regulated or privacy-sensitive data environments, and you're familiar with governance models for PHI or sensitive data
Benefits
Equity
Comprehensive benefits
Generous perks
Hybrid flexibility
Company
Formation Bio
Formation Bio is a drug development company that strives to provide treatments to patients faster by reimagining clinical trials.
H1B Sponsorship
Formation Bio has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the US Department of Labor)
Chart: Distribution of Different Job Fields Receiving Sponsorship (legend: job field similar to this job)
Trends of Total Sponsorships: 2025 (5)
Funding
Current Stage: Late Stage
Total Funding: $528M
Key Investors: Andreessen Horowitz
2024-06-26 · Series D · $372M
2021-09-30 · Series C · $156M
2018-10-15 · Series Unknown
Recent News
Spark Capital · 2026-01-07
BioWorld Financial Watch · 2025-12-19
2025-12-16
Company data provided by Crunchbase