Senior Data Engineer II - Electronic Health Records (EHR) jobs in United States

Formation Bio · 1 day ago

Senior Data Engineer II - Electronic Health Records (EHR)

Formation Bio is a tech- and AI-driven pharma company focused on efficient drug development. They are seeking a Senior Data Engineer to transform Electronic Health Records (EHR) data into structured, analytics-ready assets and to collaborate with the Data Science team to improve the usability of healthcare data.

Artificial Intelligence (AI), Biotechnology, Clinical Trials, Health Care, Medical, Pharmaceutical
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Model and transform raw EHR data into clean, canonical, and analytics-ready datasets using SQL, Python, and clinical standards like FHIR, HL7, or OMOP
Build and manage scalable data pipelines using Dagster for orchestration, dbt for transformation, and Snowflake as the primary compute and storage engine
Collaborate with Data Science and product stakeholders to co-develop cohort logic, derived features, and structured outputs that meet real-world scientific needs
Apply Generative AI techniques within transformation layers—using LLMs for named entity recognition, document summarization, classification, and schema alignment
Write robust, testable, and version-controlled code that adheres to CI/CD and data governance best practices
Implement data validation and observability frameworks to ensure quality, trust, and reproducibility of datasets
Document transformation logic, assumptions, and data lineage in collaboration with metadata and cataloging systems
Contribute to the evolution of the Data Platform by helping define standards, patterns, and best practices around GenAI and platform-scale data engineering

Qualifications

Data Engineering, SQL, Python, Generative AI, EHR Datasets, Snowflake, Dagster, dbt, Healthcare Experience, Documentation, Growth Mindset

Required

5+ years of experience in data engineering, ideally with at least 2 years working in healthcare or life sciences, including direct exposure to EHR datasets
Experience with ontologies and biomedical schemas (e.g. UMLS, LOINC, ICD-9/10, MeSH)
Experience with and understanding of the data modalities found in EHR datasets, including billing claims, lab results, visit notes, and images
Experience in biomedical feature engineering, e.g. variable transformations and derivatives
Fluency in SQL and Python, with a track record of building and maintaining production-grade pipelines that support analytics, science, or operational workflows
Hands-on expertise with modern data infrastructure, such as Dagster, dbt, and Snowflake
Experience applying GenAI techniques within pipelines, including prompt engineering, LLM-based entity extraction, and classification/summarization workflows
A commitment to clarity, documentation, and structured thinking, especially when working with complex data like healthcare records
A growth mindset and enthusiasm for building bridges between isolated data environments and governed, shared models that power scientific innovation

Preferred

Experience working in regulated or privacy-sensitive data environments, and familiarity with governance models for PHI or other sensitive data

Benefits

Equity
Comprehensive benefits
Generous perks
Hybrid flexibility

Company

Formation Bio

Formation Bio is a drug development company that strives to provide treatments to patients faster by reimagining clinical trials.

H1B Sponsorship

Formation Bio has a track record of offering H1B sponsorships, but this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart not shown)
Trends of Total Sponsorships: 2025 (5)

Funding

Current Stage: Late Stage
Total Funding: $528M
Key Investors: Andreessen Horowitz
Funding rounds:
2024-06-26 · Series D · $372M
2021-09-30 · Series C · $156M
2018-10-15 · Series Unknown

Leadership Team

Benjamine Liu, CEO and Co-founder at Formation Bio
Linhao Zhang, Co-Founder
Company data provided by Crunchbase