Mithrl · 1 month ago
Data Engineer, Knowledge Graphs
Mithrl is building the world’s first commercially available AI Co-Scientist, transforming messy biological data into insights. The Data Engineer, Knowledge Graphs will build infrastructure for the biological knowledge layer, focusing on ETL pipelines, schema design, and API creation to support the platform's needs.
Artificial Intelligence (AI), Data Center Automation, Life Science, Medical, Software
Responsibilities
Build and maintain ETL pipelines for large public biological datasets and curated knowledge sources
Design, implement, and evolve schemas and storage models for graph-structured biological data
Create efficient APIs and query surfaces that allow internal teams and AI systems to retrieve nodes, relationships, pathways, annotations, and graph analytics
Partner closely with Data Scientists to operationalize curated relationships, harmonized variable IDs, metadata standards, and ontology mappings
Build data models that support multi-tenant access, versioning, and reproducibility across releases
Implement scalable storage and indexing strategies for high-volume graph data
Maintain data quality, validate data integrity, and build monitoring around ingestion and usage
Work with ML engineers and application teams to ensure the knowledge graph infrastructure supports downstream reasoning, analysis, and discovery applications
Support data warehousing, documentation, and API reliability
Ensure performance, reliability, and uptime for knowledge graph services
Qualifications
Required
Strong experience as a data engineer or backend engineer working with data-intensive systems
Experience building ETL or ELT pipelines for large structured or semi-structured datasets
Strong understanding of database design, schema modeling, and data architecture
Experience with graph data models or willingness to learn graph storage concepts
Proficiency in Python or similar languages for data engineering
Experience designing and maintaining APIs for data access
Understanding of versioning, provenance, validation, and reproducibility in data systems
Experience with cloud infrastructure and modern data stack tools
Strong communication skills and ability to work closely with scientific and engineering teams
Preferred
Experience with graph databases or graph query languages
Experience with biological or chemical data sources
Familiarity with ontologies, controlled vocabularies, and metadata standards
Experience with data warehousing and analytical storage formats
Previous work in a techbio company or scientific platform environment
Benefits
Comprehensive PPO health coverage through Anthem (medical, dental, and vision)
401(k) with top-tier plans
Company
Mithrl
Mithrl is a software development company that builds custom, on-demand workflows for NGS data.