Mithrl · 1 month ago
Data Scientist, Knowledge Graphs
Mithrl is a fast-growing tech bio company building the world's first commercially available AI Co-Scientist. The Data Scientist, Knowledge Graphs will focus on ingesting and harmonizing biological data to curate relationships that enable reasoning across various biological datasets and enhance the AI Co-Scientist's capabilities.
Artificial Intelligence (AI) · Data Center Automation · Life Science · Medical · Software
Responsibilities
Ingest, harmonize, and version high-value public biological datasets such as CellxGene, Gemma, ARCHS4, ENCODE, GTEx, and TCGA
Ingest well-maintained, peer-reviewed knowledgebases, including OpenTargets, HPA, and similar resources
Build automated pipelines to curate and expand relationships inside the knowledge graph
Define and evolve schemas for node types, relationships, metadata rules, and ontology alignment
Harmonize variable IDs and metadata fields across all imported sources to create a unified knowledge layer
Build and maintain versioning, change tracking, and provenance systems for all data and relationships
Develop the framework that allows users to build custom knowledge graphs from the analyses they run inside Mithrl
Build features that allow users to explore, query, and interact with their graphs
Work closely with ML engineers, bioinformatics teams, and discovery application teams to ensure the knowledge graph supports downstream reasoning and analysis
Validate the correctness, completeness, and integrity of the knowledge graph across releases
Qualifications
Required
Strong experience in data science, bioinformatics, computational biology, or a related field
Experience working with biological knowledgebases, public datasets, or ontology-driven systems
Familiarity with graph data structures, relationship modeling, and knowledge graph concepts
Experience harmonizing heterogeneous biological datasets and mapping variable IDs across sources
Proficiency in Python and scientific computing libraries
Ability to build ingestion pipelines for structured or semi-structured biological data
Strong understanding of metadata standards, biological ontologies, and domain logic
Ability to translate complex biological information into structured, machine-readable representations
Excellent communication skills and comfort collaborating across engineering and scientific teams
Preferred
Experience with graph databases or graph query languages
Experience with knowledge graph (KG) curation, link prediction, relationship extraction, or graph-based ML
Familiarity with multimodal data integration
Previous work on biological or chemical knowledge graphs
Experience with public consortia and resources such as ENCODE, GTEx, TCGA, or ChEMBL
Prior experience in a tech bio startup or scientific software environment
Benefits
Comprehensive PPO health coverage through Anthem (medical, dental, and vision)
401(k) with top-tier plans
Company
Mithrl
Mithrl is a software development company that builds custom workflows for NGS data on demand.