Holobiome · 1 week ago

Data Operations Engineer (Bioinformatics)

Holobiome is building a future where we can modify the human gut microbiome with precision, enabling optimized health for everyone. The company is seeking a Bioinformatics Data Operations Engineer to lead its automated data identification, scraping, and ingestion engine, transforming raw data into a queryable asset that powers its ML pipelines.

Health Care · Medical · Pharmaceutical · Therapeutics
H1B Sponsor Likely

Responsibilities

Build an "Agentic" Data Discovery Framework
Automate the Hunt: Design and deploy AI agents (using frontier LLMs/APIs) to scrape the web for data, automatically identifying microbiome cohorts mentioned in scientific literature and disparate repositories
Global Data Engineering: Engineer robust solutions to ingest data from challenging international archives. You will navigate complex access protocols, rate limits, and licensing logic to ensure a steady stream of data into our pipeline
Public vs. Private Logic: Build logic to automatically classify datasets as "Public" (immediately downloadable) vs. "Private" (requiring license/contact) and queue them for ingestion
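The public-vs-private triage described above can be sketched as a small routing function. The field names, accession formats, and queues below are illustrative assumptions, not Holobiome's actual schema:

```python
from dataclasses import dataclass
from queue import Queue

@dataclass
class Dataset:
    accession: str
    access: str              # e.g. "public" or "controlled" (assumed values)
    contact_email: str = ""  # set when license/contact follow-up is needed

download_queue: Queue = Queue()   # public: ready for immediate ingestion
outreach_queue: Queue = Queue()   # private: needs license/contact follow-up

def triage(ds: Dataset) -> str:
    """Route a dataset to immediate download or license/contact follow-up."""
    if ds.access.lower() == "public":
        download_queue.put(ds.accession)
        return "public"
    outreach_queue.put(ds.accession)
    return "private"
```

In practice the `access` flag would come from repository metadata (e.g. an ENA/SRA record) rather than being hand-set.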
Master Metadata Curation & Harmonization
Solve the "Garbage In" Problem: Raw public data lacks standardized labels. You will build a master ontology for genomes and metagenomes to harmonize disparate metadata (e.g., mapping "Age," "BMI," "Disease Status") into a unified, queryable schema
Algorithmic Ranking: Implement a scoring system that rates incoming datasets based on various quality and relevance metrics. Your screening system will autonomously decide what enters our curated database and what gets rejected
Strategic Prioritization: Integrate external public health metrics and global disease burden data to help leadership prioritize which disease cohorts to pursue
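A minimal sketch of the harmonization step above, assuming a hand-built synonym table as a stand-in for the master ontology (the aliases are illustrative, not a real vocabulary):

```python
# Map heterogeneous raw metadata keys onto a unified, queryable schema.
SYNONYMS = {
    "age": {"age", "age_years", "host_age"},
    "bmi": {"bmi", "body_mass_index"},
    "disease_status": {"disease", "disease_status", "diagnosis"},
}
# Invert to alias -> canonical field for O(1) lookup.
CANONICAL = {alias: field for field, aliases in SYNONYMS.items() for alias in aliases}

def harmonize(record: dict) -> dict:
    """Normalize raw keys (case, spaces) and keep only recognized fields."""
    out = {}
    for key, value in record.items():
        canon = CANONICAL.get(key.strip().lower().replace(" ", "_"))
        if canon:
            out[canon] = value
    return out
```

A production version would back the synonym table with a standardized vocabulary (e.g. MeSH terms, as mentioned under Preferred) rather than a hand-maintained dict.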
Infrastructure & "Mission Control"
Manage the Iron: Oversee data ingress/egress on large servers, optimizing for massive file handling (petabyte scale) and backing up to cost-effective cloud cold-storage solutions
Real-time Visualization: Build a lightweight internal web tool that visualizes our data conquest in real-time. Leadership should be able to see a live count of identified vs. banked genomes
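The live identified-vs-banked view could be fed by an aggregation as simple as the sketch below; the status values are assumptions, and a real dashboard (e.g. in Streamlit) would poll this from the database:

```python
from collections import Counter

def live_counts(genome_statuses: list) -> dict:
    """Tally genomes by pipeline stage for the identified-vs-banked view."""
    tally = Counter(genome_statuses)
    return {
        "identified": sum(tally.values()),  # every genome tracked so far
        "banked": tally.get("banked", 0),   # fully ingested and stored
    }
```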
Internal Data Operations
Physical-to-Digital Bridge: Assist in scripting logic for physical sample tracking, such as QR codes on sample kits that trigger database updates upon scanning
Internal Scaling: Organize and optimize the flow of internal data (tens of thousands of samples), ensuring fast querying and retrieval of massive genomic files for the wet lab and computational teams
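The QR-scan-to-database trigger mentioned above might look like the following sketch, using an in-memory SQLite table; the table and column names are assumptions:

```python
import sqlite3

# Minimal sample-tracking table; a real system would use a shared database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (sample_id TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO samples VALUES ('KIT-0001', 'shipped')")

def on_scan(sample_id: str) -> bool:
    """Called when a kit's QR code is scanned; marks the sample received.

    Returns False if the scanned ID is unknown, so the scanner UI can flag it.
    """
    cur = conn.execute(
        "UPDATE samples SET status = 'received' WHERE sample_id = ?",
        (sample_id,),
    )
    conn.commit()
    return cur.rowcount == 1
```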

Qualifications

Bash · Python · R · AI/LLM Engineering · SQL/NoSQL · REST · API query interfaces · Linux environment · AWS cloud storage · Bioinformatics repositories · Curiosity · Scientific storytelling · Communication · Collaboration skills

Required

Proficiency in Bash, R, and Python (including API interactions and advanced scraping frameworks)
Experience integrating LLMs into programmatic workflows (Agents) to parse unstructured text (PDFs, study protocols) into structured data
Familiarity with SQL/NoSQL and REST API query interfaces; experience scraping and handling massive datasets (JSON/CSV/XML)
Comfortable in a Linux environment; experience using local servers and AWS cloud storage (S3/Glacier)
Basic familiarity with biological laboratory processes and jargon; understanding of biological metadata (taxonomies, clinical variables); experience with bioinformatics repositories (SRA, ENA) is a massive plus
Excellent communication and collaboration skills, fueled by a relentlessly curious mind and a commitment to transforming raw research into tangible microbiome solutions
Master's degree in Computer Science, Bioinformatics, or Data Science, or equivalent technical experience/portfolio (we value builders)
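LLM-in-the-loop parsing of unstructured text (as required above) typically means prompting for JSON and validating the response. In this sketch, `call_llm` is a hypothetical stand-in for a real model API and returns a canned reply:

```python
import json

def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call a frontier-model API.
    return '{"cohort": "IBD-2021", "n_samples": 312, "assay": "16S"}'

def extract_cohort(text: str) -> dict:
    """Ask the model to emit structured JSON describing a study cohort."""
    prompt = f"Extract cohort name, sample count, and assay as JSON:\n{text}"
    return json.loads(call_llm(prompt))
```

A production pipeline would also validate the parsed fields against a schema and retry or flag malformed model output.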

Preferred

Experience with MCP, interrupts, tool calling, and/or agentic IDEs (Cursor, Windsurf, Claude Code, Antigravity, etc.)
Experience with Streamlit/Dash, React/Vue.js/Angular, or Flask/Django to build the internal 'Mission Control' dashboard
Familiarity with standardized vocabularies (MeSH, ISO codes) to automate data harmonization
Ability to translate complex data insights into clear visual narratives for non-technical stakeholders and partners
Experience securely handling PII and HIPAA-regulated data
Experience managing on-prem compute infrastructure and networking

Company

Holobiome

Holobiome is a therapeutics company developing treatments for diseases of the enteric and central nervous systems.

H1B Sponsorship

Holobiome has a track record of offering H1B sponsorship. Please note that this does not guarantee sponsorship for this specific role. The information below is provided for reference (data from the US Department of Labor).
Total sponsorships by year: 2023 (1), 2020 (1)

Funding

Current Stage
Early Stage
Total Funding
$16.9M
Key Investors
iSelect Fund · Corundum Systems Biology · Alexandria LaunchLabs
2024-10-09 · Seed · $9M
2024-03-19 · Series Unknown · $6.59M
2021-10-28 · Grant · $1M

Leadership Team

Philip Strandwitz
Co-founder and CEO
Company data provided by crunchbase