Holobiome · 1 week ago

Data Operations Engineer (Bioinformatics)

Holobiome is building a future where we can modify the human gut microbiome with precision, enabling optimized health for everyone. The company is seeking a Bioinformatics Data Operations Engineer to lead its automated data identification, scraping, and ingestion engine, transforming raw data into a queryable asset that powers its ML pipelines.

Health Care · Medical · Pharmaceutical · Therapeutics
H1B Sponsor Likely

Responsibilities

Build an "Agentic" Data Discovery Framework
Automate the Hunt: Design and deploy AI agents (using frontier LLMs/APIs) to scrape the web for data, automatically identifying microbiome cohorts mentioned in scientific literature and disparate repositories
Global Data Engineering: Engineer robust solutions to ingest data from challenging international archives. You will navigate complex access protocols, rate limits, and licensing logic to ensure a steady stream of data into our pipeline
Public vs. Private Logic: Build logic to automatically classify datasets as "Public" (immediately downloadable) vs. "Private" (requiring license/contact) and queue them for ingestion
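The public-vs-private triage described above can be sketched as a small routing function. The field names, accession formats, and queues below are illustrative assumptions, not Holobiome's actual schema:

```python
from dataclasses import dataclass
from queue import Queue

@dataclass
class Dataset:
    accession: str
    access: str              # e.g. "public" or "controlled" (assumed values)
    contact_email: str = ""  # set when license/contact follow-up is needed

download_queue: Queue = Queue()   # public: ready for immediate ingestion
outreach_queue: Queue = Queue()   # private: needs license/contact follow-up

def triage(ds: Dataset) -> str:
    """Route a dataset to immediate download or license/contact follow-up."""
    if ds.access.lower() == "public":
        download_queue.put(ds.accession)
        return "public"
    outreach_queue.put(ds.accession)
    return "private"
```

In practice the `access` flag would come from repository metadata (e.g. an ENA/SRA record) rather than being hand-set.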
Master Metadata Curation & Harmonization
Solve the "Garbage In" Problem: Raw public data lacks standardized labels. You will build a master ontology for genomes and metagenomes to harmonize disparate metadata (e.g., mapping "Age," "BMI," "Disease Status") into a unified, queryable schema
Algorithmic Ranking: Implement a scoring system that rates incoming datasets based on various quality and relevance metrics. Your screening system will autonomously decide what enters our curated database and what gets rejected
Strategic Prioritization: Integrate external public health metrics and global disease burden data to help leadership prioritize which disease cohorts to pursue
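A minimal sketch of the harmonization step above, assuming a hand-built synonym table as a stand-in for the master ontology (the aliases are illustrative, not a real vocabulary):

```python
# Map heterogeneous raw metadata keys onto a unified, queryable schema.
SYNONYMS = {
    "age": {"age", "age_years", "host_age"},
    "bmi": {"bmi", "body_mass_index"},
    "disease_status": {"disease", "disease_status", "diagnosis"},
}
# Invert to alias -> canonical field for O(1) lookup.
CANONICAL = {alias: field for field, aliases in SYNONYMS.items() for alias in aliases}

def harmonize(record: dict) -> dict:
    """Normalize raw keys (case, spaces) and keep only recognized fields."""
    out = {}
    for key, value in record.items():
        canon = CANONICAL.get(key.strip().lower().replace(" ", "_"))
        if canon:
            out[canon] = value
    return out
```

A production version would back the synonym table with a standardized vocabulary (e.g. MeSH terms, as mentioned under Preferred) rather than a hand-maintained dict.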
Infrastructure & "Mission Control"
Manage the Iron: Oversee data ingress/egress on large servers, optimizing for massive file handling (petabyte scale) and backing up to cost-effective cloud cold-storage solutions
Real-time Visualization: Build a lightweight internal web tool that visualizes our data conquest in real-time. Leadership should be able to see a live count of identified vs. banked genomes
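The live identified-vs-banked view could be fed by an aggregation as simple as the sketch below; the status values are assumptions, and a real dashboard (e.g. in Streamlit) would poll this from the database:

```python
from collections import Counter

def live_counts(genome_statuses: list) -> dict:
    """Tally genomes by pipeline stage for the identified-vs-banked view."""
    tally = Counter(genome_statuses)
    return {
        "identified": sum(tally.values()),  # every genome tracked so far
        "banked": tally.get("banked", 0),   # fully ingested and stored
    }
```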
Internal Data Operations
Physical-to-Digital Bridge: Assist in scripting logic for physical sample tracking, such as QR codes on sample kits that trigger database updates upon scanning
Internal Scaling: Organize and optimize the flow of internal data (tens of thousands of samples), ensuring fast querying and retrieval of massive genomic files for the wet lab and computational teams
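The QR-scan-to-database trigger mentioned above might look like the following sketch, using an in-memory SQLite table; the table and column names are assumptions:

```python
import sqlite3

# Minimal sample-tracking table; a real system would use a shared database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (sample_id TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO samples VALUES ('KIT-0001', 'shipped')")

def on_scan(sample_id: str) -> bool:
    """Called when a kit's QR code is scanned; marks the sample received.

    Returns False if the scanned ID is unknown, so the scanner UI can flag it.
    """
    cur = conn.execute(
        "UPDATE samples SET status = 'received' WHERE sample_id = ?",
        (sample_id,),
    )
    conn.commit()
    return cur.rowcount == 1
```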

Qualifications

Bash · Python · R · AI/LLM Engineering · SQL/NoSQL · REST · API query interfaces · Linux environment · AWS cloud storage · Bioinformatics repositories · Curiosity · Scientific storytelling · Communication · Collaboration skills

Required

Proficiency in Bash, R, and Python (including API interactions and advanced scraping frameworks)
Experience integrating LLMs into programmatic workflows (Agents) to parse unstructured text (PDFs, study protocols) into structured data
Familiarity with SQL/NoSQL and REST API query interfaces; experience scraping and handling massive datasets (JSON/CSV/XML)
Comfortable in a Linux environment; experience using local servers and AWS cloud storage (S3/Glacier)
Basic familiarity with biological laboratory processes and jargon; understanding of biological metadata (taxonomies, clinical variables); experience with bioinformatics repositories (SRA, ENA) is a massive plus
Excellent communication and collaboration skills, fueled by a relentlessly curious mind and a commitment to transforming raw research into tangible microbiome solutions
Master's degree in Computer Science, Bioinformatics, or Data Science, or equivalent technical experience/portfolio (we value builders)
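LLM-in-the-loop parsing of unstructured text (as required above) typically means prompting for JSON and validating the response. In this sketch, `call_llm` is a hypothetical stand-in for a real model API and returns a canned reply:

```python
import json

def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call a frontier-model API.
    return '{"cohort": "IBD-2021", "n_samples": 312, "assay": "16S"}'

def extract_cohort(text: str) -> dict:
    """Ask the model to emit structured JSON describing a study cohort."""
    prompt = f"Extract cohort name, sample count, and assay as JSON:\n{text}"
    return json.loads(call_llm(prompt))
```

A production pipeline would also validate the parsed fields against a schema and retry or flag malformed model output.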

Preferred

Experience with MCP, interrupts, tool calling, and/or agentic IDEs (Cursor, Windsurf, Claude Code, Antigravity, etc.)
Experience with Streamlit/Dash, React/Vue.js/Angular, or Flask/Django to build the internal 'Mission Control' dashboard
Familiarity with standardized vocabularies (MeSH, ISO codes) to automate data harmonization
Ability to translate complex data insights into clear visual narratives for non-technical stakeholders and partners
Experience securely handling PII and HIPAA-regulated data
Experience managing on-prem compute infrastructure and networking

Company

Holobiome

Holobiome is a therapeutics company developing treatments for diseases of the enteric and central nervous systems.

H1B Sponsorship

Holobiome has a track record of offering H1B sponsorship. Please note that this does not guarantee sponsorship for this specific role. The information below is provided for reference (data from the US Department of Labor).
Total sponsorships by year: 2023 (1), 2020 (1)

Funding

Current Stage
Early Stage
Total Funding
$16.9M
Key Investors
iSelect Fund · Corundum Systems Biology · Alexandria LaunchLabs
2024-10-09 · Seed · $9M
2024-03-19 · Series Unknown · $6.59M
2021-10-28 · Grant · $1M

Leadership Team

Philip Strandwitz
Co-founder and CEO
Company data provided by crunchbase