Prellis Biologics · 1 month ago
Scientific Data Platform Architect — Antibody Discovery
Prellis Biologics is a pre-IPO biotech located in Berkeley, CA, focused on revolutionizing drug discovery through the integration of human biology and machine learning. The Scientific Data Platform Architect will design and build an end-to-end scientific data platform to enhance antibody discovery, ensuring data is structured, accessible, and ready for AI/ML applications.
3D Printing · Biopharma · Biotechnology · Therapeutics
Responsibilities
Own the canonical schemas (with selective JSONB), indexing/partitioning, materialized views, and stable entity IDs (samples, sequences, assays, runs)
Operate RDS/Aurora PostgreSQL, S3 for raw artifacts, and right-sized IAM/VPC access; set guardrails for backups, recovery, and monitoring (CloudWatch)
Make data Findable (catalog/registry tables, searchable metadata), Accessible (role-based access, documented APIs/exports), Interoperable (controlled vocabularies, standard formats such as CSV/Parquet, FASTA/VDJ, FCS/SPR), and Reusable (required metadata, units/QC flags, versioned tables)
Define and enforce data contracts, provenance, and lightweight review checkpoints
Build parsers/pipelines for instrument exports (CSV/TSV, FCS, ELISA/SPR/BLI), PipeBio repertoire/QC outputs, and Benchling entities via API/webhooks
Add validation, unit normalization, schema migrations, and automated checks
Create curated analytic views (assay roll-ups, QC dashboards, lineage), and implement interactive visuals (dose–response fits, sensorgrams, flow summaries, repertoire plots) with Plotly/Dash, Shiny, Spotfire, Streamlit, or similar
Deliver drill-downs, comparisons across runs/targets, and clean CSV/Excel exports
Build and maintain a small Shiny (R/Python) or Python app (FastAPI + Dash/Plotly/Streamlit) that is role-aware, searchable, and easy for scientists to use; deploy simply (EC2/ECS/Docker)
Publish feature-ready Parquet/Arrow datasets (sequence features, developability metrics, assay labels like KD/EC50, clonotypes) with dataset versioning, timestamps, and lineage
Provide reproducible extracts/snapshots for training, and ingest model predictions/scores back into Postgres and the UI
Set patterns and code standards, mentor contributors, review designs, and coordinate with Biology, Analytics, and QA/Compliance
Keep cost/performance sane; evolve the roadmap as assays and throughput grow
A clear Postgres schema with stable IDs, required metadata, and provenance supporting FAIR discovery
Automated ETL for Benchling + PipeBio + instruments, with validation and unit normalization
A usable app delivering interactive analytics & visualizations scientists rely on daily
ML-ready datasets with documented contracts; backups, monitoring, and a published data dictionary/metadata guide
Qualifications
Required
Bachelor's degree in Computer Science or a similar field
7+ years building data platforms or complex data products; expert SQL/PostgreSQL (schema design, optimization, migrations)
Strong Python or R for data engineering and app development (Pandas/SQLAlchemy or Shiny/Plotly/Streamlit)
Proven ETL experience from files/APIs and pragmatic scheduling (cron/Airflow/Prefect—keep it simple)
Practical AWS with Postgres on RDS/Aurora, S3 for storage, basic IAM/VPC, and CloudWatch for monitoring
Hands-on analytics & visualization for scientific datasets
Working knowledge of FAIR principles and shaping AI/ML-ready datasets (features, labels, versioned exports)
Preferred
Benchling developer experience (entities, webhooks) and familiarity with PipeBio outputs
Exposure to lab data types (FCS, BLI/SPR, ELISA, NGS summaries, PDB) and data integrity concepts (ALCOA+, 21 CFR Part 11 basics)
Light containerization (Docker) and deploying a small app on EC2/ECS
Experience round-tripping model outputs to a database/UI; comfort with Jupyter/scikit-learn/PyTorch
Benefits
A competitive employee benefits package, including group medical, dental, and vision coverage, life and disability insurance, flexible spending accounts, and a 401(k) plan
Stock-based long-term incentives
Bonus plan
Holiday package including a 1+ week winter shutdown
Flexible work models, including remote and hybrid working arrangements, where possible
Company
Prellis Biologics
Prellis combines holographic tissue printing technology with fully human antibody discovery and in vitro human disease and ADME/Tox models.
H1B Sponsorship
Prellis Biologics has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (1)
2023 (2)
Funding
Current Stage: Growth Stage
Total Funding: $79.37M
Key Investors: Celesta Capital, Khosla Ventures, True Ventures
2025-03-10 · Series C
2023-11-17 · Series C
2022-08-10 · Series C · $35M
Company data provided by crunchbase