Glyphic Biotechnologies · 3 months ago

Senior/Staff Scientist, Data Science

Berkeley, CA

Full-time

Hybrid

Senior Level

$168K/yr - $238K/yr

4+ years exp

Glyphic Biotechnologies is developing a revolutionary single-molecule proteome sequencing platform aimed at transforming life science discovery. They are seeking a highly motivated and experienced Senior/Staff Data Scientist to advance this technology by designing algorithms, developing machine learning models, and collaborating with a team of scientists and engineers.

Biotechnology · Health Care · Life Science

Responsibilities

Design and implement novel algorithms to analyze proteomics data that no one has ever seen before

Develop machine learning models that can extract meaningful insights from complex, noisy biological signals

Develop and optimize algorithms for analyzing high-dimensional chemistry and NGS data, including single cell, spatial data, and LCMS data outputs

Build models that reveal how parameters and molecular interfaces drive outcomes, including surface interactions and molecule-target binding

Design and execute biostatistical analyses using Python and/or R to uncover significant trends, model experimental outcomes, and inform data-driven decision-making

Apply machine learning to guide experiment design, identify key parameters, and optimize workflows for efficiency and reproducibility

Develop clear, insightful visualizations that make complex, high-dimensional results understandable and actionable for scientists and stakeholders

Help define metrics and visualizations that clarify high-dimensional relationships for scientists and stakeholders

Partner with wet lab, hardware, and software teams to translate experimental goals into computational strategies

Create ETL pipelines that clean, normalize, and integrate diverse datasets (sequencing reads, LCMS spectra, metadata) into analysis-ready formats

Combine off-the-shelf pipelines (basecalling, variant calling, deconvolution) with custom scripts to deliver end-to-end solutions

Continuously improve throughput and data quality by automating QC steps and integrating feedback from experiments

Establish best practices for code quality, testing, and deployment that will scale with our growing team

Qualifications

Python · R · Machine Learning · Bioinformatics · Data Visualization · Next Generation Sequencing · ETL Pipelines · Cloud Platforms · Chemistry Data Science · Soft Skills

Required

PhD in Computer Science, Bioinformatics, Computational Biology, Biostatistics, or a related field, with 4+ (Senior) or 6+ (Staff) years of hands-on experience

Proven ability to model and interpret high-dimensional datasets with numerous interacting variables, uncovering statistically robust patterns and causal relationships

Competency in chemistry data science (e.g., interpreting LCMS data, utilizing deconvolution tools, understanding surface chemistry and molecule-target interactions)

Competency in next generation sequencing, including familiarity with multi-omics, error modeling, and basecalling

Expertise in Python and/or R for biostatistical analysis, including data wrangling, statistical modeling, and visualization of high-dimensional experimental results

Experience designing ML models for experimental data and deploying pipelines (Snakemake, Nextflow)

Familiarity with ML frameworks (PyTorch, TensorFlow) and data science libraries (pandas, numpy, scipy)

Experience building automated data pipelines and infrastructure for scalable analysis (cloud, Docker/Kubernetes)

Experience with cloud platforms (AWS, GCP, or Azure) and containerization tools (Docker, Kubernetes)

Proficiency with data visualization tools (matplotlib, seaborn, plotly) and Jupyter notebooks

Familiarity with version control (git) and pipeline workflow systems (Snakemake, Nextflow, etc.)

Preferred

Ability to work in performant languages (C++, Rust, Julia, or CUDA)

Ability to develop solutions that optimize the utilization of large-scale data storage, cloud processing infrastructure, and distributed computing

Direct proteomics experience (mass spectrometry, multiplex assays, etc.)

Deep learning experience with time-series data, signal processing, or sequence modeling

Ability to build and deploy scalable ML pipelines using PyTorch/TensorFlow for real-time protein sequence analysis

Experience with MLOps tools and practices for model deployment and monitoring

Experience building commercially successful life science tools that other scientists actually use and love

Previous startup or fast-paced industry (e.g., skunkworks) experience

Benefits

Employee Stock Option Plan

100% Health Plan Coverage for Employees & Dependents (Medical, Dental, & Vision)

Employer Retirement Contributions to 401(k)

Generous Paid Time Off

Paid Maternity and Paternity Leave

Health & Wellbeing Program

Office Snacks and Beverages

Regular Team Bonding Activities

Company

Glyphic Biotechnologies

Glyphic Biotechnologies is a biotechnology company that develops a protein sequencing platform.

Founded in 2021

New York, New York, USA

11-50 employees

https://www.glyphic.bio

Funding

Current Stage

Early Stage

Total Funding

$45.78M

Key Investors

FoundersX Ventures · LongeVC · National Institutes of Health

2025-09-26 · Series A · $38M

2024-11-25 · Seed

2024-01-25 · Seed

Leadership Team

Joshua Yang

Co-Founder and CEO

Daniel Estandian

CTO and Co-Founder

Recent News

Google Patent

Amino acid binding agents and their uses

2025-05-04

Google Patent

Methods and systems for processing polymeric analytes

2025-05-04

Google Patent

Single-molecule peptide sequencing using dithioester and thiocarbamoyl amino …

2025-02-08

Company data provided by Crunchbase