Ginkgo Bioworks, Inc. · 1 day ago
Senior Software Engineer, Data Pipelines
Ginkgo Bioworks is dedicated to making biology easier to engineer, focusing on biosecurity infrastructure to address biological threats. The role involves building and operating critical biosecurity data systems, designing reliable data pipelines, and ensuring data quality across various programs.
Biopharma, Biotechnology, Chemical
Responsibilities
Plan, architect, test, and deploy data warehouses, data marts, and ETL/ELT pipelines primarily within AWS and Snowflake environments
Build scalable data pipelines capable of handling structured, unstructured, and high-throughput biological data from diverse sources
Develop data models using dbt with rigorous testing, documentation, and stakeholder-aligned semantics to ensure analytics-ready datasets
Ensure data integrity, consistency, and accessibility across internal and external biosecurity data products
Develop, document, and enforce coding and data modeling standards to improve code quality, maintainability, and system performance
Serve as the in-house data expert, making recommendations on data architecture, pipeline improvements, and best practices; define and adapt data engineering processes to deliver reliable answers to critical biosecurity questions
Build high-performance APIs and microservices in Python that enable seamless integration between the biosecurity data platform and user-facing applications
Design backend services that support real-time and batch data access for biosecurity operations
Create data products that empower public health officials, analysts, and partners with actionable biosecurity intelligence
Democratize access to complex biosecurity datasets using AI and LLMs, making data more discoverable and usable for stakeholders
Apply AI-assisted development tools to accelerate code generation, data modeling, and pipeline development while maintaining high quality standards
Build robust, production-ready data workflows using AWS, Kubernetes, Docker, Airflow, and infrastructure-as-code (Terraform/CloudFormation)
Diagnose system bottlenecks, optimize for cost and speed, and ensure the reliability and fault tolerance of mission-critical data pipelines
Implement observability, monitoring, and alerting to maintain high availability for biosecurity operations
Lead data projects from scoping through execution, including design, documentation, and stakeholder communication
Collaborate with technical leads, product managers, scientists, and data analysts to build robust data products and analytics capabilities
Qualifications
Required
7+ years of professional experience in data or software engineering, with a focus on building production-grade data products and scalable architectures
Expert proficiency with SQL for complex transformations, performance tuning, and query optimization
Strong Python skills for data engineering workflows, including pipeline development, ETL/ELT processes, and data processing; experience with backend frameworks (FastAPI, Flask) for API development; focus on writing modular, testable, and reusable code
Proven experience with dbt for data modeling and transformation, including testing frameworks and documentation practices
Hands-on experience with cloud data warehouses (Snowflake, BigQuery, or Redshift), including performance tuning, security hardening, and managing complex schemas
Experience with workflow orchestration tools (Airflow, Dagster, or equivalent) for production data pipelines, including DAG development, scheduling, monitoring, and troubleshooting
Solid grounding in software engineering fundamentals: system design, version control (Git), CI/CD pipelines, containerization (Docker), and infrastructure-as-code (Terraform, CloudFormation)
Hands-on experience managing AWS resources, including S3, IAM roles/policies, API integrations, and security configurations
Strong ability to analyze large datasets, identify data quality issues, debug pipeline failures, and propose scalable solutions
Excellent communication skills and ability to work cross-functionally with scientists, analysts, and product teams to turn ambiguous requirements into maintainable data products
Preferred
Domain familiarity with biological data (PCR, sequencing, wastewater surveillance, turnaround-time (TAT) metrics) and experience working with lab, bioinformatics, NGS, or epidemiology teams
Production ownership of Snowflake environments including RBAC, secure authentication patterns, and cost/performance optimization
Experience with observability and monitoring stacks (Grafana, Datadog, or similar) and data quality monitoring (anomaly detection, volume/velocity checks, schema drift detection)
Familiarity with container orchestration platforms (Kubernetes) for managing production workloads
Experience with data ingestion frameworks (Airbyte, Fivetran) or building custom ingestion solutions for external partner data delivery
Familiarity with data cataloging, governance practices, and reference data management to prevent silent data drift
Experience designing datasets for visualization tools (Tableau, Looker, Metabase) with strong understanding of dashboard consumption patterns; familiarity with JavaScript for custom visualizations or front-end dashboard development
Comfort with AI-assisted development tools (GitHub Copilot, Cursor) to accelerate code generation while maintaining quality standards
Startup or fast-paced environment experience with evolving priorities and rapid iteration
Scientific or data-intensive domain experience (life sciences, healthcare, materials science)
Benefits
Company stock awards
Comprehensive benefits package including medical, dental & vision coverage
Health spending accounts
Voluntary benefits
Leave of absence policies
401(k) program with employer contribution
8 paid holidays in addition to a full-week winter shutdown
Unlimited Paid Time Off policy
Company
Ginkgo Bioworks, Inc.
At Ginkgo, we use biology to grow the future.
H1B Sponsorship
Ginkgo Bioworks, Inc. has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (13)
2024 (38)
2023 (25)
2022 (27)
2021 (27)
2020 (8)
Funding
Current Stage
Public Company
Total Funding
$1.58B
Key Investors
Bill & Melinda Gates Foundation, Centers for Disease Control and Prevention, Agriculture and Food Research Initiative
2024-04-10Grant
2023-12-13Grant
2023-10-05Grant
Company data provided by Crunchbase