Codebase Inc · 4 hours ago
GCP Data Engineer
Codebase Inc is seeking a GCP Data Engineer with a strong healthcare background to architect enterprise data platforms on Google Cloud. The role involves designing and building a GCP BigQuery-based Data Lake & Data Warehouse ecosystem and requires deep expertise in data ingestion, transformation, modeling, and governance, particularly with clinical healthcare data standards.
Responsibilities
Architect and design an enterprise-grade GCP-based data lakehouse leveraging BigQuery, GCS, Dataproc, Dataflow, Pub/Sub, Cloud Composer, and BigQuery Omni
Define data ingestion, hydration, curation, processing and enrichment strategies for large-scale structured, semi-structured, and unstructured datasets
Create data domain models, canonical models, and consumption-ready datasets for analytics, AI/ML, and operational data products
Design federated data layers and self-service data products for downstream consumers
Architect batch, near-real-time, and streaming ingestion pipelines using GCP Cloud Dataflow, Pub/Sub, and Dataproc
Set up data ingestion for clinical (EHR/EMR, LIS, RIS/PACS) datasets including HL7, FHIR, CCD, DICOM formats
Build ingestion pipelines for non-clinical systems (ERP, HR, payroll, supply chain, finance)
Architect ingestion from medical devices, IoT, remote patient monitoring, and wearables leveraging IoMT patterns
Manage on-prem → cloud migration pipelines, hybrid cloud data movement, VPN/Interconnect connectivity, and data transfer strategies
Build transformation frameworks using BigQuery SQL, Dataflow, Dataproc, or dbt
Define curation patterns including bronze/silver/gold layers, canonical healthcare entities, and data marts
Implement data enrichment using external social determinants, device signals, clinical event logs, or operational datasets
Enable metadata-driven pipelines for scalable transformations
Establish and operationalize a data governance framework encompassing data stewardship, ownership, classification, and lifecycle policies
Implement data lineage, data cataloging, and metadata management using tools such as Dataplex, Data Catalog, Collibra, or Informatica
Set up data quality frameworks for validation, profiling, anomaly detection, and SLA monitoring
Ensure HIPAA compliance and PHI protection through IAM/RBAC, VPC Service Controls (VPC SC), DLP, encryption, retention policies, and auditing
Work with cloud infrastructure teams to architect VPC networks, subnetting, ingress/egress, firewall policies, VPN/IPSec, Interconnect, and hybrid connectivity
Define storage layers, partitioning/clustering design, cost optimization, performance tuning, and capacity planning for BigQuery
Understand containerized processing (Cloud Run, GKE) for data services
Work closely with clinical, operational, research, and IT stakeholders to define data use cases, schema, and consumption models
Partner with enterprise architects, security teams, and platform engineering teams on cross-functional initiatives
Guide data engineers and provide architectural oversight on pipeline implementation
Be actively hands-on in building pipelines, writing transformations, building POCs, and validating architectural patterns
Mentor data engineers on best practices, coding standards, and cloud-native development
Qualifications
Required
10+ years in data architecture, engineering, or data platform roles
Strong expertise in GCP data stack (BigQuery, Dataflow, Composer, GCS, Pub/Sub, Dataproc, Dataplex)
Hands-on experience with data ingestion, pipeline orchestration, and transformations
Deep understanding of clinical data standards: HL7 v2.x, FHIR, CCD/C-CDA, DICOM (for scans and imaging), and LIS/RIS/PACS data structures
Experience with device and IoT data ingestion (wearables, remote patient monitoring, clinical devices)
Experience with ERP datasets (Workday, Oracle, Lawson, PeopleSoft)
Strong SQL and data modeling skills (3NF, star/snowflake, canonical and logical models)
Experience with metadata management, lineage, and governance frameworks
Solid understanding of HIPAA, PHI/PII handling, DLP, IAM, VPC security
Solid understanding of cloud networking, hybrid connectivity, VPC design, firewalling, DNS, service accounts, IAM, and security models
Experience with cloud-native data movement services
Experience with on-prem to cloud migrations
Company
Codebase Inc
Codebase Inc. is a strategic IT solutions service provider based in New Jersey, USA.
Funding
Current Stage
Growth Stage
Company data provided by Crunchbase