Vaniam Group · 3 months ago

Data Engineer, Enterprise Data, Analytics and Innovation

United States

Full-time

Remote

Mid, Senior Level

$110K/yr - $125K/yr

5+ years exp

Vaniam Group is a purpose-driven independent network of healthcare and scientific communications agencies. As a Data Engineer, you will own and evolve the foundation of the data infrastructure, ensuring data reliability, scalability, and accessibility while collaborating with innovation teams to build analytics solutions.

Brand Marketing, Consulting, Oncology, Professional Services

Responsibilities

Design, build, and operate reliable ETL and ELT pipelines in Python and SQL

Manage ingestion into Bronze, standardization and quality in Silver, and curated serving in Gold layers of our Medallion architecture

Maintain ingestion from transactional MySQL systems into Vaniam Core to keep production data flows seamless

Implement observability, data quality checks, and lineage tracking to ensure trust in all downstream datasets

Develop schemas, tables, and views optimized for analytics, APIs, and product use cases

Apply and enforce best practices for security, privacy, compliance, and access control, ensuring data integrity across sensitive healthcare domains

Maintain clear and consistent documentation for datasets, pipelines, and operating procedures

Lead the integration of third-party datasets, client-provided sources, and new product-generated data into Vaniam Core

Partner with product and innovation teams to build repeatable processes for onboarding new data streams

Ensure harmonization, normalization, and governance across varied data types (scientific, engagement, operational)

Collaborate with the innovation team to prototype and productionize analytics, predictive features, and decision-support tools

Support dashboards, APIs, and services that activate insights for internal stakeholders and clients

Work closely with Data Science and AI colleagues to ensure engineered pipelines meet modeling and deployment requirements

Monitor job execution, storage, and cluster performance, ensuring cost efficiency and uptime

Troubleshoot and resolve data issues, proactively addressing bottlenecks

Conduct code reviews, enforce standards, and contribute to CI/CD practices for data pipelines

Qualifications

Python, SQL, ETL pipelines, Data modeling, Spark, PySpark, Workflow orchestration, Data governance, Communication, Problem-solving mindset, Collaborative approach

Required

5+ years of professional experience in data engineering, ETL, or related roles

Strong proficiency in Python and SQL for data engineering

Hands-on experience building and maintaining pipelines in a lakehouse or modern data platform

Practical understanding of Medallion architectures and layered data design

Familiarity with modern data stack tools, including Spark or PySpark; workflow orchestration (Airflow, dbt, or similar); testing and observability frameworks; and containers (Docker) and Git-based version control

Excellent communication skills, problem-solving mindset, and a collaborative approach