apiphani · 4 weeks ago
Senior Data Pipeline Engineer
Apiphani is a technology-enabled managed services company dedicated to redefining support for mission-critical enterprise workloads. The Senior Data Pipeline Engineer will design, develop, and maintain scalable data pipelines on AWS and other cloud platforms while ensuring data quality and reliability for business needs.
Consulting · Database · Information Services · Information Technology
Responsibilities
Design, develop, and maintain scalable batch and streaming data pipelines using Apache Spark and cloud-native services (for example AWS Glue, EMR, Kinesis, and Lambda)
Utilize and optimize Apache Spark (RDDs, DataFrames, Spark SQL) for distributed processing of large datasets, including both batch and near real‑time use cases
Implement robust ETL/ELT processes to ingest and transform data from databases, APIs, files, and event streams into curated datasets stored in S3 data lakes, data warehouses (such as Amazon Redshift), and data marts
Implement data quality checks, validation rules, and governance controls (including schema enforcement, profiling, and reconciliation) to ensure accuracy, completeness, and consistency
Develop and maintain logical and physical data models, schemas, and metadata in catalogs to support analytics, BI, and ML consumption
Create and manage data warehouses, data lakes, and data marts on AWS and other cloud platforms (such as Azure or GCP) following modern architectural patterns
Collaborate with data analysts, data scientists, and business stakeholders to understand data requirements and translate them into scalable pipeline and modeling solutions
Collaborate with DevOps, platform, security, and compliance teams to ensure secure, reliable cloud implementations and adherence to organizational standards
Develop cloud and data architecture documentation, including diagrams, guidelines, and best practices, to enable knowledge sharing and reuse
Troubleshoot and resolve data pipeline and job issues across development and production environments, ensuring minimal downtime and preserving data integrity
Continuously optimize data pipelines for performance, cost, reliability, and data quality using best practices in distributed data engineering and cloud resource tuning
Build algorithms and prototypes that combine and reconcile raw information from multiple sources, including resolving data conflicts and inconsistencies
Provide technical leadership for the analytics data stack, including reviewing designs, establishing standards for observability and reliability, and guiding junior engineers in delivering high-quality solutions
Define and manage data and cloud infrastructure using infrastructure‑as‑code tools such as Terraform (and/or AWS CDK/CloudFormation) to ensure consistent, repeatable environments across development, test, and production
Participate actively in agile ceremonies (backlog refinement, sprint planning, daily stand‑ups, reviews), including estimating and updating user stories, tracking progress, and collaborating closely with data product and analytics stakeholders
Qualifications
Required
Bachelor's degree in Computer Science, Engineering, Mathematics, or related field, or equivalent work experience
6+ years of experience in data engineering or closely related roles, working with large, complex datasets
Demonstrated experience owning production-grade data pipelines end to end, from design and implementation through monitoring, incident response, and continuous improvement
Extensive hands-on experience with Apache Spark for large-scale data processing, including RDDs, DataFrames, and Spark SQL
Familiarity with big data ecosystem components such as HDFS, Hive, and HBase, and their cloud-native equivalents on AWS and other clouds
Experience with SQL and NoSQL databases such as MySQL, PostgreSQL, DynamoDB, or similar technologies
Strong proficiency in SQL and at least one programming language such as Python (preferred) for data processing, automation, and orchestration glue code
Experience with data pipeline orchestration and scheduling tools such as AWS Step Functions, Amazon Managed Workflows for Apache Airflow (MWAA), or Apache Airflow
Experience with cloud-based data platforms and services, ideally AWS (S3, Glue, EMR, Redshift, Kinesis, Lambda), with exposure to Azure or GCP as a plus
Experience designing and implementing data warehouses and data lakes, including partitioning, file formats, and performance optimization
Experience with data quality, automated data testing, and data governance methodologies and tools; familiarity with lineage, cataloging, and access controls
Strong analytical and problem-solving skills, high attention to detail, and clear written and verbal communication
Ability to work independently and collaboratively in a fast-paced, agile, and cross-functional environment
Hands‑on experience with infrastructure as code, preferably Terraform (and/or AWS CDK/CloudFormation), to provision and manage data and cloud resources
Practical experience working in an agile delivery model, including breaking down work into user stories, sizing and updating them during the sprint, and delivering incrementally
Preferred
Experience working with a modern data catalog such as Alation, Collibra, or similar tools is a plus
Ability to prepare and curate data for prescriptive and predictive modeling (for example, features for ML models) is a plus
Benefits
Medical/dental/vision - 100% paid for employees, 50% paid for dependents
Life and disability - 100% paid for employees
401K - 3% contribution, no employee contribution necessary
Education and tuition reimbursement - up to $50K annually
Employee Stock Options Plan
Accident, critical illness, hospital indemnity benefits offered through our providers
Employee Assistance Program
Legal assistance
Paid Time Off - up to 6 weeks per year
Sick Leave - up to 2 weeks per year
Parental Leave - up to 12 weeks
Company
apiphani
apiphani offers strategic & operational consulting, database management, and managed services.
H1B Sponsorship
apiphani has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. The information below is provided for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2024 (5)
2023 (1)
Funding
Current Stage: Growth Stage
Total Funding: $25M
Key Investors: Insight Partners
2025-10-01: Series A · $25M
Recent News
2025-10-04 · Tech Startups - Startups and Technology news
Company data provided by crunchbase