Senior Data Engineer @ Sparibis

Sparibis · 6 days ago

Senior Data Engineer

Cyber Security · Data Management

Responsibilities

Plan, create, and maintain data architectures, ensuring alignment with business requirements
Acquire data, design dataset-creation processes, and store data in optimized formats
Identify problems and inefficiencies and apply solutions
Determine which manual tasks can be eliminated through automation
Identify and optimize data bottlenecks, leveraging automation where possible
Create and manage data lifecycle policies (retention, backup/restore, etc.)
Apply in-depth knowledge of creating, maintaining, and managing ETL/ELT pipelines
Create, maintain, and manage data transformations
Maintain/update documentation
Create, maintain, and manage data pipeline schedules
Monitor data pipelines
Create, maintain, and manage data quality gates (Great Expectations) to ensure high data quality; a minimal gate sketch follows this list
Support AI/ML teams with optimizing feature engineering code
Apply expertise in Spark, Python, Databricks, data lakes, and SQL
Create, maintain, and manage Spark Structured Streaming jobs, including using the newer Delta Live Tables and/or dbt; a streaming sketch follows this list
Research existing data in the data lake to determine best sources for data
Create, manage, and maintain ksqlDB and Kafka Streams queries/code
Apply data-driven testing for data quality
Maintain and update Python-based data processing scripts executed on AWS Lambda
Write unit tests for all Spark, Python data processing, and Lambda code; a Lambda-plus-pytest sketch follows this list
Maintain and optimize the PCIS Reporting Database data lake (performance tuning, etc.)
Streamline data processing, including formalizing how to handle late data, how to define windows, and how window definitions impact data freshness; a windowing sketch follows this list
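
A minimal sketch of the data quality gate idea above, using the legacy pandas-style Great Expectations API (entry points vary across GE versions); the DataFrame and column names are illustrative:

    # Data quality gate sketch: legacy pandas-style Great Expectations API.
    import great_expectations as ge
    import pandas as pd

    df = pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, 25.5, 7.2]})
    batch = ge.from_pandas(df)  # wrap the frame so expectations can run on it

    # Each expectation returns a result object with a .success flag.
    checks = [
        batch.expect_column_values_to_not_be_null("order_id"),
        batch.expect_column_values_to_be_between("amount", min_value=0),
    ]

    # Gate the pipeline: fail fast if any expectation is violated.
    if not all(check.success for check in checks):
        raise ValueError("Data quality gate failed; blocking downstream load")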
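
A sketch of a Spark Structured Streaming job that reads from one Delta table and appends to another; the paths and the filter are illustrative, and on Databricks the Delta format and a SparkSession are available out of the box:

    # Structured Streaming sketch: bronze Delta table -> silver Delta table.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders-stream").getOrCreate()

    orders = spark.readStream.format("delta").load("/mnt/lake/bronze/orders")

    cleaned = orders.filter(F.col("amount") > 0)  # example transformation

    query = (
        cleaned.writeStream.format("delta")
        .option("checkpointLocation", "/mnt/lake/_checkpoints/orders_silver")
        .outputMode("append")
        .start("/mnt/lake/silver/orders")
    )
    query.awaitTermination()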
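
A sketch of a small Python data-processing Lambda with a pytest-style unit test; the event shape and the transformation are illustrative, and a real S3-triggered handler would fetch objects with boto3 instead:

    # Lambda sketch: keep transformation logic pure so it is easy to unit test.
    import json

    def transform(record: dict) -> dict:
        """Illustrative transformation: add a cents field to each record."""
        return {**record, "amount_cents": int(round(record["amount"] * 100))}

    def lambda_handler(event, context):
        # This sketch assumes the payload already carries the records.
        records = [transform(r) for r in event["records"]]
        return {"statusCode": 200, "body": json.dumps(records)}

    # pytest-style unit test (would normally live in tests/test_handler.py).
    def test_transform_converts_amount_to_cents():
        assert transform({"id": 1, "amount": 10.5})["amount_cents"] == 1050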
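
A sketch of how window definitions and watermarks interact, which is where the late-data and freshness trade-off shows up: the watermark bounds how late an event may arrive and still update its window, which in turn bounds how quickly a window can be finalized. The rate source is used only so the example is self-contained:

    # Windowing sketch: tumbling windows with a watermark for late data.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("late-data-demo").getOrCreate()

    # The built-in rate source emits (timestamp, value) rows, handy for demos.
    events = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

    windowed_counts = (
        events
        .withWatermark("timestamp", "15 minutes")      # tolerate events up to 15 min late
        .groupBy(F.window("timestamp", "10 minutes"))  # tumbling 10-minute windows
        .count()
    )

    query = (
        windowed_counts.writeStream
        .outputMode("update")   # emit rows as windows update
        .format("console")
        .start()
    )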

Qualifications

Enterprise Data Architecture · Data Modeling · Data Quality Validation · ETL/ELT Tools · SQL · AWS Environment · CI/CD Pipelines · Python · Spark · Data Lake Concepts · System Integration · Data Migration · Data Warehouse · Data Mart · Streaming Data Pipelines · Batch Systems · Indexing · Partitioning Strategy · Debugging · Troubleshooting · Big Data Application Deployment · Workflow Definition · Great Expectations · Data Quality · Containerization · Pipeline Orchestration · Airflow · Prefect · AWS Architecture · Kinesis

Required

10+ years of IT experience focusing on enterprise data architecture and management
Bachelor’s degree in an IT-related field
Applicants must show that they are legally permitted to work in the United States
Applicants must be able to meet the requirements to obtain a Public Trust security clearance. NOTE: United States Citizenship is required to be eligible to obtain this security clearance
Experience with Databricks required
8+ years of experience in Conceptual/Logical/Physical Data Modeling and expertise in Relational and Dimensional Data Modeling
Experience with Great Expectations or other data quality validation frameworks
Advanced understanding of ETL and ELT, and experience with ETL/ELT tools such as SSIS, Pentaho, and/or Data Migration Service
Advanced-level SQL experience (Joins, Aggregation, Windowing functions, Common Table Expressions, RDBMS schema design, Postgres performance optimization); a SQL sketch follows this list
Experience with Databricks, Structured Streaming, Delta Lake concepts, and Delta Live Tables required
Additional experience with Spark, Spark SQL, Spark DataFrames and DataSets, and PySpark
Data lake concepts such as time travel, schema evolution, and optimization
Experience leading and architecting enterprise-wide initiatives, specifically system integration, data migration, transformation, data warehouse builds, data mart builds, and data lake implementation/support
Advanced level understanding of streaming data pipelines and how they differ from batch systems
Ability to formalize how to handle late data, define windows, and reason about data freshness
Indexing and partitioning strategy experience
Ability to debug, troubleshoot, and design and implement solutions to complex technical issues
Experience with large-scale, high-performance enterprise big data application deployment and solutions
Understanding of how to create DAGs to define workflows; an Airflow sketch follows this list
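
A sketch of the SQL shapes called out above (a common table expression feeding a window function); it runs against in-memory SQLite (3.25+) purely to stay self-contained, and the same SQL applies to Postgres:

    # SQL sketch: CTE + ROW_NUMBER() window to pick the latest order per customer.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (customer_id INT, order_date TEXT, amount REAL);
        INSERT INTO orders VALUES
            (1, '2024-01-01', 50.0), (1, '2024-02-01', 75.0), (2, '2024-01-15', 20.0);
    """)

    latest = conn.execute("""
        WITH ranked AS (
            SELECT customer_id, order_date, amount,
                   ROW_NUMBER() OVER (
                       PARTITION BY customer_id ORDER BY order_date DESC
                   ) AS rn
            FROM orders
        )
        SELECT customer_id, order_date, amount FROM ranked WHERE rn = 1
    """).fetchall()
    print(latest)  # [(1, '2024-02-01', 75.0), (2, '2024-01-15', 20.0)]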
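
A minimal sketch of a DAG-defined workflow using Airflow; the task callables and schedule are illustrative:

    # Airflow sketch: three tasks wired into a linear DAG.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pull source data")

    def transform():
        print("clean and reshape")

    def load():
        print("write to the lake")

    with DAG(
        dag_id="example_etl",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
        catchup=False,
    ) as dag:
        t1 = PythonOperator(task_id="extract", python_callable=extract)
        t2 = PythonOperator(task_id="transform", python_callable=transform)
        t3 = PythonOperator(task_id="load", python_callable=load)

        t1 >> t2 >> t3  # DAG edges: extract -> transform -> load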

Preferred

Experience with Great Expectations or other data quality/data validation frameworks a bonus
Familiarity with CI/CD pipelines, containerization, and pipeline orchestration tools such as Airflow, Prefect, etc a bonus but not required
Architecture experience in AWS environment a bonus
Familiarity working with Kinesis and/or Lambda, specifically how to push and pull data, how to use AWS tools to view data in Kinesis streams, and how to process massive data at scale a bonus; a Kinesis sketch follows this list
Knowledge of Python (Python 3 desired) for CI/CD pipelines a bonus
Familiarity with pytest and unittest a bonus
Experience working with JSON and defining JSON Schemas a bonus; a JSON Schema sketch follows this list
Experience setting up and managing Confluent/Kafka topics and ensuring performance using Kafka a bonus; a topic-creation sketch follows this list
Familiarity with Schema Registry and message formats such as Avro, ORC, etc.
Understanding of how to manage ksqlDB SQL files and migrations, and Kafka Streams
Experience with Docker, Jenkins, and CloudWatch
Ability to write and maintain Jenkinsfiles to support CI/CD pipelines
Experience configuring and optimizing AWS Lambda functions
Experience working with DynamoDB to query and write data; a DynamoDB sketch follows this list
Experience with S3
Experience briefing the benefits and constraints of technology solutions to technology partners, stakeholders, team members, and senior management
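
A sketch of pushing to and pulling from a Kinesis stream with boto3; the stream name and region are illustrative, and real consumers typically use a Lambda trigger or the KCL rather than polling like this:

    # Kinesis sketch: put one record, then read from the oldest shard position.
    import json
    import boto3

    kinesis = boto3.client("kinesis", region_name="us-east-1")

    # Push: the partition key controls which shard a record lands on.
    kinesis.put_record(
        StreamName="orders-stream",
        Data=json.dumps({"order_id": 1, "amount": 10.5}),
        PartitionKey="order-1",
    )

    # Pull: walk one shard with an iterator.
    shard_id = kinesis.describe_stream(StreamName="orders-stream")[
        "StreamDescription"]["Shards"][0]["ShardId"]
    iterator = kinesis.get_shard_iterator(
        StreamName="orders-stream",
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]
    records = kinesis.get_records(ShardIterator=iterator, Limit=10)["Records"]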
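
A sketch of defining a JSON Schema and validating a payload against it with the jsonschema package; the schema itself is illustrative:

    # JSON Schema sketch: reject payloads missing fields or with bad types.
    from jsonschema import ValidationError, validate

    order_schema = {
        "type": "object",
        "properties": {
            "order_id": {"type": "integer"},
            "amount": {"type": "number", "minimum": 0},
        },
        "required": ["order_id", "amount"],
    }

    try:
        validate(instance={"order_id": 1, "amount": 10.5}, schema=order_schema)
    except ValidationError as exc:
        print(f"payload rejected: {exc.message}")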
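
A sketch of creating a Kafka topic with the confluent-kafka AdminClient; the broker address and sizing are illustrative, and partition count and replication factor are the main performance levers set at creation time:

    # Kafka topic sketch: create a topic and surface any creation errors.
    from confluent_kafka.admin import AdminClient, NewTopic

    admin = AdminClient({"bootstrap.servers": "localhost:9092"})

    futures = admin.create_topics(
        [NewTopic("orders", num_partitions=6, replication_factor=3)]
    )
    for topic, future in futures.items():
        try:
            future.result()  # raises if creation failed
            print(f"created topic {topic}")
        except Exception as exc:
            print(f"failed to create {topic}: {exc}")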
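
A sketch of writing to and querying a DynamoDB table with boto3; the table and key names are illustrative (note that DynamoDB requires Decimal rather than float for non-integer numbers):

    # DynamoDB sketch: one write, then a query over a single partition key.
    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb", region_name="us-east-1").Table("orders")

    # Write a single item; keys must match the table's key schema.
    table.put_item(Item={"customer_id": "c-1", "order_id": "o-1", "amount": 10})

    # Query all items for one partition key value.
    response = table.query(KeyConditionExpression=Key("customer_id").eq("c-1"))
    for item in response["Items"]:
        print(item)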

Company

Sparibis

Sparibis is a solutions and services firm that provides consulting solutions, cyber security engineering, and data management services.

Funding

Current Stage: Early Stage
Company data provided by crunchbase
