The Custom Group of Companies · 2 days ago
Data Engineer - III
The Custom Group of Companies is seeking a Senior Data Engineer to migrate applications to cloud service providers and develop products using the latest technologies. The role involves building cloud infrastructure, automating processes, and collaborating with business stakeholders to achieve goals through agile methodologies.
Consulting · Human Resources · Legal · Staffing Agency
Responsibilities
Work on migrating applications from on-premises environments to cloud service providers
Develop products and services on the latest technologies through contributions to development, enhancement, testing, and implementation
Develop, modify, and extend code for building cloud infrastructure, and automate it using CI/CD pipelines
Partner with business stakeholders and peers in pursuit of solutions that achieve business goals through an agile software development methodology
Perform problem analysis, data analysis, reporting, and communication
Work with peers across the system to define and implement best practices and standards
Assess applications and help determine the appropriate application infrastructure patterns
Use the best practices and knowledge of internal or external drivers to improve products or services
Design, develop, and optimize scalable ETL/ELT pipelines using Databricks and Apache Spark (PySpark and Scala); see the sketch after this list
Build data ingestion workflows from various sources (structured, semi-structured, and unstructured)
Develop reusable components and frameworks for efficient data processing
Implement best practices for data quality, validation, and governance
Collaborate with data architects, analysts, and business stakeholders to understand data requirements
Tune Spark jobs for performance and scalability in a cloud-based environment
Maintain a robust data lake or Lakehouse architecture
Ensure high availability, security, and integrity of data pipelines and platforms
Support troubleshooting, debugging, and performance optimization in production workloads
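As a rough illustration of the pipeline responsibilities above, here is a minimal PySpark sketch of an ingest-validate-write flow. The source path, schema fields, partition count, and table name are hypothetical, and the Delta write assumes a Databricks-style Lakehouse environment; treat this as a sketch, not a prescribed implementation.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical example: ingest raw JSON orders, apply basic data
# quality rules, and write curated data to a Delta table (Lakehouse
# pattern). All paths and names below are placeholders.
spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Ingest a semi-structured source (path is illustrative only).
raw = spark.read.json("s3://example-bucket/raw/orders/")

# Validation: drop rows missing the key, enforce types, dedupe.
curated = (
    raw.filter(F.col("order_id").isNotNull())
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
       .dropDuplicates(["order_id"])
)

# Repartition before writing to keep file sizes reasonable (a common
# Spark tuning step; the right count depends on data volume). Assumes
# the target schema "analytics" already exists.
(curated.repartition(64, "order_date")
        .write.format("delta")
        .mode("overwrite")
        .partitionBy("order_date")
        .saveAsTable("analytics.orders_curated"))
```

On Databricks, a job like this would typically run as a scheduled workflow task and be promoted between environments through a GitLab CI/CD pipeline.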
Qualifications
Required
Hands-on experience in building ETL using Databricks SaaS infrastructure
Experience in developing data pipeline solutions to ingest and exploit new and existing data sources
Expertise in SQL, programming languages such as Python, and ETL tools such as Databricks
Ability to perform code reviews to verify that requirements are met, execution patterns are optimal, and established standards are followed
Degree in Computer Science or equivalent
Expertise in AWS Compute (EC2, EMR), AWS Storage (S3, EBS), AWS Databases (RDS, DynamoDB), AWS Data Integration (Glue)
Advanced understanding of Container Orchestration services including Docker and Kubernetes, and a variety of AWS tools and services
Good understanding of AWS Identity and Access Management (IAM), AWS networking, and AWS monitoring tools
Proficiency in CI/CD and deployment automation using GitLab pipelines
Proficiency in cloud infrastructure provisioning tools, e.g., Terraform
Proficiency in one or more programming languages, e.g., Python, Scala
Experience with Starburst/Trino and building SQL queries in a federated architecture; see the sketch after this list
Good knowledge of Lakehouse architecture
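To make the federated-query item concrete, here is a minimal sketch using the open-source trino Python client (Starburst is a commercial distribution of Trino, so the same client applies). The host, catalogs, schemas, and tables below are placeholders, not details from this posting.

```python
from trino.dbapi import connect

# Hypothetical federated query: join a Hive/S3 catalog with a
# PostgreSQL catalog in a single SQL statement. The connection
# details and table names are placeholders.
conn = connect(
    host="trino.example.internal",
    port=8080,
    user="data_engineer",
    catalog="hive",
    schema="analytics",
)

cur = conn.cursor()
cur.execute(
    """
    SELECT o.order_id, o.amount, c.segment
    FROM hive.analytics.orders_curated AS o
    JOIN postgresql.crm.customers AS c
      ON o.customer_id = c.customer_id
    WHERE o.order_date >= DATE '2024-01-01'
    """
)
for order_id, amount, segment in cur.fetchmany(10):
    print(order_id, amount, segment)
```

Because the engine maps each catalog to its own connector, a single SQL statement can join Lakehouse tables with operational stores without first copying the data.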
Company
The Custom Group of Companies
For over 30 years, The Custom Group of Companies has been a leader in the recruitment industry, providing temporary/consulting, direct hire, and executive search services throughout New York.