Right Balance ®

Lead Data Engineer

United States

Contract

Remote

Senior Level, Lead/Staff

5+ years exp

Consulting, Human Resources

Hiring Manager

Simrat Gill

Responsibilities

Design scalable data pipelines processing massive record volumes

Architect ETL processes using PySpark on Amazon EMR (open to moving to other platforms such as Databricks or Snowflake)

Distribute enriched data through a medallion architecture to Postgres, Athena, and OpenSearch

Integrate new data sources into the main pipeline

Implement advanced data matching using Splink

Qualifications

Data Engineering, Python, SQL, PySpark, AWS, Tech Lead, Docker, Metabase, Athena, EMR, Pandas, DataFrame manipulation, Complex data format handling, Big data processing architectures, Data warehouse design, Performance optimization, Advanced Python skills, Advanced SQL skills, Probabilistic record linking, OpenSearch, Machine learning data pipeline design, Recruitment tech ecosystem knowledge, Degree in Computer Science, English proficiency

Required

Upper-intermediate to fluent spoken and written English; able to hold a real-time conversation.

5+ years of full-time hands-on Data Engineering experience.

5+ years of full-time hands-on Python experience.

5+ years of full-time hands-on SQL experience.

5+ years of full-time hands-on PySpark experience.

5+ years of full-time hands-on AWS experience.

2+ years of full-time hands-on Tech Lead experience.

2+ years of full-time hands-on Docker experience.

2+ years of full-time hands-on Metabase/Athena/Glue/EMR experience.

Good proficiency in: PySpark and distributed computing, AWS data services (EMR, Glue, Athena), Docker, Pandas and DataFrame manipulation, and complex data format handling (JSONL, Parquet).

Strong background in: big data processing architectures, data warehouse design, performance optimization, and advanced Python and SQL skills.