Right Balance ® · 18 hours ago
Lead Data Engineer
Responsibilities
Design scalable data pipelines processing massive record volumes
Architect ETL processes using PySpark on Amazon EMR (open to shifting to other solutions such as Databricks or Snowflake)
Distribute enriched data through medallion architecture across Postgres, Athena, OpenSearch
Integrate new data sources into the main pipeline
Implement advanced data matching using Splink
Qualifications
Required
Upper-intermediate to fluent spoken and written English. Able to hold a real-time conversation.
5+ years of full-time hands-on Data Engineering experience.
5+ years of full-time hands-on Python experience.
5+ years of full-time hands-on SQL experience.
5+ years of full-time hands-on PySpark experience.
5+ years of full-time hands-on AWS experience.
2+ years of full-time hands-on Tech Lead experience.
2+ years of full-time hands-on Docker experience.
2+ years of full-time hands-on Metabase/Athena/Glue/EMR experience.
Good proficiency in: PySpark and distributed computing, AWS data services (EMR, Glue, Athena), Docker, Pandas and DataFrame manipulation, Complex data format handling (JSONL, Parquet)
Strong background in: Big data processing architectures, Data warehouse design, Performance optimization, Advanced Python and SQL skills
Preferred
Probabilistic record linking expertise
OpenSearch/Elasticsearch technologies
Machine learning data pipeline design
Recruitment tech ecosystem knowledge
Bachelor’s degree in Computer Science or equivalent demonstrated ability.