Centraprise
Data Engineer (PySpark, Python) - Only W2/1099
Centraprise is seeking a Data Engineer with expertise in PySpark and Python. The role involves designing and optimizing data pipelines, collaborating with stakeholders, and leveraging AWS cloud services to manage data workflows.
Responsibilities
Design, develop, and optimize large-scale data pipelines using PySpark and Python
Implement and adhere to best practices in object-oriented programming to build reusable, maintainable code
Write advanced SQL queries for data extraction, transformation, and loading (ETL)
Collaborate closely with data scientists, analysts, and stakeholders to gather requirements and translate them into technical solutions
Troubleshoot data-related issues and resolve them in a timely and accurate manner
Leverage AWS cloud services (e.g., S3, EMR, Lambda, Glue) to build and manage cloud-native data workflows (preferred)
Participate in code reviews, data quality checks, and performance tuning of data jobs
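As a rough illustration of the ETL-style SQL work described above, here is a minimal sketch using Python's built-in sqlite3 module in place of a Spark cluster (a PySpark job would follow the same extract-transform-aggregate pattern via `spark.sql` or DataFrame operations); the `orders` table, its columns, and the sample rows are all hypothetical:

```python
import sqlite3

# Hypothetical extract step: load raw order rows into a table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10.0), ("bob", 5.0), ("alice", 7.5)],
)

# Transform step: aggregate revenue per customer, largest total first.
rows = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
    """
).fetchall()
print(rows)  # [('alice', 17.5), ('bob', 5.0)]
```

In a production pipeline the same grouping and ordering logic would run distributed over partitioned data, with the load step writing results to a warehouse table rather than printing them.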
Qualifications
Required
3–6 years of relevant experience in a data engineering or backend development role
Strong hands-on experience with PySpark and Python, especially in designing and implementing scalable data transformations
Solid understanding of Object-Oriented Programming (OOP) principles and design patterns
Proficient in SQL, with the ability to write complex queries and optimize performance
Strong problem-solving skills and the ability to troubleshoot complex data issues independently
Excellent communication and collaboration skills
Preferred
Experience working with the AWS cloud ecosystem (S3, Glue, EMR, Redshift, Lambda, etc.)
Exposure to data warehousing concepts, distributed computing, and performance tuning
Familiarity with version control systems (e.g., Git), CI/CD pipelines, and Agile methodologies