Phaxis · 2 months ago
Data Engineer with Python & Databricks
Phaxis, a company focused on data solutions, is seeking a Data Engineer with expertise in Python and Databricks. The role involves building efficient data pipelines, applying data engineering best practices, and collaborating with teams to deliver optimized data solutions for both batch and real-time processing.
Delivery · Human Resources · Staffing Agency
Responsibilities
Design, develop, and maintain end-to-end data pipelines and ETL/ELT workflows using Python, SQL, and modern orchestration tools (e.g., Airflow, dbt)
Architect and optimize data storage solutions, including data lakes and data warehouses, using Databricks, Delta Lake, and cloud-native services
Build scalable data processing solutions leveraging Databricks notebooks, jobs, and clusters for both batch and streaming data workloads
Develop and manage Databricks workflows using Spark (PySpark, SQL, or Scala) to transform, cleanse, and aggregate large datasets
Implement data quality checks, schema validation, and monitoring to ensure data accuracy and reliability
Optimize Databricks cluster configurations and job performance to minimize cost and maximize throughput
Collaborate with DevOps teams to automate deployments and to build CI/CD pipelines and infrastructure-as-code (IaC) for data systems
Qualifications
Required
Bachelor's or Master's degree in Computer Science, Data Engineering, or a related technical field
Advanced proficiency in Python and SQL for data manipulation, transformation, and automation
Deep experience with Databricks, including Spark optimization, Delta Lake management, job orchestration, and workspace administration
Strong understanding of distributed data processing, partitioning strategies, and performance tuning in Databricks and Spark
Hands-on experience with cloud platforms (AWS, Azure, or GCP) and their data ecosystems (e.g., S3, ADLS, BigQuery, Snowflake)