Amicon Hub Services · 1 day ago
Sr. Data Mesh Architect
Human Resources Services
Responsibilities
Design and implement modern data lake architectures using best practices for data organization, partitioning, and storage optimization.
Lead the development of data processing applications using PySpark, optimizing for performance and resource utilization across large-scale Spark clusters.
Design and implement data storage strategies using Apache Iceberg, enabling ACID transactions, time travel, and schema evolution capabilities.
Create robust and scalable data pipelines for batch and streaming workflows, implementing proper error handling and data quality checks (see the batch sketch after this list).
Tune Spark applications and cluster configurations for optimal performance, implementing partitioning strategies and query optimization techniques.
Establish and maintain data governance frameworks specific to data lake environments, including access controls, audit logging, and compliance measures.
Develop and maintain infrastructure as code for automated deployment of data lake components and Spark clusters.
Collaborate with data engineers, data scientists, and business stakeholders to ensure the data lake architecture meets both technical and business requirements.
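To ground these responsibilities, here is a minimal sketch of the kind of batch pipeline they describe: PySpark loading into a partitioned Apache Iceberg table with a simple data-quality gate. The catalog name (lake), bucket paths, table, and column names are illustrative assumptions, not details from this posting.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("orders-batch-load")
    # The Iceberg Spark runtime must be on the classpath,
    # e.g. via --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12.
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3://example-bucket/warehouse")
    # Adaptive query execution helps with shuffle sizing and skewed joins.
    .config("spark.sql.adaptive.enabled", "true")
    .getOrCreate()
)

# Hidden partitioning by day: writers never see or manage a partition column.
spark.sql("""
    CREATE TABLE IF NOT EXISTS lake.sales.orders (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DECIMAL(10, 2),
        order_ts    TIMESTAMP
    )
    USING iceberg
    PARTITIONED BY (days(order_ts))
""")

raw = spark.read.parquet("s3://example-bucket/raw/orders/")

# A simple data-quality gate: drop rows that violate basic invariants.
clean = raw.filter(F.col("order_id").isNotNull() & (F.col("amount") >= 0))

# Iceberg commits are atomic (ACID); a failed job leaves no partial snapshot.
clean.writeTo("lake.sales.orders").append()
```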
Qualifications
Required
Bachelor's or Master's degree in Computer Science, Data Engineering, or related field.
12+ years of experience in data engineering, with at least 3 years focused on data lake architecture and implementations.
Deep expertise in Apache Spark and PySpark, including complex ETL development, performance tuning and optimization, Spark SQL and DataFrame APIs, and Spark cluster management and monitoring.
Strong experience with Apache Iceberg, including table management and optimization, schema evolution, time travel operations, and partition evolution (illustrated in the sketch after this list).
Proficiency in Python programming and software engineering best practices.
Experience with cloud platforms (AWS, Azure, or GCP) and their respective data lake services.
Strong understanding of data modeling concepts for both structured and unstructured data.
Experience with version control systems (Git) and CI/CD pipelines.
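For the Iceberg items above, the following sketch shows what schema evolution, partition evolution, and time travel look like through Spark SQL. It assumes the lake.sales.orders table and catalog configuration from the earlier sketch, Spark 3.3+ with the Iceberg SQL extensions enabled, and a hypothetical snapshot ID.

```python
from pyspark.sql import SparkSession

# Reuses the "lake" catalog configuration from the previous sketch (or the
# equivalent settings in spark-defaults.conf); ADD PARTITION FIELD requires
# spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions.
spark = SparkSession.builder.appName("orders-maintenance").getOrCreate()

# Schema evolution is a metadata-only change; no data files are rewritten.
spark.sql("ALTER TABLE lake.sales.orders ADD COLUMN currency STRING")

# Partition evolution: new writes use the new spec, old data remains valid.
spark.sql("ALTER TABLE lake.sales.orders ADD PARTITION FIELD bucket(16, customer_id)")

# Every commit produces a snapshot; the metadata table lists them all.
spark.sql("SELECT snapshot_id, committed_at FROM lake.sales.orders.snapshots").show()

# Time travel to a past snapshot (the ID below is hypothetical).
spark.sql(
    "SELECT * FROM lake.sales.orders VERSION AS OF 1234567890123456789"
).show()
```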
Preferred
Experience with other modern table formats (Delta Lake, Apache Hudi).
Knowledge of data catalog solutions (AWS Glue, Azure Data Catalog).
Familiarity with data streaming technologies (Kafka, Spark Streaming); see the streaming sketch after this list.
Experience with container orchestration platforms (Kubernetes).
Certification in relevant cloud platforms or data technologies.
Contributions to open-source projects in the data space.
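For the streaming technologies listed above, a minimal Structured Streaming sketch reading from Kafka into an Iceberg table might look like the following. The broker address, topic, payload schema, checkpoint path, and target table are all assumptions, and the Kafka source additionally requires the spark-sql-kafka package on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumes the spark-sql-kafka and Iceberg runtime packages are on the
# classpath and the "lake" catalog is configured as in the batch sketch.
spark = SparkSession.builder.appName("orders-stream").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # assumed address
    .option("subscribe", "orders")                     # assumed topic
    .load()
)

# Kafka delivers raw bytes; parse the JSON payload with an explicit schema.
parsed = events.select(
    F.from_json(
        F.col("value").cast("string"),
        "order_id BIGINT, amount DOUBLE, order_ts TIMESTAMP",
    ).alias("o")
).select("o.*")

# The checkpoint is what makes restarts and exactly-once commits possible.
query = (
    parsed.writeStream
    .format("iceberg")
    .outputMode("append")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/orders")
    .toTable("lake.sales.orders_stream")  # hypothetical target table
)
query.awaitTermination()
```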
Tech Stack
Languages: Python, SQL, Scala
Frameworks: Apache Spark, PySpark
Table Formats: Apache Iceberg
Infrastructure: AWS EMR, Databricks, or equivalent
Version Control: Git, GitHub/GitLab
CI/CD: Jenkins, GitHub Actions, Notebook+Git or equivalent
Monitoring: Grafana, Prometheus, or equivalent
Benefits
Health insurance.
Professional development support, including conference attendance and certification programs.
Remote-friendly work environment with flexible scheduling options.
Collaborative culture that values innovation, technical excellence, and continuous learning.
Regular opportunities to contribute to open-source projects and technical community initiatives.
Company
Amicon Hub Services
Amicon Hub is a fast-growing recruitment and staffing firm providing administrative, professional, and business services.
Funding
Current Stage
Early Stage (company data provided by Crunchbase)