FUSTIS LLC
Lead AWS Data Engineer (PySpark, Glue & Dimensional Modelling)
FUSTIS LLC is seeking a Lead AWS Data Engineer to develop and maintain PySpark-based ETL pipelines and manage AWS Glue jobs. The role involves designing dimensional data models and optimizing data workflows to ensure data reliability and performance.
Responsibilities
Develop and maintain PySpark-based ETL pipelines for batch and incremental data processing
Build and operate AWS Glue Spark jobs (batch and event-driven), including:
Job configuration, scaling, retries, and cost optimization
Glue Catalog and schema management
Design and maintain event-driven data workflows triggered by S3, EventBridge, or streaming sources
Load and transform data into Amazon Redshift, optimizing for:
Distribution and sort keys
Incremental loads and upserts
Query performance and concurrency
Design and implement dimensional data models (star/snowflake schemas), including:
Fact and dimension tables
Slowly Changing Dimensions (SCDs)
Grain definition and data quality controls
Collaborate with analytics and reporting teams to ensure the warehouse is BI-ready
Monitor, troubleshoot, and optimize data pipelines for reliability and performance
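The incremental-load and upsert responsibility above usually follows a staging-table merge pattern: land new records in a staging area, then update matching rows in the target by business key and insert the rest. A minimal pure-Python sketch of that merge logic (the `order_id`/`amount` names are illustrative; a real pipeline would express this as a PySpark write plus a Redshift MERGE):

```python
def upsert(target, staged, key="order_id"):
    """Merge staged rows into target: rows whose business key already
    exists are updated, new keys are inserted -- the same semantics a
    Redshift staging-table MERGE provides."""
    merged = {row[key]: row for row in target}  # index existing rows by key
    for row in staged:
        merged[row[key]] = row                  # update-or-insert per key
    return list(merged.values())

target = [{"order_id": 1, "amount": 10}, {"order_id": 2, "amount": 20}]
staged = [{"order_id": 2, "amount": 25}, {"order_id": 3, "amount": 30}]
result = upsert(target, staged)
```

The staging step matters in Redshift because row-by-row updates are expensive; bulk-loading into staging and merging once per batch keeps the load incremental and cheap.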
Qualifications
Required
Strong PySpark experience (Spark SQL, DataFrames, performance tuning)
Hands-on experience with AWS Glue (Spark jobs, not just crawlers)
Experience loading and optimizing data in Amazon Redshift
Proven experience designing dimensional data warehouse schemas
Familiarity with AWS-native data services (S3, IAM, CloudWatch)
Production ownership mindset (debugging, failures, reprocessing)
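For context on the Slowly Changing Dimension requirement, a minimal pure-Python sketch of Type 2 handling: the current version of a changed dimension row is expired and a new current version is appended, so history is preserved. Column names (`customer_id`, `start_date`, `end_date`, `is_current`) are illustrative; in production this would run as a PySpark job writing to Redshift.

```python
def scd2_apply(dim_rows, change, today):
    """Apply one changed record to a Type 2 dimension: expire the
    current version of the matching business key, then append the
    change as the new current version."""
    out = []
    for row in dim_rows:
        if row["customer_id"] == change["customer_id"] and row["is_current"]:
            # Close out the old version instead of overwriting it
            out.append(dict(row, end_date=today, is_current=False))
        else:
            out.append(row)
    # New current version, open-ended until the next change arrives
    out.append({**change, "start_date": today, "end_date": None, "is_current": True})
    return out

dim = [{"customer_id": 1, "city": "Austin",
        "start_date": "2020-01-01", "end_date": None, "is_current": True}]
new = scd2_apply(dim, {"customer_id": 1, "city": "Denver"}, today="2024-06-01")
```

After the call, `new` holds two versions of customer 1: the Austin row closed on 2024-06-01 and a current Denver row, which is exactly the grain-and-history behavior interviewers probe for with SCD questions.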