98 applicants

Company

Tredence Inc. · 6 days ago

Data Architect (AWS + Databricks)

United States

Full-time

Remote

Entry, Mid Level

2+ years exp

Analytics, Artificial Intelligence (AI)

Growth Opportunities

H1B Sponsor Likely

Responsibilities

Implement scalable and sustainable data engineering solutions using tools such as Databricks, Azure, Apache Spark, and Python. Create, maintain, and optimize data pipelines as workloads move from development to production for specific use cases.

Architecture design experience for cloud and non-cloud platforms

Expertise with various ETL technologies and familiarity with ETL tools

Scrum management experience on medium- and large-scale projects

Experience whiteboarding enterprise-level architectures

Extensive knowledge of cloud and on-premises tools and architectures

Experience implementing large-scale hybrid cloud platforms and applications

Knowledge of one or more scripting languages

Experience with CI/CD

Ability to set and lead the technical vision while balancing business drivers

Understanding of AWS EMR (Elastic MapReduce)

Execute change tasks, including but not limited to partitioning tables according to a partitioning strategy

Create and optimize star schema models in a Dedicated SQL Pool

Experience on data migration projects from AWS EMR to Databricks, including security provisioning, schema creation, DB role creation, DDL execution, and data restoration

Review workload and provide recommendations for performance optimization and operational efficiency

Proactively adjust capacity based on current utilization and projected upcoming usage

Document and advise developers and end-users about best practices and housekeeping

Work on incidents and onboarding issues related to moving from on-prem datastores to the cloud

Identify performance bottlenecks and monitor Azure Synapse Analytics using Dynamic Management Views

Build performant data structures using table distribution and indexes

Design, develop, and optimize ETL/ELT data pipelines using Databricks on AWS.

Collaborate with data scientists, analysts, and stakeholders to understand business requirements and provide technical solutions.

Implement big data processing solutions using Spark on Databricks.

Manage and maintain Databricks clusters and optimize resource utilization to improve performance.

Develop and maintain CI/CD pipelines for data ingestion and transformation processes.

Integrate Databricks with other AWS services such as S3, Redshift, Glue, Athena, and Lambda.

Implement and manage data lakes using AWS S3 and Delta Lake architecture.

Ensure data quality, governance, and security by implementing best practices.

Monitor, troubleshoot, and optimize Databricks jobs and clusters for performance and cost efficiency.

Automate workflows and support real-time data streaming processes using Kafka, Kinesis, or AWS Glue.

Work with DevOps teams to manage infrastructure as code using tools like Terraform or AWS CloudFormation.

Qualification

Cloud Platforms, Databricks, Apache Spark, Python, ETL technologies, AWS EMR, CI/CD, Data Migration, Data Lakes, Infrastructure as Code, Scrum Management, SQL, Scala, Data Governance, Data Security

Required

Cloud Platforms (broad knowledge of basic utilities and in-depth knowledge of more than one of the cloud platforms)