Direct Recruit Agency, LLC · 3 months ago
7190-A - Data Catalog Developer
Direct Recruit Agency, LLC is seeking a highly skilled and experienced Data Catalog Developer to join their team. The role involves designing, developing, and maintaining data catalogs for clients, with a strong focus on Collibra Data Management and integration with various data sources.
Staffing & Recruiting
Responsibilities
Expertise in Collibra is a must
Build the Collibra Data Catalog
Experience in the new Collibra software - Edge
Develop the Data Catalog, build Collibra workflows, and integrate the Edge server with various data sources, authentication, and access controls
Data Catalog build out, Metadata Synchronization, Lineage Harvester
Migrate applications from on-premises environments to cloud service providers
Develop products and services on the latest technologies through contributions in development, enhancements, testing, and implementation
Develop, modify, and extend code for building cloud infrastructure, and automate using CI/CD pipeline
Partner with business stakeholders and peers to pursue solutions that achieve business goals using an agile software development methodology
Perform problem analysis, data analysis, reporting, and communication
Work with peers across the system to define and implement best practices and standards
Assess applications and help determine the appropriate application infrastructure patterns
Use the best practices and knowledge of internal or external drivers to improve products or services
Qualifications
Required
Bachelor's degree in Computer Science, Information Systems, or a related field
Minimum of 3 years of experience as a Data Catalog Developer or in a similar role
Hands-on experience in building ETL using Databricks SaaS infrastructure
Experience in developing data pipeline solutions to ingest and exploit new and existing data sources
Expertise in leveraging SQL, programming languages like Python, and ETL tools like Databricks
Perform code reviews to ensure that requirements are met, execution patterns are optimal, and established standards are followed
Expertise in AWS Compute (EC2, EMR), AWS Storage (S3, EBS), AWS Databases (RDS, DynamoDB), AWS Data Integration (Glue)
Advanced understanding of Container Orchestration services, including Docker and Kubernetes, and a variety of AWS tools and services
Good understanding of AWS Identity and Access Management, AWS Networking, and AWS Monitoring tools
Proficiency in CI/CD and deployment automation using GitLab pipelines
Proficiency in Cloud infrastructure provisioning tools, e.g., Terraform
Proficiency in one or more programming languages, e.g., Python, Scala
Experience in Starburst, Trino, and building SQL queries in a federated architecture
Good knowledge of Lakehouse architecture
Design, develop, and optimize scalable ETL/ELT pipelines using Databricks and Apache Spark (PySpark and Scala)
Build data ingestion workflows from various sources (structured, semi-structured, and unstructured)
Develop reusable components and frameworks for efficient data processing
Implement best practices for data quality, validation, and governance
Collaborate with data architects, analysts, and business stakeholders to understand data requirements
Tune Spark jobs for performance and scalability in a cloud-based environment
Maintain robust data lake or Lakehouse architecture
Ensure high availability, security, and integrity of data pipelines and platforms
Support troubleshooting, debugging, and performance optimization in production workloads
Preferred
Collibra Ranger certification
Databricks
AWS (S3, Glue, Aurora Postgres, Athena)
Communication
Problem-Solving
Collaboration
Attention to Detail
Company
Direct Recruit Agency, LLC
Direct Recruit Agency helps candidates find their dream job and employers find their next employee. We are the first responder of recruiting for your career.
Funding
Current Stage
Early Stage