Databricks Data Engineer jobs in United States

Capgemini · 14 hours ago

Databricks Data Engineer

Capgemini is a global business and technology transformation partner, helping organizations to accelerate their dual transition to a digital and sustainable world. They are seeking an experienced Databricks Data Engineer to lead the migration of existing Pentaho ETL workflows to a modern Databricks-based ETL architecture, focusing on delivering production-ready data pipelines.

Consulting · Information Technology · InsurTech · IT Management · Software
No H1B · Security Clearance Required · U.S. Citizen Only

Responsibilities

Analyze and reverse-engineer existing Pentaho ETL jobs and transformations
Design and implement Databricks ETL pipelines using:
+ Apache Spark (PySpark / Spark SQL)
+ Databricks Workflows / Jobs
+ Delta Lake
Re-platform ETL logic including:
+ Data extraction from relational sources (PostgreSQL, SQL Server, etc.)
+ File-based ingestion (CSV, ZIP, SFTP workflows)
+ Transformations, validations, and enrichment
Implement data loading strategies into Salesforce, including:
+ Salesforce APIs (Bulk API, REST API)
+ Handling large volumes, retries, and error handling
+ Incremental vs full loads
Optimize pipelines for performance, scalability, and cost
Implement logging, monitoring, and data quality checks
Support testing, validation, and parallel runs during migration
Produce technical documentation for migrated pipelines
Collaborate with architects, platform teams, and business stakeholders

Qualifications

Databricks · PySpark · Spark SQL · ETL development · Salesforce integration · Data modeling · Relational databases · Troubleshooting · Data governance · Problem-solving

Required

U.S. Citizenship is required
Eligible to obtain and maintain Government Security Clearance
5+ years of experience in data engineering / ETL development
Strong hands-on experience with Databricks
Proficiency in PySpark and Spark SQL
Prior experience migrating from legacy ETL tools (Pentaho, Informatica, Talend, SSIS, etc.)
Experience integrating with Salesforce as a target system
Solid understanding of:
+ ETL/ELT design patterns
+ Data modeling and transformation logic
+ Error handling and restartability
Experience working with relational databases
Strong troubleshooting and problem-solving skills

Preferred

Direct experience migrating Pentaho (Kettle) to Databricks
Experience with Salesforce data models (objects, relationships, limits)
Familiarity with:
+ CI/CD for data pipelines
+ Git-based source control
+ Cloud platforms (AWS preferred)
Experience in regulated or compliance-driven environments (FedRAMP, HIPAA, etc.)
Knowledge of data governance, lineage, and auditability

Benefits

Paid time off
Medical/dental/vision insurance
401(k)

Company

Capgemini

Capgemini is a software company that provides consulting, technology, and digital transformation services.

Funding

Current Stage
Public Company
Total Funding
$4.72B
2025-09-18 · Post-IPO Debt · $4.72B
1999-04-01 · IPO

Leadership Team

Aiman Ezzat
CEO, Capgemini Group
Anirban Bose
CEO of Americas SBU | Member of the Group Executive Board
Company data provided by crunchbase