Capgemini · 16 hours ago
Databricks Data Engineer
Capgemini is a global business and technology transformation partner, helping organizations to accelerate their dual transition to a digital and sustainable world. They are seeking an experienced Databricks Data Engineer to lead the migration of existing Pentaho ETL workflows to a modern Databricks-based ETL architecture, focusing on delivering production-ready data pipelines.
Consulting · Information Technology · InsurTech · IT Management · Software
Responsibilities
Analyze and reverse-engineer existing Pentaho ETL jobs and transformations
Design and implement Databricks ETL pipelines using:
+ Apache Spark (PySpark / Spark SQL)
+ Databricks Workflows / Jobs
+ Delta Lake
Re-platform ETL logic including:
+ Data extraction from relational sources (PostgreSQL, SQL Server, etc.)
+ File-based ingestion (CSV, ZIP, SFTP workflows)
+ Transformations, validations, and enrichment
Implement data loading strategies into Salesforce, including:
+ Salesforce APIs (Bulk API, REST API)
+ Handling large volumes, retries, and error handling
+ Incremental vs full loads
Optimize pipelines for performance, scalability, and cost
Implement logging, monitoring, and data quality checks
Support testing, validation, and parallel runs during migration
Produce technical documentation for migrated pipelines
Collaborate with architects, platform teams, and business stakeholders
Qualifications
Required
U.S. Citizenship is required
Eligible to obtain and maintain Government Security Clearance
5+ years of experience in data engineering / ETL development
Strong hands-on experience with Databricks
Proficiency in PySpark and Spark SQL
Prior experience migrating from legacy ETL tools (Pentaho, Informatica, Talend, SSIS, etc.)
Experience integrating with Salesforce as a target system
Solid understanding of:
+ ETL/ELT design patterns
+ Data modeling and transformation logic
+ Error handling and restartability
Experience working with relational databases
Strong troubleshooting and problem-solving skills
Preferred
Direct experience migrating Pentaho (Kettle) to Databricks
Experience with Salesforce data models (objects, relationships, limits)
Familiarity with:
+ CI/CD for data pipelines
+ Git-based source control
+ Cloud platforms (AWS preferred)
Experience in regulated or compliance-driven environments (FedRAMP, HIPAA, etc.)
Knowledge of data governance, lineage, and auditability
Benefits
Paid time off
Medical/dental/vision insurance
401(k)
Company
Capgemini
Capgemini is a software company that provides consulting, technology, and digital transformation services.
Funding
Current Stage: Public Company
Total Funding: $4.72B
2025-09-18 · Post-IPO Debt · $4.72B
1999-04-01 · IPO
Recent News
The French Tech Journal · 2026-01-22
Business Wire · 2026-01-20
Company data provided by Crunchbase