Suncap Technology · 3 months ago
Collibra Data Lineage Automation Engineer / McLean, VA - 6+ months
Suncap Technology is seeking a highly experienced Data Lineage Automation Engineer to lead the design and implementation of automated end-to-end lineage solutions across a highly heterogeneous enterprise data ecosystem. The role requires deep technical expertise in lineage frameworks and a strong AI foundation to support intelligent metadata extraction and traceability.
Human Resources, Recruiting, Staffing Agency
Responsibilities
Lead the implementation of automated data lineage across a complex data estate that includes:
Cloud platforms (e.g., Snowflake, AWS)
Legacy relational databases and ETLs
NoSQL data stores
BI/reporting platforms (e.g., Tableau, Power BI)
Implement or extend frameworks such as Spline, OpenLineage, or similar open frameworks to support active lineage capture
Build connectors, extractors, or agents where necessary to bridge gaps between systems and lineage frameworks
Integrate with metadata platforms (e.g., Collibra) to publish lineage in a consumable format
Apply AI/ML techniques to infer lineage where automation is incomplete (e.g., handling Java-based ETLs), using logs, query patterns, or usage metadata
Develop reusable lineage components that can be operationalized across domains
Guide stakeholders on best practices for lineage standardization, storage, and use
Qualifications
Required
Proven experience delivering automated data lineage solutions across hybrid architectures
Hands-on expertise with Spline, OpenLineage, Marquez, or comparable lineage frameworks
Deep understanding of metadata capture, ETL process tracing, and query execution mapping
Strong AI/ML background, particularly in metadata intelligence, natural language processing for code parsing, or pattern detection
Experience integrating lineage with data governance tools (e.g., Collibra, Alation)
Strong programming background in Python, Scala, or Java
Deep familiarity with SQL and query logs from systems like Snowflake, SQL Server, Oracle, MongoDB, etc.
Preferred
Experience evaluating or implementing third-party commercial data lineage solutions
Prior work in regulated environments (e.g., financial services, healthcare)
Familiarity with event-based architectures for real-time lineage propagation
Knowledge of data mesh or domain-driven lineage strategies