Tough Leaf · 18 hours ago
Senior/Staff Data Engineer
Tough Leaf is a company focused on helping general contractors and agencies connect with certified subcontractors using advanced data solutions. They are looking for a Senior or Staff Data Engineer to own the full data lifecycle, ensuring data quality and reliability while leveraging AI technologies.
Responsibilities
Own ingestion & normalization for messy, multi-source subcontractor data
Build and harden enrichment pipelines (contacts, websites, capabilities), including automated refresh + backfills
Design and ship deduplication & entity resolution that prevents match corruption and “ghost firms”
Create data quality gates: validation rules, monitors, alerts, and safe rollouts that reduce manual QA
Improve how data is surfaced for matching & search (ranking signals, relevance, usability)
Productize enrichment so outcomes are repeatable, measurable, and scalable—not one-off magic
Use AI coding agents daily, but hold a high bar for correctness, testing, and review
Qualifications
Required
You've owned production data systems end-to-end, not just built one-off pipelines
You think deeply about data quality, invariants, and failure modes
You've shipped deduplication, fuzzy matching, entity resolution, or golden-record systems
You're comfortable with schema drift, inconsistent naming, partial truth, and ambiguity
You have strong code review judgment, especially for logic and correctness
You're AI-native: you use coding agents daily—but you verify, test, and refactor ruthlessly
Preferred
Web scraping & crawling experience (especially resilient refresh systems)
Search & matching experience (ranking, relevance, retrieval systems)
Productized enrichment flows (website crawling, LLM cleanup/structuring, map data)
Startup experience where autonomy, speed, and ownership actually matter
Benefits
Health + Dental + Vision coverage
Company
Tough Leaf
Tough Leaf is a platform that helps construction companies manage pre-construction processes.
Funding
Current Stage: Early Stage
Total Funding: $7.79M
2024-06-28 · Series A · $4.5M
2022-07-20 · Seed · $3.1M
2021-12-03 · Pre Seed · $0.18M
Company data provided by Crunchbase