
People Data Labs · 1 month ago

Senior Software Engineer, Data Acquisition

People Data Labs is a provider of people and company data, focused on building the best data available for innovative solutions. The Senior Software Engineer in Data Acquisition will be responsible for improving data acquisition processes and building scalable data products to support customer needs.

Analytics · Artificial Intelligence (AI) · B2B · Database · Developer APIs · Machine Learning · Software
Comp. & Benefits · H1B Sponsor Likely

Responsibilities

Contribute to the architecture and improvement of our data acquisition and processing platform, increasing reliability, throughput, and observability
Use and develop web crawling technologies to capture and catalog data on the internet (see the sketch after this list)
Build, operate, and evolve large-scale distributed systems that collect, process, and deliver data from across the web
Design and develop backend services that manage distributed job orchestration, data pipelines, and large-scale asynchronous workloads
Structure and model captured data, ensuring high quality and consistency across datasets
Continuously improve the speed, scalability, and fault-tolerance of our ingestion systems
Partner with data product and engineering teams to design and implement new data products powered by the data you help collect, and enhance existing products
Learn and apply domain-specific knowledge in web crawling and data acquisition, with mentorship from experienced teammates and access to existing systems
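
As a rough, hedged illustration of the crawling and async-workload responsibilities above, here is a minimal Python sketch of a crawl worker with bounded concurrency. The URLs, record fields, and concurrency limit are assumptions for illustration, not a description of People Data Labs' actual platform, and it assumes the aiohttp client is available.

import asyncio

import aiohttp  # assumed available; any async HTTP client would do


async def fetch(session: aiohttp.ClientSession, url: str, sem: asyncio.Semaphore) -> dict:
    # The semaphore provides simple backpressure: at most N requests in flight.
    async with sem:
        async with session.get(url, timeout=aiohttp.ClientTimeout(total=30)) as resp:
            body = await resp.text()
            # Catalog the capture as a structured record for downstream pipelines.
            return {"url": url, "status": resp.status, "bytes": len(body)}


async def crawl(urls: list[str], max_in_flight: int = 20) -> list[dict]:
    sem = asyncio.Semaphore(max_in_flight)
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, u, sem) for u in urls]
        # return_exceptions=True keeps one bad host from failing the whole batch.
        results = await asyncio.gather(*tasks, return_exceptions=True)
    return [r for r in results if isinstance(r, dict)]


if __name__ == "__main__":
    print(asyncio.run(crawl(["https://example.com", "https://example.org"])))

The semaphore is one simple way to apply backpressure so a single batch cannot overwhelm the fetcher or downstream systems.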

Qualifications

Python · Distributed systems · Data ingestion · Web crawling · SQL · Cloud platforms · Communication · Technical documentation · Project management

Required

7+ years of professional experience building or operating backend or infrastructure systems at scale
Solid programming experience in Python, Go, Rust, or a similar language, including experience with async/await, coroutines, or concurrency frameworks
Strong grasp of software architecture and backend fundamentals; you can reason clearly about concurrency, scalability, and fault tolerance
Solid understanding of the browser rendering pipeline and web application architecture (auth, cookies, HTTP request/response)
Familiarity with network architecture and debugging (HTTP, DNS, proxies, packet capture and analysis)
Solid understanding of distributed systems concepts: parallelism, asynchronous programming, backpressure, and message-driven design
Experience designing or maintaining resilient data ingestion, API integration, or ETL systems
Proficiency with Linux / Unix command-line tools and system resource management
Familiarity with message queues, orchestration, and distributed task systems (Kafka, SQS, Airflow, etc.)
Experience evaluating and monitoring data quality, ensuring consistency, completeness, and reliability across releases (see the sketch after this list)
Work independently in a fast-paced, remote-first environment, proactively unblocking yourself and collaborating asynchronously
Communicate clearly and thoughtfully in writing (Slack, docs, design proposals)
Write and maintain technical design documents, including pipeline design, schema design, and data flow diagrams
Scope and break down complex projects into deliverable milestones, and communicate progress, risks, and blockers effectively
Balance pragmatism with craftsmanship, shipping reliable systems while continuously improving them
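
As a hedged sketch of the data-quality item above, the following computes per-field completeness over a batch of records; the record shape, field names, and threshold usage are hypothetical.

from collections import Counter


def completeness_report(records: list[dict], required_fields: list[str]) -> dict:
    """Return the fraction of records with a non-empty value for each required field."""
    filled = Counter()
    for rec in records:
        for field in required_fields:
            if rec.get(field) not in (None, "", []):
                filled[field] += 1
    total = len(records) or 1
    return {field: filled[field] / total for field in required_fields}


if __name__ == "__main__":
    sample = [
        {"name": "Acme Corp", "domain": "acme.example", "employees": 120},
        {"name": "Globex", "domain": "", "employees": None},
    ]
    # e.g. {'name': 1.0, 'domain': 0.5, 'employees': 0.5}; fields below a release
    # threshold would be flagged before the dataset ships.
    print(completeness_report(sample, ["name", "domain", "employees"]))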

Preferred

Degree in a quantitative field such as computer science, mathematics, or engineering
Experience as a Red Teamer
Experience working on large-scale data ingestion, crawling, or indexing systems
Experience with Apache Spark, Databricks, or other distributed data platforms
Experience with streaming data systems (Kafka, Pub/Sub, Spark Streaming, etc.)
Proficiency with SQL and data warehousing (Snowflake, Redshift, BigQuery, or similar)
Experience with cloud platforms (AWS preferred, GCP or Azure also great)
Understanding of modern data storage and design patterns (Parquet, Delta Lake, partitioning and segmentation, incremental updates, rebuilds and backfills); see the sketch after this list
Experience building and maintaining data pipelines on modern big-data or cloud platforms (Databricks, Spark, or equivalent)
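
For the storage-pattern items above, here is a minimal sketch of date-partitioned Parquet output with pyarrow; the column names and output path are assumptions, and the partition layout is what makes incremental updates and targeted backfills straightforward.

import pyarrow as pa
import pyarrow.parquet as pq

# A tiny batch of captured records, partitioned by capture date so that
# incremental updates and backfills can target individual partitions.
table = pa.table({
    "capture_date": ["2024-01-01", "2024-01-01", "2024-01-02"],
    "domain": ["acme.example", "globex.example", "initech.example"],
    "status": [200, 200, 404],
})

pq.write_to_dataset(
    table,
    root_path="crawl_output",          # hypothetical local path
    partition_cols=["capture_date"],   # yields crawl_output/capture_date=<date>/ directories
)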

Benefits

Stock
Competitive Salaries
Unlimited paid time off
Medical, dental, & vision insurance
Health, fitness, and office stipends
The permanent ability to work wherever and however you want

Company

People Data Labs

People Data Labs is a software firm offering compliant people and company data through its B2B solutions.

H1B Sponsorship

People Data Labs has a track record of offering H1B sponsorship. Note that this does not guarantee sponsorship for this specific role; additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships: 2024 (2) · 2022 (2) · 2021 (1)

Funding

Current Stage
Growth Stage
Total Funding
$55.48M
Key Investors
Craft Ventures · Founders Fund · 8VC
2021-11-16 · Series B · $45M
2018-10-10 · Series A · $7M
2017-04-26 · Series A

Leadership Team

Ben Eisenberg, CEO
Dan Shamouilian, Chief Operating Officer
Company data provided by Crunchbase