
People Data Labs · 1 month ago

Senior Software Engineer, Data Acquisition

People Data Labs is a provider of people and company data, focused on building the best data available for innovative solutions. The Senior Software Engineer in Data Acquisition will be responsible for improving data acquisition processes and building scalable data products to support customer needs.

Analytics · Artificial Intelligence (AI) · B2B · Database · Developer APIs · Machine Learning · Software
Comp. & Benefits · H1B Sponsor Likely

Responsibilities

Contribute to the architecture and improvement of our data acquisition and processing platform, increasing reliability, throughput, and observability
Use and develop web crawling technologies to capture and catalog data on the internet (see the sketch after this list)
Build, operate, and evolve large-scale distributed systems that collect, process, and deliver data from across the web
Design and develop backend services that manage distributed job orchestration, data pipelines, and large-scale asynchronous workloads
Structure and model captured data, ensuring high quality and consistency across datasets
Continuously improve the speed, scalability, and fault-tolerance of our ingestion systems
Partner with data product and engineering teams to design and implement new data products powered by the data you help collect, and enhance existing products
Learn and apply domain-specific knowledge in web crawling and data acquisition, with mentorship from experienced teammates and access to existing systems
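
As a rough, hedged illustration of the crawling and async-workload responsibilities above, here is a minimal Python sketch of a crawl worker with bounded concurrency. The URLs, record fields, and concurrency limit are assumptions for illustration, not a description of People Data Labs' actual platform, and it assumes the aiohttp client is available.

import asyncio

import aiohttp  # assumed available; any async HTTP client would do


async def fetch(session: aiohttp.ClientSession, url: str, sem: asyncio.Semaphore) -> dict:
    # The semaphore provides simple backpressure: at most N requests in flight.
    async with sem:
        async with session.get(url, timeout=aiohttp.ClientTimeout(total=30)) as resp:
            body = await resp.text()
            # Catalog the capture as a structured record for downstream pipelines.
            return {"url": url, "status": resp.status, "bytes": len(body)}


async def crawl(urls: list[str], max_in_flight: int = 20) -> list[dict]:
    sem = asyncio.Semaphore(max_in_flight)
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, u, sem) for u in urls]
        # return_exceptions=True keeps one bad host from failing the whole batch.
        results = await asyncio.gather(*tasks, return_exceptions=True)
    return [r for r in results if isinstance(r, dict)]


if __name__ == "__main__":
    print(asyncio.run(crawl(["https://example.com", "https://example.org"])))

The semaphore is one simple way to apply backpressure so a single batch cannot overwhelm the fetcher or downstream systems.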

Qualifications

Python · Distributed systems · Data ingestion · Web crawling · SQL · Cloud platforms · Communication · Technical documentation · Project management

Required

7+ years of professional experience building or operating backend or infrastructure systems at scale
Solid programming experience in Python, Go, Rust, or a similar language, including experience with async/await, coroutines, or concurrency frameworks
Strong grasp of software architecture and backend fundamentals; you can reason clearly about concurrency, scalability, and fault tolerance
Solid understanding of the browser rendering pipeline and web application architecture (auth, cookies, HTTP request/response)
Familiarity with network architecture and debugging (HTTP, DNS, proxies, packet capture and analysis)
Solid understanding of distributed systems concepts: parallelism, asynchronous programming, backpressure, and message-driven design
Experience designing or maintaining resilient data ingestion, API integration, or ETL systems
Proficiency with Linux / Unix command-line tools and system resource management
Familiarity with message queues, orchestration, and distributed task systems (Kafka, SQS, Airflow, etc.)
Experience evaluating and monitoring data quality, ensuring consistency, completeness, and reliability across releases (see the sketch after this list)
Work independently in a fast-paced, remote-first environment, proactively unblocking yourself and collaborating asynchronously
Communicate clearly and thoughtfully in writing (Slack, docs, design proposals)
Write and maintain technical design documents, including pipeline design, schema design, and data flow diagrams
Scope and break down complex projects into deliverable milestones, and communicate progress, risks, and blockers effectively
Balance pragmatism with craftsmanship, shipping reliable systems while continuously improving them
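
As a hedged sketch of the data-quality item above, the following computes per-field completeness over a batch of records; the record shape, field names, and threshold usage are hypothetical.

from collections import Counter


def completeness_report(records: list[dict], required_fields: list[str]) -> dict:
    """Return the fraction of records with a non-empty value for each required field."""
    filled = Counter()
    for rec in records:
        for field in required_fields:
            if rec.get(field) not in (None, "", []):
                filled[field] += 1
    total = len(records) or 1
    return {field: filled[field] / total for field in required_fields}


if __name__ == "__main__":
    sample = [
        {"name": "Acme Corp", "domain": "acme.example", "employees": 120},
        {"name": "Globex", "domain": "", "employees": None},
    ]
    # e.g. {'name': 1.0, 'domain': 0.5, 'employees': 0.5}; fields below a release
    # threshold would be flagged before the dataset ships.
    print(completeness_report(sample, ["name", "domain", "employees"]))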

Preferred

Degree in a quantitative field such as computer science, mathematics, or engineering
Experience as a Red Teamer
Experience working on large-scale data ingestion, crawling, or indexing systems
Experience with Apache Spark, Databricks, or other distributed data platforms
Experience with streaming data systems (Kafka, Pub/Sub, Spark Streaming, etc.)
Proficiency with SQL and data warehousing (Snowflake, Redshift, BigQuery, or similar)
Experience with cloud platforms (AWS preferred, GCP or Azure also great)
Understanding of modern data storage and design patterns (Parquet, Delta Lake, partitioning and segmentation, incremental updates, rebuilds and backfills); see the sketch after this list
Experience building and maintaining data pipelines on modern big-data or cloud platforms (Databricks, Spark, or equivalent)
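
For the storage-pattern items above, here is a minimal sketch of date-partitioned Parquet output with pyarrow; the column names and output path are assumptions, and the partition layout is what makes incremental updates and targeted backfills straightforward.

import pyarrow as pa
import pyarrow.parquet as pq

# A tiny batch of captured records, partitioned by capture date so that
# incremental updates and backfills can target individual partitions.
table = pa.table({
    "capture_date": ["2024-01-01", "2024-01-01", "2024-01-02"],
    "domain": ["acme.example", "globex.example", "initech.example"],
    "status": [200, 200, 404],
})

pq.write_to_dataset(
    table,
    root_path="crawl_output",          # hypothetical local path
    partition_cols=["capture_date"],   # yields crawl_output/capture_date=<date>/ directories
)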

Benefits

Stock
Competitive Salaries
Unlimited paid time off
Medical, dental, & vision insurance
Health, fitness, and office stipends
The permanent ability to work wherever and however you want

Company

People Data Labs

People Data Labs is a software firm offering compliant people and company data through its B2B solutions.

H1B Sponsorship

People Data Labs has a track record of offering H1B sponsorship. Note that this does not guarantee sponsorship for this specific role; additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships: 2024 (2) · 2022 (2) · 2021 (1)

Funding

Current Stage
Growth Stage
Total Funding
$55.48M
Key Investors
Craft Ventures · Founders Fund · 8VC
2021-11-16 · Series B · $45M
2018-10-10 · Series A · $7M
2017-04-26 · Series A

Leadership Team

Ben Eisenberg, CEO
Dan Shamouilian, Chief Operating Officer
Company data provided by Crunchbase