
Polygon.io · 11 hours ago

Data Acquisition Engineer

Polygon.io is focused on helping developers build the future of fintech by democratizing access to financial market data. The Data Acquisition Engineer will lead the ingestion, parsing, cleaning, and structuring of large external datasets for the financial data platform.

Developer APIs · Financial Exchanges · FinTech · Stock Exchanges

Responsibilities

Own the technical onboarding of external datasets, including parsing raw files, transforming fields, and producing clean structured outputs
Write parsing and transformation logic in Python and SQL to handle diverse file formats (CSV, JSON, XML, HTML, XBRL, PDF, etc.)
Develop reproducible ETL/ELT workflows that clean, normalize, validate, and structure incoming datasets
Manage data storage and processing workflows using S3-compatible object storage systems
Produce efficient, analytics-ready Parquet datasets, using appropriate partitioning and metadata conventions
Implement data-quality checks to detect anomalies, schema drift, missing fields, or unexpected changes in incoming data
Troubleshoot and resolve inconsistencies through systematic, transparent cleaning and transformation rules
Collaborate with internal data and research teams to understand dataset characteristics, quirks, semantics, and intended uses
Provide light technical input during dataset evaluation, offering insight into ingest feasibility and transformation complexity
Write clear documentation describing dataset structure, parsing assumptions, transformation logic, and known limitations
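The data-quality responsibilities above (detecting missing fields, type changes, and schema drift) can be sketched as a simple per-record check. This is an illustrative example only; the schema, field names, and sample values are hypothetical, not Polygon.io's actual data model.

```python
# Minimal sketch of a schema-drift / data-quality check on incoming records.
# EXPECTED_SCHEMA and the field names are hypothetical examples.
EXPECTED_SCHEMA = {"ticker": str, "close": float, "volume": int}

def check_record(record: dict) -> list[str]:
    """Return a list of data-quality issues found in a single record."""
    issues = []
    for field, expected_type in EXPECTED_SCHEMA.items():
        if field not in record:
            issues.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            got = type(record[field]).__name__
            issues.append(f"type drift in {field}: got {got}")
    for field in record:
        if field not in EXPECTED_SCHEMA:
            issues.append(f"unexpected new field: {field}")
    return issues
```

In a real pipeline, checks like this would typically run per batch and feed alerting, so that upstream format changes are caught before they reach analytics-ready outputs.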

Qualifications

Financial datasets experience · Python · SQL · S3-compatible storage · ETL/ELT workflows · Parquet · Data documentation · DataFusion · DuckDB · Schema validation · Data-quality frameworks

Required

Direct experience working with financial datasets at a hedge fund, financial institution, or major financial data provider
Proven track record onboarding or structuring large, complex financial or finance-adjacent datasets, such as market data, fundamentals, regulatory data, reference data, or alternative data
Strong proficiency in Python for parsing, cleaning, and transformation workflows
Strong SQL skills for exploration, validation, and modeling
Hands-on experience working with S3-compatible object storage for large dataset management
Proficiency with Parquet and other columnar storage formats, including partitioning strategies for performance and scale
Experience designing ETL/ELT workflows that are reproducible, maintainable, and resilient to upstream dataset changes
Ability to interpret messy or loosely documented datasets and design stable parsing logic
Clear written communication skills for documenting processes, assumptions, and dataset behavior
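The Parquet requirement above mentions partitioning strategies for performance and scale. A common convention is hive-style partition keys embedded in object-storage paths; the sketch below builds such a key. The dataset name, partition columns, and file naming are assumptions for illustration, not a description of Polygon.io's layout.

```python
# Illustrative sketch of hive-style partition paths for a Parquet dataset
# stored in S3-compatible object storage. All names are hypothetical.
from datetime import date

def partition_path(dataset: str, trade_date: date, ticker: str) -> str:
    """Build an S3-style object key using year/month/ticker partition columns."""
    return (
        f"{dataset}/year={trade_date.year}/month={trade_date.month:02d}/"
        f"ticker={ticker}/part-0000.parquet"
    )
```

Query engines such as DuckDB and DataFusion can prune partitions from paths laid out this way, so filters on year, month, or ticker skip irrelevant files entirely.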

Preferred

Experience with DataFusion, DuckDB, or other modern analytical engines
Exposure to datasets such as regulatory filings, financial reference data, or alternative data
Experience extracting structured information from PDFs or other irregular data sources
Familiarity with schema validation, metadata management, or data-quality frameworks

Company

Polygon.io

Massive empowers participation in the financial markets by providing fair access to market data through a developer-focused platform.

Funding

Current Stage
Growth Stage
Total Funding
$6.35M
Key Investors
Headline
2020-09-16 · Series A · $5.75M
2019-10-02 · Convertible Note · $0.1M
2019-08-12 · Convertible Note · $0.5M

Leadership Team

Quinton Pike
Founder
Company data provided by Crunchbase