Polygon.io · 11 hours ago
Data Acquisition Engineer
Polygon.io is focused on helping developers build the future of fintech by democratizing access to financial market data. The Data Acquisition Engineer will lead the ingestion, parsing, cleaning, and structuring of large external datasets for its financial data platform.
Developer APIs · Financial Exchanges · FinTech · Stock Exchanges
Responsibilities
Own the technical onboarding of external datasets, including parsing raw files, transforming fields, and producing clean structured outputs
Write parsing and transformation logic in Python and SQL to handle diverse file formats such as CSV, JSON, XML, HTML, XBRL, and PDF (see the parsing sketch after this list)
Develop reproducible ETL/ELT workflows that clean, normalize, validate, and structure incoming datasets
Manage data storage and processing workflows using S3-compatible object storage systems (see the object-storage sketch below)
Produce efficient, analytics-ready Parquet datasets, using appropriate partitioning and metadata conventions (see the Parquet sketch below)
Implement data-quality checks to detect anomalies, schema drift, missing fields, or unexpected changes in incoming data (see the data-quality sketch below)
Troubleshoot and resolve inconsistencies through systematic, transparent cleaning and transformation rules
Collaborate with internal data and research teams to understand dataset characteristics, quirks, semantics, and intended uses
Provide light technical input during dataset evaluation, offering insight into ingest feasibility and transformation complexity
Write clear documentation describing dataset structure, parsing assumptions, transformation logic, and known limitations
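To make the parsing bullet concrete, here is a minimal sketch of format-dispatch parsing in Python; the `parse_file` helper, the flat-dict row shape, and the local-path assumption are illustrative rather than Polygon.io's actual pipeline:

```python
import csv
import json
import xml.etree.ElementTree as ET
from pathlib import Path

def parse_file(path: Path) -> list[dict]:
    """Dispatch on extension and return rows as plain dicts (hypothetical helper)."""
    suffix = path.suffix.lower()
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    if suffix == ".json":
        data = json.loads(path.read_text(encoding="utf-8"))
        # Normalize a single top-level object into a one-row list.
        return data if isinstance(data, list) else [data]
    if suffix == ".xml":
        root = ET.parse(path).getroot()
        # Flatten each record element into a dict of its child tags.
        return [{child.tag: child.text for child in record} for record in root]
    raise ValueError(f"unsupported format: {suffix}")
```

A real ingest would also sniff file contents rather than trusting extensions, handle encodings, and route unparseable records to a quarantine path.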
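For the object-storage bullet, a sketch of enumerating raw inputs on an S3-compatible store with boto3; the endpoint URL, bucket, and prefix are placeholders:

```python
import boto3

# Point the client at an S3-compatible endpoint (placeholder URL);
# credentials come from the environment in the usual boto3 ways.
s3 = boto3.client("s3", endpoint_url="https://storage.example.com")

# List raw objects under a vendor prefix (hypothetical names).
resp = s3.list_objects_v2(Bucket="raw-datasets", Prefix="vendor-x/2024/")
keys = [obj["Key"] for obj in resp.get("Contents", [])]
```

For large prefixes, the paginated variant via `s3.get_paginator("list_objects_v2")` avoids the 1,000-object cap per response.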
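For the Parquet bullet, a sketch of writing a Hive-partitioned dataset with pyarrow; the rows and output path are invented for illustration:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Hypothetical rows produced by an upstream parsing step.
rows = [
    {"ticker": "AAPL", "date": "2024-01-02", "close": 185.64},
    {"ticker": "MSFT", "date": "2024-01-02", "close": 370.87},
]
table = pa.Table.from_pylist(rows)

# Hive-style partitioning turns partition columns into directories
# (out/prices/ticker=AAPL/...), letting query engines prune files.
pq.write_to_dataset(table, root_path="out/prices", partition_cols=["ticker"])
```

Choosing partition columns with moderate cardinality (ticker, date buckets) is what keeps such datasets analytics-ready at scale.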
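For the data-quality bullet, a sketch of a missing-field and schema-drift check against a declared column contract; EXPECTED_COLUMNS is a hypothetical contract, not a real Polygon.io schema:

```python
# Hypothetical column contract for one incoming dataset.
EXPECTED_COLUMNS = {"ticker": str, "date": str, "close": float}

def check_rows(rows: list[dict]) -> list[str]:
    """Return human-readable issues so failures are transparent, not silent."""
    issues = []
    for i, row in enumerate(rows):
        missing = EXPECTED_COLUMNS.keys() - row.keys()
        if missing:
            issues.append(f"row {i}: missing fields {sorted(missing)}")
        extra = row.keys() - EXPECTED_COLUMNS.keys()
        if extra:
            issues.append(f"row {i}: unexpected fields {sorted(extra)} (possible schema drift)")
        for col, expected in EXPECTED_COLUMNS.items():
            value = row.get(col)
            if value is not None and not isinstance(value, expected):
                issues.append(f"row {i}: {col} is {type(value).__name__}, expected {expected.__name__}")
    return issues
```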
Qualifications
Required
Direct experience working with financial datasets at a hedge fund, financial institution, or major financial data provider
Proven track record onboarding or structuring large, complex financial or finance-adjacent datasets, such as market data, fundamentals, regulatory data, reference data, or alternative data
Strong proficiency in Python for parsing, cleaning, and transformation workflows
Strong SQL skills for exploration, validation, and modeling (see the validation-query sketch after this list)
Hands-on experience working with S3-compatible object storage for large dataset management
Proficiency with Parquet and other columnar storage formats, including partitioning strategies for performance and scale
Experience designing ETL/ELT workflows that are reproducible, maintainable, and resilient to upstream dataset changes
Ability to interpret messy or loosely documented datasets and design stable parsing logic
Clear written communication skills for documenting processes, assumptions, and dataset behavior
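As a rough illustration of the SQL skills above, two pre-publication validation checks run through DuckDB (named under Preferred below) as a convenient local engine; the dataset path and columns reuse the hypothetical Parquet example from the Responsibilities section:

```python
import duckdb

con = duckdb.connect()
dataset = "read_parquet('out/prices/**/*.parquet', hive_partitioning=true)"

# Null-rate check on a critical field.
null_rate = con.execute(
    f"SELECT avg(CASE WHEN close IS NULL THEN 1 ELSE 0 END) FROM {dataset}"
).fetchone()[0]

# Duplicate-key check on the natural key (ticker, date).
dupes = con.execute(
    f'SELECT ticker, "date", count(*) FROM {dataset} '
    f'GROUP BY ticker, "date" HAVING count(*) > 1'
).fetchall()

assert null_rate == 0 and not dupes, "dataset failed validation"
```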
Preferred
Experience with DataFusion, DuckDB, or other modern analytical engines
Exposure to datasets such as regulatory filings, financial reference data, or alternative data
Experience extracting structured information from PDFs or other irregular data sources (sketched below)
Familiarity with schema validation, metadata management, or data-quality frameworks
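The PDF bullet above typically means table extraction from filing-style documents; a sketch with pdfplumber, one common choice, where the file path and page layout are assumptions:

```python
import pdfplumber

# Hypothetical filing; real documents need per-layout tuning.
with pdfplumber.open("filings/example_10k.pdf") as pdf:
    first_page = pdf.pages[0]
    table = first_page.extract_table()  # list of rows, or None if none detected
    records = []
    if table:
        header, *rows = table
        records = [dict(zip(header, row)) for row in rows]
```

Extraction settings (line-based versus whitespace-based table detection) usually have to be tuned per document family, which is part of the "irregular data sources" work this role describes.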
Company
Polygon.io
Massive (Polygon.io) empowers participation in the financial markets by providing fair access to market data through a developer-focused platform.
Funding
Current Stage: Growth Stage
Total Funding: $6.35M
Key Investors: Headline
2020-09-16 · Series A · $5.75M
2019-10-02 · Convertible Note · $0.1M
2019-08-12 · Convertible Note · $0.5M
Recent News
Newsfile · 2025-11-15
EIN Presswire · 2025-11-03
Company data provided by Crunchbase