Apryse · 8 hours ago
Full Stack Data Discovery Engineer
Apryse is an industry-leading provider of document software development technology. They are seeking a Full-Stack Data Discovery Engineer to design and implement systems that uncover technology usage across ecosystems, focusing on building data pipelines and dashboards to transform raw data into actionable insights.
Document Management · Software · Web Development
Responsibilities
Own the full stack: Design, build and optimize scalable data pipelines to discover OSINT and software usage across a wide public ecosystem
Pipeline development: Develop APIs, microservices, crawlers, and document fingerprinting to gather data securely and efficiently. Implement backoff/caching and data normalization, and persist results to SQL/NoSQL indexes
Data Discovery: Conduct systematic searches across the web, public databases, developer ecosystems and other platforms to identify potential external data repositories relevant to organizational objectives
Metadata and Attribution Analysis: Programmatically uncover and analyze metadata associated with identified data sources to understand data structure, content, quality, and potential use cases
Signals & scoring: Develop heuristics/ML‑lite ranking to identify relevant artifacts, deduplicate, and assign confidence scores
Data Governance: Ensure data quality, security, compliance and governance
Productize discovery: Build internal tools that let non‑engineers run searches, review candidates, and export leads quickly and safely
Documentation and Reporting: Document data structures, origins (data lineage), and quality issues. Create clear, concise reports and presentations to communicate findings and recommendations to technical and non-technical stakeholders
Collaboration: Work closely with data stewards, data architects, and internal business units to define data requirements and facilitate the integration of new data sources
Innovation and Scale: Continuously explore new data sources, improve attribution logic and propose ML-based enhancements to finding and classifying data
Qualifications
Required
Bachelor's degree in Computer Science, Engineering, Library Science, Information Systems, Data Management, or a related field
1-5 years of proven experience as a full-stack developer and data engineer
Back-end: Python, SQL, Java and Node.js
Front-end: Modern JS/TS + React, component libraries, auth patterns, state mgmt
Data & search: schema design, dedup/near‑dup logic, Elasticsearch/OpenSearch; building usable search/triage UIs
Acquisition: Scrapy/Playwright/Puppeteer; API design with rate‑limit/backoff; ethical crawling
Experience with cloud-native architecture and containerization
Familiarity with metadata standards (e.g., Dublin Core, XML) and data management tools
Exceptional attention to detail and strong analytical thinking skills
Excellent written and verbal communication skills, with the ability to translate technical findings into business insights
Strong problem-solving aptitude and the ability to work independently and collaboratively in a fast-paced environment
Preferred
Master's degree in Computer Science, Engineering, Library Science, Information Systems, Data Management, or a related field
Knowledge of data visualization tools (e.g. Power BI, Tableau) to present findings
Experience building internal platforms/tools used by end users or GTM teams
Benefits
A comprehensive extended benefits package including health, dental and vision for you and your family.
401K savings program with company match.
Generous paid time off (PTO) is offered to support the ability to rest and recharge.
Annual recurring WFH allowance for you to purchase items you need for your home office.
Ongoing support for learning and development so you can master your craft.
Work with the hardware you're most comfortable with (Windows or Mac).
Diverse and inclusive workplace where we all learn from each other.
Excellent work-life balance with a flexible remote work environment.
Company
Apryse
Apryse is a comprehensive collection of document processing products that offers superior document solutions for faster, better results.
Funding
Current Stage: Late Stage
Total Funding: $71M
Key Investors: Thoma Bravo, Silversmith Capital Partners
2021-05-21 · Private Equity
2019-05-16 · Private Equity · $71M
Recent News
Mergers & Acquisitions · 2025-07-11
PR Newswire · 2025-07-10
pitchbook.com · 2025-06-03
Company data provided by Crunchbase