Full Stack Data Discovery Engineer jobs in United States

Apryse · 8 hours ago

Full Stack Data Discovery Engineer

Apryse is an industry-leading provider of document software development technology. They are seeking a Full-Stack Data Discovery Engineer to design and implement systems that uncover technology usage across ecosystems, focusing on building data pipelines and dashboards to transform raw data into actionable insights.

Document Management, Software, Web Development

Responsibilities

Own the full stack: Design, build and optimize scalable data pipelines to discover OSINT and software usage across a wide public ecosystem
Pipeline development: Develop APIs, microservices, crawlers, and document fingerprinting to gather data securely and efficiently. Implement backoff/caching and data normalization, and persist results to SQL/NoSQL indexes
Data Discovery: Conduct systematic searches across the web, public databases, developer ecosystems and other platforms to identify potential external data repositories relevant to organizational objectives
Metadata and Attribution Analysis: Programmatically uncover and analyze metadata associated with identified data sources to understand data structure, content, quality, and potential use cases
Signals & scoring: Develop heuristics/ML-lite ranking to identify relevant artifacts, deduplicate them, and assign confidence scores
Data Governance: Ensure data quality, security, compliance and governance
Productize discovery: Build internal tools that let non-engineers run searches, review candidates, and export leads quickly and safely
Documentation and Reporting: Document data structures, origins (data lineage), and quality issues. Create clear, concise reports and presentations to communicate findings and recommendations to technical and non-technical stakeholders
Collaboration: Work closely with data stewards, data architects, and internal business units to define data requirements and facilitate the integration of new data sources
Innovation and Scale: Continuously explore new data sources, improve attribution logic and propose ML-based enhancements to finding and classifying data

Qualifications

Python, SQL, Java, Node.js, React, Elasticsearch, Scrapy, API design, Cloud-native architecture, Analytical thinking, Attention to detail, Communication skills, Problem-solving

Required

Bachelor's degree in Computer Science, Engineering, Library Science, Information Systems, Data Management, or a related field
1-5 years of proven experience as a full-stack developer and data engineer
Back-end: Python, SQL, Java and Node.js
Front-end: Modern JS/TS + React, component libraries, auth patterns, state management
Data & search: schema design, dedup/near‑dup logic, Elasticsearch/OpenSearch; building usable search/triage UIs
Acquisition: Scrapy/Playwright/Puppeteer; API design with rate‑limit/backoff; ethical crawling
Experience with cloud-native architecture and containerization
Familiarity with metadata standards (e.g., Dublin Core, XML) and data management tools
Exceptional attention to detail and strong analytical thinking skills
Excellent written and verbal communication skills, with the ability to translate technical findings into business insights
Strong problem-solving aptitude and the ability to work independently and collaboratively in a fast-paced environment

Preferred

Master's degree in Computer Science, Engineering, Library Science, Information Systems, Data Management, or a related field
Knowledge of data visualization tools (e.g. Power BI, Tableau) to present findings
Experience building internal platforms/tools used by end users or GTM teams

Benefits

A comprehensive extended benefits package including health, dental and vision for you and your family.
401K savings program with company match.
Generous paid time off (PTO) is offered to support the ability to rest and recharge.
Annual recurring WFH allowance for you to purchase items you need for your home office.
Ongoing support for learning development so you can master your craft.
Work with the hardware you're most comfortable with (Windows or Mac).
Diverse and inclusive workplace where we all learn from each other.
Excellent work-life balance with a flexible remote work environment.

Company

Apryse is a comprehensive collection of document processing products that offers superior document solutions for faster, better results.

Funding

Current Stage: Late Stage
Total Funding: $71M
Key Investors: Thoma Bravo, Silversmith Capital Partners
Funding Rounds:
2021-05-21: Private Equity
2019-05-16: Private Equity · $71M

Leadership Team

Catherine Andersz
Board Member

Company data provided by Crunchbase