AI Data Engineer jobs in United States

Instrumentl · 3 hours ago

AI Data Engineer

Instrumentl is a mission-driven startup focused on helping the nonprofit sector with grant discovery and management through its SaaS platform. As an AI Data Engineer, you will own the systems that transform unstructured content into structured data, build automated content discovery pipelines, and ensure data quality for product teams.
Non Profit · SaaS · Software
H1B Sponsor Likely

Responsibilities

Build content discovery pipelines: Automate discovery and acquisition of grant-related content from the web—foundation websites, RFPs, program announcements—turning the open web into structured, actionable data
Build LLM extraction pipelines: Implement production pipelines to transform unstructured text into canonical business objects—including document ingestion (PDFs, HTML, Word), OCR, table extraction, and layout-aware parsing. Partner with product engineers to evolve schemas as domain needs change
Own semantic chunking and embeddings: Design chunking strategies optimized for retrieval; select and manage embedding models; maintain vector indices that power downstream search and RAG features
Optimize for cost and latency: Profile token usage, implement caching and batching strategies, choose appropriate models for different tasks, and manage the cost/quality tradeoff at scale
Maintain data quality and serve downstream consumers: Implement validation, anomaly detection, and alerting for extraction drift. Expose clean data via APIs, materialized views, or event streams that product teams can rely on without understanding the extraction complexity. Integrate and normalize data from external providers—resolving entities, mapping to internal schemas, and ensuring "Ford Foundation" and "The Ford Foundation" resolve to the same canonical record
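
As an illustration of the entity-resolution point above, here is a minimal sketch in Python of mapping name variants to one canonical key. The function names (`canonical_key`, `group_by_canonical_key`) and the normalization rules are assumptions made for this example and do not describe Instrumentl's actual pipeline; a production system would likely add fuzzy matching, external identifiers, and human review for ambiguous cases.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical rule set: only a leading "the" is stripped in this sketch.
LEADING_ARTICLES = ("the ",)


def canonical_key(name: str) -> str:
    """Return a normalized key so trivial name variants compare equal."""
    text = unicodedata.normalize("NFKC", name).casefold()
    text = re.sub(r"[^\w\s]", " ", text)      # replace punctuation with spaces
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    for article in LEADING_ARTICLES:
        if text.startswith(article):
            text = text[len(article):]
    return text


def group_by_canonical_key(names):
    """Group raw names under their shared canonical key."""
    groups = defaultdict(list)
    for name in names:
        groups[canonical_key(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    raw = ["Ford Foundation", "The Ford Foundation", "Gates Foundation"]
    for key, variants in group_by_canonical_key(raw).items():
        print(key, variants)
```

Exact-key matching like this only catches surface variants (case, punctuation, a leading article); it is a starting point, not a complete resolution strategy.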

Qualifications

LLM extraction pipelines · Document ingestion · Embeddings & vector stores · Python · TypeScript/Node · AWS/GCP · SQL · Collaborative approach · Results-driven

Required

5+ years of professional software engineering experience
2+ years working with modern LLMs (as an IC)
Proven production impact: You've taken LLM/RAG systems from prototype to production, owned reliability/observability, and iterated post‑launch based on evals and user feedback
Experience building tool/function‑calling workflows, planning/execution loops, and safe tool integrations (e.g., with LangChain/LangGraph, LlamaIndex, Semantic Kernel, or custom orchestration)
Strong grasp of document ingestion, chunking/windowing, embeddings, hybrid search (keyword + vector), re‑ranking, and grounded citations
Hands‑on with embedding model selection/versioning and vector DBs (e.g., pgvector, FAISS, Pinecone, Weaviate, Milvus, Qdrant)
Comfort designing eval suites (RAG/QA, extraction, summarization), using automated and human‑in‑the‑loop methods; familiarity with frameworks like Ragas/DeepEval/OpenAI Evals or equivalent
Proficiency in Python (FastAPI, Celery) and TypeScript/Node; familiarity with Ruby on Rails (our core platform) or willingness to learn
Experience with AWS/GCP, Docker, CI/CD, and observability (logs/metrics/traces)
Comfortable with SQL, schema design, and building/maintaining data pipelines that power retrieval and evaluation
You thrive in a cross‑functional environment and can translate researchy ideas into shippable, user‑friendly features
Bias for action and ownership with an eye for speed, quality, and simplicity

Preferred

Practical experience with SFT/LoRA or instruction‑tuning (and good intuition for when fine‑tuning vs. prompting vs. model choice is the right lever)
Exposure to open‑source LLMs (e.g., Llama) and providers (e.g., OpenAI, Anthropic, Google, Mistral)
Familiarity with responsible AI, red‑teaming, and domain‑specific safety policies

Benefits

100% covered health, dental, and vision insurance for employees (50% for dependents)
Generous PTO, including parental leave
401(k)
Company laptop and home-office stipend
Bi-Annual Company Retreats for in-person collaboration

Company

Instrumentl

Instrumentl is an end-to-end platform for grant management that supports organizations throughout the entire grant lifecycle.

H1B Sponsorship

Instrumentl has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart not reproduced)
Trends of Total Sponsorships
2025 (3)
2024 (4)
2023 (2)
2022 (4)
2021 (6)
2020 (2)

Funding

Current Stage: Growth Stage
Total Funding: $55M
Key Investors: Summit Partners
2025-04-23 · Private Equity · $55M
2020-01-01 · Series Unknown
2017-08-01 · Seed

Leadership Team

Gauri Manglik
CEO & Co-founder
Angela Braren
Co-founder leading Engineering, Product & Design
Company data provided by Crunchbase