Staff AI Engineer, Data Systems jobs in United States

MagicSchool AI · 2 months ago

Staff AI Engineer, Data Systems

MagicSchool AI is the premier generative AI platform for teachers, aiming to make a real social impact in education. The Staff AI Engineer will architect information infrastructure for AI agents, focusing on knowledge organization, retrieval, and memory systems to enhance educational content access and reasoning for millions of educators.

Artificial Intelligence (AI), E-Learning, EdTech, Education

Responsibilities

Architect and implement graph-based knowledge systems (Neo4j, Neptune, etc.) that represent educational content relationships, standards alignments, prerequisite chains, curriculum coherence, learning progressions, and pedagogical connections
Design and evolve ontologies and schemas for educational content, defining entity types (standards, concepts, skills, assessments), relationship semantics, and property models that support both human comprehension and AI reasoning
Build GraphRAG systems that combine knowledge graph traversal with vector similarity, enabling agents to retrieve not just similar content but contextually connected educational materials through semantic and structural relationships
Architect and implement sophisticated retrieval-augmented generation pipelines including hybrid search (dense + sparse), multi-stage retrieval, reranking strategies, and query understanding that surface the most relevant educational content for agent reasoning
Design and operationalize embedding pipelines for educational content, selecting and fine-tuning embedding models, implementing chunking strategies appropriate for curriculum materials, and managing vector stores at scale for fast, accurate retrieval
Design evaluation pipelines that measure retrieval precision, recall, MRR, and NDCG across educational content types. Continuously optimize retrieval quality through experimentation with embedding models, chunking strategies, and ranking algorithms
Build robust ingestion systems that process structured (standards documents, curriculum frameworks, JSON) and unstructured (PDFs, lesson plans, textbooks) educational content, extracting entities, relationships, and metadata for knowledge base population
Implement NLP pipelines for educational content that extract key concepts, prerequisite relationships, learning objectives, and pedagogical metadata, enriching raw content with structured annotations for improved retrieval and reasoning
Invent and operationalize memory compaction mechanisms, session state management, and cross-conversation memory patterns that allow agents to maintain coherence across extended teaching workflows while respecting token budgets
Design evaluation frameworks that measure retrieval precision, token relevance, attention allocation, and reasoning coherence as context evolves across sessions. Work with the evaluations team on detecting context degradation and retrieval failures
Partner with Product, Research, and Educators to understand content relationships, retrieval requirements, and context needs across different teaching scenarios, translating domain expertise into technical architecture
Collaborate with ML researchers / evaluations team and context engineers to co-design architectures that integrate knowledge graphs, vector stores, and retrieval systems with agent runtimes and LLM inference pipelines
Guide engineers on knowledge graph design, RAG architecture patterns, embedding strategies, and retrieval optimization, elevating the team's capability in building knowledge-intensive AI systems

Qualifications

Knowledge Graph Design, RAG Systems Expertise, Graph Database Expertise, Embedding & NLP Background, Information Architecture, Technical Stack, Educational Context Awareness, Technical Mentorship, Cross-Functional Collaboration

Required

5+ years building large-scale information systems, including at least 2 years in staff/senior roles
Extensive hands-on experience with RAG systems, knowledge graphs, or semantic search platforms in production environments
Deep experience with graph databases (Neo4j, Neptune, or similar), including schema design, query optimization (Cypher, Gremlin), and building graph-based applications
Understanding of when graph structures provide advantages over relational or vector-only approaches
Demonstrated expertise building production RAG systems including embedding selection, chunking strategies, hybrid search, reranking, and retrieval evaluation
Familiarity with vector databases (pgvector, Pinecone, Weaviate, Qdrant) and their performance characteristics
Strong understanding of embedding models (sentence transformers, domain-specific embeddings), fine-tuning approaches, and semantic similarity
Experience with document processing, entity extraction, and text chunking for optimal retrieval
Strong coding skills in Python and/or TypeScript/Node.js
Experience with our stack (TypeScript, Node.js, PostgreSQL, Next.js, Supabase) plus graph databases and vector stores
Familiarity with LLM APIs and context management patterns
Deep understanding of information retrieval theory, semantic search, knowledge representation, and strategies for organizing complex domain knowledge for both human and AI consumption
Track record of architecting complex knowledge systems, making high-leverage technical decisions about information architecture, and mentoring engineers on sophisticated retrieval and graph concepts

Preferred

Understanding of or interest in how educational content is structured (standards, curricula, learning progressions), curriculum relationships, and how knowledge organization differs across teaching scenarios
Experience with GraphRAG, knowledge graph embeddings (node2vec, TransE), or graph neural networks for link prediction and entity resolution
Familiarity with educational knowledge graphs, standards alignment systems (CASE framework), or EdTech content taxonomies
Background in semantic web technologies (RDF, OWL, SPARQL), ontology engineering, or knowledge graph construction from unstructured text
Experience with the Model Context Protocol (MCP) for tool-based retrieval, or building context-aware agent frameworks
Knowledge of curriculum standards, learning science, or educational metadata schemas (LOM, schema.org/LearningResource)
Experience with fine-tuning embedding models for domain-specific retrieval or building learned sparse retrievers

Benefits

Unlimited time off to empower our employees to manage their work-life balance.
Choice of employer-paid health insurance plans so that you can take care of yourself and your family.
Dental and vision are also offered at very low premiums.
Every employee is offered generous stock options, vested over 4 years.
401(k) match and monthly wellness stipend.

Company

MagicSchool AI

MagicSchool is an AI platform for education and a growing technology for schools.

Funding

Current Stage: Growth Stage
Total Funding: $62.4M
Key Investors: Valor Equity Partners, Bain Capital Ventures, Range Ventures
2025-02-04: Series B, $45M
2024-06-27: Series A, $15M
2023-08-28: Pre Seed, $2.4M

Leadership Team

Adeel Khan
Founder & CEO
Mike Biven
President
Company data provided by Crunchbase