MagicSchool AI · 2 months ago

Staff AI Engineer, Data Systems

United States

Full-time

Remote

Senior Level, Lead/Staff

$205K/yr - $240K/yr

5+ years exp

MagicSchool AI is the premier generative AI platform for teachers, aiming to make a real social impact in education. The Staff AI Engineer will architect information infrastructure for AI agents, focusing on knowledge organization, retrieval, and memory systems to enhance educational content access and reasoning for millions of educators.

Artificial Intelligence (AI), E-Learning, EdTech, Education

Responsibilities

Architect and implement graph-based knowledge systems (Neo4j, Neptune, etc.) that represent educational content relationships, standards alignments, prerequisite chains, curriculum coherence, learning progressions, and pedagogical connections

Design and evolve ontologies and schemas for educational content, defining entity types (standards, concepts, skills, assessments), relationship semantics, and property models that support both human comprehension and AI reasoning

Build GraphRAG systems that combine knowledge graph traversal with vector similarity, enabling agents to retrieve not just similar content but contextually connected educational materials through semantic and structural relationships

Architect and implement sophisticated retrieval-augmented generation pipelines including hybrid search (dense + sparse), multi-stage retrieval, reranking strategies, and query understanding that surface the most relevant educational content for agent reasoning

Design and operationalize embedding pipelines for educational content, selecting and fine-tuning embedding models, implementing chunking strategies appropriate for curriculum materials, and managing vector stores at scale for fast, accurate retrieval

Design evaluation pipelines that measure retrieval precision, recall, MRR, and NDCG across educational content types. Continuously optimize retrieval quality through experimentation with embedding models, chunking strategies, and ranking algorithms

Build robust ingestion systems that process structured (standards documents, curriculum frameworks, JSON) and unstructured (PDFs, lesson plans, textbooks) educational content, extracting entities, relationships, and metadata for knowledge base population

Implement NLP pipelines for educational content that extract key concepts, prerequisite relationships, learning objectives, and pedagogical metadata, enriching raw content with structured annotations for improved retrieval and reasoning

Invent and operationalize memory compaction mechanisms, session state management, and cross-conversation memory patterns that allow agents to maintain coherence across extended teaching workflows while respecting token budgets

Design evaluation frameworks that measure retrieval precision, token relevance, attention allocation, and reasoning coherence as context evolves across sessions. Work with the evaluations team on detecting context degradation and retrieval failures

Partner with Product, Research, and Educators to understand content relationships, retrieval requirements, and context needs across different teaching scenarios, translating domain expertise into technical architecture

Collaborate with ML researchers, the evaluations team, and context engineers to co-design architectures that integrate knowledge graphs, vector stores, and retrieval systems with agent runtimes and LLM inference pipelines

Guide engineers on knowledge graph design, RAG architecture patterns, embedding strategies, and retrieval optimization, elevating the team's capability in building knowledge-intensive AI systems

Qualifications

Knowledge Graph Design, RAG Systems Expertise, Graph Database Expertise, Embedding & NLP Background, Information Architecture, Technical Stack, Educational Context Awareness, Technical Mentorship, Cross-Functional Collaboration

Required

5+ years building large-scale information systems, with at least 2 years in staff/senior roles

Extensive hands-on experience with RAG systems, knowledge graphs, or semantic search platforms in production environments

Deep experience with graph databases (Neo4j, Neptune, or similar), including schema design, query optimization (Cypher, Gremlin), and building graph-based applications

Understanding of when graph structures provide advantages over relational or vector-only approaches

Demonstrated expertise building production RAG systems including embedding selection, chunking strategies, hybrid search, reranking, and retrieval evaluation

Familiarity with vector databases (pgvector, Pinecone, Weaviate, Qdrant) and their performance characteristics

Strong understanding of embedding models (sentence transformers, domain-specific embeddings), fine-tuning approaches, and semantic similarity

Experience with document processing, entity extraction, and text chunking for optimal retrieval

Strong coding skills in Python and/or TypeScript/Node.js

Experience with our stack (TypeScript, Node.js, PostgreSQL, Next.js, Supabase) plus graph databases and vector stores

Familiarity with LLM APIs and context management patterns

Deep understanding of information retrieval theory, semantic search, knowledge representation, and strategies for organizing complex domain knowledge for both human and AI consumption

Track record of architecting complex knowledge systems, making high-leverage technical decisions about information architecture, and mentoring engineers on sophisticated retrieval and graph concepts

Preferred

Understanding of or interest in how educational content is structured (standards, curricula, learning progressions), curriculum relationships, and how knowledge organization differs across teaching scenarios

Experience with GraphRAG, knowledge graph embeddings (node2vec, TransE), or graph neural networks for link prediction and entity resolution

Familiarity with educational knowledge graphs, standards alignment systems (CASE framework), or EdTech content taxonomies

Background in semantic web technologies (RDF, OWL, SPARQL), ontology engineering, or knowledge graph construction from unstructured text

Experience with the Model Context Protocol (MCP) for tool-based retrieval, or building context-aware agent frameworks

Knowledge of curriculum standards, learning science, or educational metadata schemas (LOM, schema.org/LearningResource)

Experience with fine-tuning embedding models for domain-specific retrieval or building learned sparse retrievers