
Fabrion · 2 months ago

Data Engineer (Founding Team)

San Francisco, CA

Full-time

Onsite

Senior Level

5+ years exp

Fabrion is an 8VC-backed company building a multi-tenant, AI-native platform to tackle critical infrastructure problems in the industry. The Data Engineer will build reliable data ingestion and transformation pipelines, design the data fabric layer, and work with ML teams to supply model training with high-quality enterprise data.

Artificial Intelligence (AI) · Machine Learning

Responsibilities

Build highly reliable, scalable data ingestion and transformation pipelines across structured, semi-structured, and unstructured data sources

Develop and maintain a connector framework for ingesting from enterprise systems (ERPs, PLMs, CRMs, legacy data stores, email, Excel, docs, etc.)

Design and maintain the data fabric layer, including a knowledge graph (Neo4j or PuppyGraph) enriched with ontologies, metadata, and relationships

Normalize and vectorize data for downstream AI/LLM workflows — enabling retrieval-augmented generation (RAG), summarization, and alerting

Create and manage data contracts, access layers, lineage, and governance mechanisms

Build and expose secure APIs for downstream services, agents, and users to query enriched semantic data

Collaborate with ML/LLM teams to feed high-quality enterprise data into model training and tuning pipelines

Qualifications

Data pipeline orchestration · Knowledge graphs · Data governance · Ingestion frameworks · Unstructured data processing · GraphQL · System thinker · Navigating ambiguous data models · Passionate about AI systems · Value autonomy

Required

5+ years building large-scale data infrastructure in production environments

Deep experience with ingestion frameworks (Kafka, Airbyte, Meltano, Fivetran) and data pipeline orchestration (Airflow, Dagster, Prefect)

Comfortable processing unstructured and semi-structured data from PDFs, Excel, emails, logs, CSVs, and web APIs

Experience working with columnar stores, object storage, and lakehouse formats (Iceberg, Delta, Parquet)

Strong background in knowledge graphs or semantic modeling (e.g. Neo4j, RDF, Gremlin, PuppyGraph)

Familiarity with GraphQL, RESTful APIs, and designing developer-friendly data access layers

Experience implementing data governance: RBAC, ABAC, data contracts, lineage, data quality checks

You're a system thinker: you want to model the real world, not just process it

Comfortable navigating ambiguous data models and building from scratch

Passionate about enabling AI systems with real-world, messy enterprise data

Pragmatic about scalability, observability, and schema evolution

Value autonomy, high trust, and meaningful ownership over infrastructure

Preferred

Prior work with vector DBs (e.g. Weaviate, Qdrant, Pinecone) and embedding pipelines

Experience building or contributing to enterprise connector ecosystems

Knowledge of ontology versioning, graph diffing, or semantic schema alignment

Familiarity with data fabric patterns (e.g. Palantir Ontology, Linked Data, W3C standards)

Familiarity with fine-tuning LLMs or enabling RAG pipelines using enterprise knowledge

Experience enforcing data access policy with tools like OPA, Keycloak, or Snowflake row-level security

Benefits

Competitive salary

Early-stage equity

Company

Fabrion

Fabrion is an AI-native platform purpose-built for the new industrial era

Founded in 2025

San Francisco, California, USA

2-10 employees

https://www.fabrion.com/

Funding

Current Stage

Early Stage

Total Funding

unknown

2026-01-01 · Seed

Company data provided by Crunchbase