Exa · 1 month ago
Software Engineer, Distributed Data Systems
Exa is building a search engine from scratch to serve every AI application, focusing on massive-scale infrastructure. As a Data Engineer, you'll have the autonomy to architect and build data systems that support a wide range of AI applications.
Internet · Search Engine · Software
Responsibilities
Architect and build the data infrastructure that powers everything we do—from crawling billions of pages to training our embedding models to serving real-time search
Design systems that scale to hundreds of petabytes
Build data pipelines at a scale that most companies only dream about
Design a lakehouse architecture that handles 100+ PB of web crawl data
Build streaming pipelines that process billions of documents per day for real-time indexing
Architect the data layer for our embedding training infrastructure on Ray
Scale our ClickHouse deployment to handle analytical queries across petabytes of search logs
Qualifications
Required
Deep understanding of lakehouse architectures (Delta Lake, Iceberg, Hudi) and when to use them
Experience building and operating large-scale distributed data processing pipelines
Hands-on experience with streaming data systems (Kafka, Flink, or similar)
Familiarity with Ray, Spark, or ClickHouse at production scale
An obsessive focus on reliability and building systems that don't page you at 3am
Preferred
Experience with Lance or other vector-native storage formats
Background in GPU-accelerated data processing (RAPIDS, cuDF)
Company
Exa
Exa was built with a simple goal — to organize all knowledge.
H1B Sponsorship
Exa has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (2)
2024 (1)
Funding
Current Stage
Early Stage
Total Funding
$2.12M
2023-04-01 Seed
2021-10-15 Seed · $2M
2021-08-27 Pre Seed · $0.12M
Company data provided by Crunchbase