Software Engineer, Distributed Data Systems jobs in United States

Exa · 1 month ago

Software Engineer, Distributed Data Systems

Exa is building a search engine from scratch to serve every AI application, focusing on massive-scale infrastructure. As a Data Engineer, you'll have the autonomy to architect and build data systems that support a wide range of AI applications.

Internet · Search Engine · Software
H1B Sponsored

Responsibilities

Architect and build the data infrastructure that powers everything we do—from crawling billions of pages to training our embedding models to serving real-time search
Design systems that scale to hundreds of petabytes
Build data pipelines at a scale that most companies only dream about
Design a lakehouse architecture that handles 100+ PB of web crawl data
Build streaming pipelines that process billions of documents per day for real-time indexing
Architect the data layer for our embedding training infrastructure on Ray
Scale our ClickHouse deployment to handle analytical queries across petabytes of search logs

Qualifications

Lakehouse architectures · Distributed data processing · Streaming data systems · Ray · Spark · ClickHouse · GPU-accelerated processing · Reliability focus

Required

Deep understanding of lakehouse architectures (Delta Lake, Iceberg, Hudi) and when to use them
Experience building and operating large-scale distributed data processing pipelines
Hands-on experience with streaming data systems (Kafka, Flink, or similar)
Familiarity with Ray, Spark, or ClickHouse at production scale
An obsessive focus on reliability and building systems that don't page you at 3am

Preferred

Experience with Lance or other vector-native storage formats
Background in GPU-accelerated data processing (RAPIDS, cuDF)

Company

Exa

Exa was built with a simple goal — to organize all knowledge.

H1B Sponsorship

Exa has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).
Trends of Total Sponsorships
2025 (2)
2024 (1)

Funding

Current Stage: Early Stage
Total Funding: $2.12M
2023-04-01 · Seed
2021-10-15 · Seed · $2M
2021-08-27 · Pre Seed · $0.12M

Leadership Team

Will Bryk
Co-Founder and CEO
Company data provided by Crunchbase