Software Engineer, Data Infrastructure jobs in United States

Thinking Machines Lab · 1 month ago

Software Engineer, Data Infrastructure

Thinking Machines Lab is dedicated to advancing collaborative general intelligence and making AI accessible for everyone. The role involves contributing to data infrastructure, focusing on architecting and scaling core infrastructure for distributed training pipelines and intelligent processing systems for large datasets.

Artificial Intelligence (AI), Foundational AI, Generative AI, Information Technology, Product Research, Software
H1B Sponsored

Responsibilities

Design, build, and operate scalable, fault-tolerant infrastructure for LLM research: distributed compute, data orchestration, and storage across modalities
Develop high-throughput systems for data ingestion, processing, and transformation — including training data catalogs, deduplication, quality checks, and search
Build systems for traceability, reproducibility, and robust quality control at every stage of the data lifecycle
Implement and maintain monitoring and alerting to support platform reliability and performance
Collaborate with research teams to unlock new features, improve data quality, and accelerate training cycles

Qualifications

Python, Apache Spark, Cloud infrastructure, Distributed compute frameworks, Kafka, Data lake architectures, Batch, Streaming pipelines, Data mining, Rust, Terraform, Airflow, Web crawler, File formats, Testing, Tooling, Documentation

Required

Bachelor's degree or equivalent experience in computer science, engineering, or similar
Proficiency in at least one backend language (we use Python or Rust)
Fluency in distributed compute frameworks such as Apache Spark or Ray
Deep familiarity with cloud infrastructure, data lake architectures, and batch and streaming pipelines
Comfort operating across the stack and owning projects end-to-end
Ability to thrive in a highly collaborative environment involving many different cross-functional partners and subject matter experts
A bias for action: taking the initiative to work across stacks and teams wherever you spot an opportunity to make sure something ships

Preferred

Have hands-on experience with Kafka, dbt, Terraform, and Airflow
Have experience building a web crawler
Have extensive experience understanding and scaling deduplication, data mining, and search
Have strong knowledge of file formats and storage systems (e.g., Parquet, Delta Lake) and how they impact performance and scalability
Are proactive about documentation, testing, and empowering your teammates with good tooling

Benefits

Generous health, dental, and vision benefits
Unlimited PTO
Paid parental leave
Relocation support as needed

Company

Thinking Machines Lab

Thinking Machines Lab is an AI research and product company that aims to increase understanding and customization of AI systems.

H1B Sponsorship

Thinking Machines Lab has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data from the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship]
Trends of Total Sponsorships: 2025 (9)

Funding

Current Stage: Early Stage
Total Funding: $2.01B
Key Investors: Andreessen Horowitz; Ministry of Economy, Culture and Innovation
2025-06-20 · Seed · $2B
2025-05-05 · Grant · $9.98M

Leadership Team

Mira Murati
Co-Founder and Chief Executive Officer

Company data provided by Crunchbase