Software Engineer, Data Infrastructure - Research jobs in United States

OpenAI · 2 weeks ago

Software Engineer, Data Infrastructure - Research

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. The role involves designing and implementing dataset infrastructure to support OpenAI's training stack, ensuring efficient and scalable dataset management for researchers.

Agentic AI, Artificial Intelligence (AI), Foundational AI, Generative AI, Machine Learning, Natural Language Processing, SaaS
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Design and maintain standardized dataset APIs, including for multimodal (MM) data that cannot fit in memory
Build proactive testing and scale validation pipelines for dataset loading at GPU scale
Collaborate with teammates to integrate datasets seamlessly into training and inference pipelines, ensuring smooth adoption and a great user experience
Document and maintain dataset interfaces so they are discoverable, consistent, and easy for other teams to adopt
Establish safeguards and validation systems to ensure datasets remain reproducible and unchanged once standardized
Debug and resolve performance bottlenecks in distributed dataset loading (e.g., straggler systems slowing global training)
Provide visualization and inspection tools to surface errors, bugs, or bottlenecks in datasets

Qualifications

Distributed systems, Data pipelines, API development, GPU-scale systems, Performance debugging, User experience focus, Collaboration, Documentation

Required

Have strong engineering fundamentals with experience in distributed systems, data pipelines, or infrastructure
Have experience building APIs, modular code, and scalable abstractions, while recognizing that abstractions ultimately serve their users and that UX is an important part of abstraction design
Are comfortable debugging bottlenecks across large fleets of machines
Take pride in building infrastructure that 'just works,' and find joy in being the guardian of reliability and scale
Are collaborative, humble, and excited to own a foundational (if not glamorous) part of the ML stack

Preferred

Have background knowledge in data math, probability, or distributed data theory
Have worked with GPU-scale distributed systems or dataset scaling for real-time data

Company

OpenAI is an AI research and deployment company that develops advanced AI models, including ChatGPT. It is a sub-organization of the OpenAI Foundation.

H1B Sponsorship

OpenAI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data provided by the US Department of Labor.)
Trends of Total Sponsorships
2025 (1)
2024 (1)
2023 (1)
2022 (18)
2021 (10)
2020 (6)

Funding

Current Stage: Growth Stage
Total Funding: $79B
Key Investors: The Walt Disney Company, SoftBank, Thrive Capital
2025-12-11 · Corporate Round · $1B
2025-10-02 · Secondary Market · $6.6B
2025-03-31 · Series Unknown · $40B

Leadership Team

Sam Altman · CEO & Co-Founder
Greg Brockman · President, Chairman, & Co-Founder
Company data provided by Crunchbase