OpenAI · 2 weeks ago
Software Engineer, Data Infrastructure - Research
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. The role involves designing and implementing dataset infrastructure to support OpenAI's training stack, ensuring efficient and scalable dataset management for researchers.
Agentic AI · Artificial Intelligence (AI) · Foundational AI · Generative AI · Machine Learning · Natural Language Processing · SaaS
Responsibilities
Design and maintain standardized dataset APIs, including for multimodal (MM) data that cannot fit in memory
Build proactive testing and scale validation pipelines for dataset loading at GPU scale
Collaborate with teammates to integrate datasets seamlessly into training and inference pipelines, ensuring smooth adoption and a great user experience
Document and maintain dataset interfaces so they are discoverable, consistent, and easy for other teams to adopt
Establish safeguards and validation systems to ensure datasets remain reproducible and unchanged once standardized
Debug and resolve performance bottlenecks in distributed dataset loading (e.g., straggler machines that slow down global training)
Provide visualization and inspection tools to surface errors, bugs, or bottlenecks in datasets
Qualifications
Required
Have strong engineering fundamentals with experience in distributed systems, data pipelines, or infrastructure
Have experience building APIs, modular code, and scalable abstractions, while recognizing that abstractions ultimately serve their users and that UX is an important part of abstraction design
Are comfortable debugging bottlenecks across large fleets of machines
Take pride in building infrastructure that 'just works,' and find joy in being the guardian of reliability and scale
Are collaborative, humble, and excited to own a foundational (if not glamorous) part of the ML stack
Preferred
Have background knowledge in data math, probability, or distributed data theory
Have worked with GPU-scale distributed systems or dataset scaling for real-time data
Company
OpenAI
OpenAI is an AI research and deployment company that develops advanced AI models, including ChatGPT. It is a sub-organization of the OpenAI Foundation.
H1B Sponsorship
OpenAI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).
[Chart: Distribution of different job fields receiving sponsorship; highlighted field is similar to this job]
Trends of Total Sponsorships
2025 (1)
2024 (1)
2023 (1)
2022 (18)
2021 (10)
2020 (6)
Funding
Current Stage: Growth Stage
Total Funding: $79B
Key Investors: The Walt Disney Company, SoftBank, Thrive Capital
2025-12-11 · Corporate Round · $1B
2025-10-02 · Secondary Market · $6.6B
2025-03-31 · Series Unknown · $40B
Recent News
2026-01-09 · The Motley Fool
Company data provided by Crunchbase