Senior Product Engineer, AI Data Platform jobs in United States

Labelbox · 2 months ago

Senior Product Engineer, AI Data Platform

Labelbox is a company building critical infrastructure for AI development, focusing on data-centric approaches. The Senior Product Engineer for the AI Data Platform will lead the design and development of data infrastructure, ensuring efficient data management and streaming for training AI models, while collaborating with cross-functional teams to enhance platform adoption.

Artificial Intelligence (AI), Computer Vision, Data Collection and Labeling, Enterprise Software, Machine Learning, Software
H1B Sponsor Likely

Responsibilities

Design and build scalable data infrastructure, integrating high-performance databases (relational, NoSQL, cloud-native) with distributed systems for data processing, storage, and streaming
Optimize database systems for performance, reliability, and scalability, ensuring efficient data retrieval, indexing, and querying to support AI workflows
Develop and maintain data pipelines using distributed queues, message brokers, and job management mechanisms to enable high-throughput import/export operations
Collaborate with team members and stakeholders to align data infrastructure with platform goals and customer needs
Participate in Sprint Planning, Standups, and related activities to drive data-focused initiatives forward
Mentor and guide less experienced engineers, sharing expertise in data infrastructure and database optimization
Support the team’s area of ownership by working with the Support organization to resolve customer-facing data issues
Stay abreast of industry trends in data infrastructure and database technologies, incorporating relevant innovations into our systems
Contribute to technical documentation, research publications, blog posts, and presentations at conferences and forums
Innovation in AI: Enhance data infrastructure capabilities for an AI platform used by leading AI labs to develop powerful multi-modal large language models (LLMs)

Qualifications

Data infrastructure design, Database management, Distributed systems, Data pipelines, Cloud-native solutions, Python, Java, TypeScript, NoSQL databases, Communication, Problem-solving, Team collaboration, Attention to detail

Required

Bachelor's degree in Computer Science, Data Engineering, or a related field
4+ years of work experience in a software or data-focused company, with significant expertise in data infrastructure and backend engineering
Deep knowledge of designing and managing scalable database systems, including relational databases (e.g., PostgreSQL, MySQL), NoSQL stores (e.g., MongoDB, Cassandra), and cloud-native solutions (e.g., Google Spanner, AWS DynamoDB)
Strong experience with data infrastructure components such as data pipelines, streaming systems, and storage architectures (e.g., Cloud Buckets, Key-Value Stores)
Proficiency in optimizing databases for performance (e.g., schema design, indexing, query tuning) and integrating them with broader data workflows
Previous experience with distributed systems tools (e.g., queues, message brokers like Kafka or RabbitMQ, job orchestration frameworks) for real-time data processing and other use cases
Previous experience with search engines (e.g., Elasticsearch)
Knowledge of backend development using languages like Python, Java, or TypeScript; familiarity with NodeJS and NestJS is a plus
Proficient in data structures, algorithms, and system design for large-scale data management
Demonstrated ability to keep up with trends in data infrastructure and database technologies
Excellent communication and collaboration skills
Strong sense of ownership and ability to thrive in a fast-paced environment
Comfortable with ambiguity and able to methodically break down high-level requirements into actionable data infrastructure tasks
Resourceful problem-solver with attention to detail, eager to take initiative and deliver results
High proficiency in leveraging AI tools for daily development (e.g., Cursor, GitHub Copilot)

Preferred

Advanced degree
Familiarity with data warehousing solutions (e.g., Snowflake, BigQuery)
Experience with container orchestration systems (e.g., Kubernetes) for deploying data infrastructure components
Experience with one or more public cloud platforms: Google Cloud Platform (GCP) (preferred), Amazon Web Services (AWS), Microsoft Azure
Understanding of the Data + AI ecosystem and its relevance to large-scale AI platforms
Knowledge of memory management and optimization in data-intensive systems
Experience with DevOps tools (e.g., ArgoCD, Datadog) for monitoring and managing data infrastructure
Previous experience using LLM-backed AI services (e.g., from OpenAI, Anthropic, or Google) to develop product features

Company

Labelbox

Labelbox is the leading data factory for AI teams.

H1B Sponsorship

Labelbox has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).
Trends of Total Sponsorships
2025 (12)
2024 (6)
2023 (6)
2022 (7)
2021 (3)
2020 (3)

Funding

Current Stage: Late Stage
Total Funding: $188.9M
Key Investors: SoftBank Vision Fund, Andreessen Horowitz, Gradient
2022-01-06 · Series D · $110M
2021-02-11 · Series C · $40M
2020-02-04 · Series B · $25M

Leadership Team

Manu Sharma
CEO & Co-founder
Company data provided by Crunchbase