Labelbox · 2 months ago

Senior Product Engineer, AI Data Platform

San Francisco Bay Area

Full-time

Onsite

Mid, Senior Level

$160K/yr - $260K/yr

4+ years exp

Labelbox is a company building critical infrastructure for AI development, focusing on data-centric approaches. The Senior Product Engineer for the AI Data Platform will lead the design and development of data infrastructure, ensuring efficient data management and streaming for training AI models, while collaborating with cross-functional teams to enhance platform adoption.

Artificial Intelligence (AI), Computer Vision, Data Collection and Labeling, Enterprise Software, Machine Learning, Software

H1B Sponsor Likely

Responsibilities

Design and build scalable data infrastructure, integrating high-performance databases (relational, NoSQL, cloud-native) with distributed systems for data processing, storage, and streaming

Optimize database systems for performance, reliability, and scalability, ensuring efficient data retrieval, indexing, and querying to support AI workflows

Develop and maintain data pipelines using distributed queues, message brokers, and job management mechanisms to enable high-throughput import/export operations

Collaborate with team members and stakeholders to align data infrastructure with platform goals and customer needs

Participate in Sprint Planning, Standups, and related activities to drive data-focused initiatives forward

Mentor and guide less experienced engineers, sharing expertise in data infrastructure and database optimization

Support the team’s area of ownership by working with the Support organization to resolve customer-facing data issues

Stay abreast of industry trends in data infrastructure and database technologies, incorporating relevant innovations into our systems

Contribute to technical documentation, research publications, blog posts, and presentations at conferences and forums

Innovation in AI: Enhance data infrastructure capabilities for an AI platform used by leading AI labs to develop powerful multi-modal large language models (LLMs)

Qualifications

Data infrastructure design, Database management, Distributed systems, Data pipelines, Cloud-native solutions, Python, Java, TypeScript, NoSQL databases, Communication, Problem-solving, Team collaboration, Attention to detail

Required

Bachelor's degree in Computer Science, Data Engineering, or a related field

4+ years of work experience in a software or data-focused company, with significant expertise in data infrastructure and backend engineering

Deep knowledge of designing and managing scalable database systems, including relational databases (e.g., PostgreSQL, MySQL), NoSQL stores (e.g., MongoDB, Cassandra), and cloud-native solutions (e.g., Google Spanner, AWS DynamoDB)

Strong experience with data infrastructure components such as data pipelines, streaming systems, and storage architectures (e.g., Cloud Buckets, Key-Value Stores)

Proficiency in optimizing databases for performance (e.g., schema design, indexing, query tuning) and integrating them with broader data workflows

Previous experience with distributed systems tools (e.g., queues, message brokers like Kafka or RabbitMQ, job orchestration frameworks) for real-time data processing and other use cases

Previous experience with search engines (e.g., Elasticsearch)

Knowledge of backend development using languages like Python, Java, or TypeScript; familiarity with Node.js and NestJS is a plus

Proficient in data structures, algorithms, and system design for large-scale data management

Demonstrated ability to keep up with trends in data infrastructure and database technologies

Excellent communication and collaboration skills

Strong sense of ownership and ability to thrive in a fast-paced environment

Comfortable with ambiguity and able to methodically break down high-level requirements into actionable data infrastructure tasks

Resourceful problem-solver with attention to detail, eager to take initiative and deliver results

High proficiency in leveraging AI tools for daily development (e.g., Cursor, GitHub Copilot)

Preferred

Advanced degree preferred

Familiarity with data warehousing solutions (e.g., Snowflake, BigQuery)

Experience with container orchestration systems (e.g., Kubernetes) for deploying data infrastructure components

Experience with one or more public cloud platforms: Google Cloud Platform (GCP) (preferred), Amazon Web Services (AWS), Microsoft Azure

Understanding of the Data + AI ecosystem and its relevance to large-scale AI platforms

Knowledge of memory management and optimization in data-intensive systems

Experience with DevOps tools (e.g., ArgoCD, Datadog) for monitoring and managing data infrastructure

Previous experience using LLM-backed AI services (e.g., from OpenAI, Anthropic, or Google) to develop product features

Company

Labelbox

Labelbox is the leading data factory for AI teams.

Founded in 2018

San Francisco, California, USA

51-200 employees

https://labelbox.com

H1B Sponsorship

Labelbox has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)

Total Sponsorships by Year

2025 (12)

2024 (6)

2023 (6)

2022 (7)

2021 (3)

2020 (3)

Funding

Current Stage

Late Stage

Total Funding

$188.9M

Key Investors

SoftBank Vision Fund, Andreessen Horowitz, Gradient

2022-01-06: Series D, $110M

2021-02-11: Series C, $40M

2020-02-04: Series B, $25M

Leadership Team

Manu Sharma

CEO & Co-founder

Recent News

IndiaTimes

Building the AI-First Future: Inside Tau Ventures with Amit Garg

2025-11-02

GlobeNewswire News Room

Data Annotation Tools Market Report 2025, with Profiles of 30+ Companies including Amazon Mechanical Turk, Clickworker, CloudFactory, Cogito Tech, Figure Eight, Labelbox, LightTag, Playment, & Tagtog

2025-07-04

Bloomberg

Scale AI Rivals See Customer Demand Surge After Meta Investment

2025-06-20

Company data provided by Crunchbase