Toyota Research Institute · 2 days ago
Senior Data Engineer
Toyota Research Institute (TRI) is on a mission to improve the quality of human life through innovative tools and capabilities. They are seeking a Senior Data Engineer to design and build foundational data infrastructure and tools for autonomy research and development workflows, including large-scale ingestion pipelines and performance diagnostics for machine learning workflows.
Artificial Intelligence (AI), Automotive, Consumer Research, Machine Learning, Product Research
Responsibilities
Design and implement scalable, production-grade pipelines for data ingestion, transformation, storage, and retrieval from vehicle fleets and simulation environments
Build internal tools and services for data labeling, curation, indexing, and cataloging across large and diverse datasets
Collaborate with ML researchers, autonomy engineers, and data scientists to design schemas and APIs that power model training, evaluation, and debugging
Develop and maintain feature stores, metadata systems, and versioning infrastructure for structured and unstructured data
Support the generation and integration of synthetic datasets with real-world logs to enable hybrid training and simulation workflows
Optimize pipelines for cost, latency, and traceability, ensuring reproducibility and consistency across environments
Partner with simulation and cloud platform teams to automate workflows for closed-loop testing, scenario mining, and performance analytics
Qualifications
Required
Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field
8+ years of experience building data-intensive software systems, ideally in robotics, autonomous driving, or large-scale ML environments
Proficient in Python and SQL; familiar with C++
Experience designing ETL pipelines using modern frameworks (e.g., Apache Spark, Flyte, Union)
Strong knowledge of cloud-native architectures, including AWS services (e.g., S3) or equivalents (e.g., Google Cloud Platform)
Familiarity with sensor data types (camera, lidar, radar, GPS/IMU) and common data serialization formats (e.g., protobuf, ROS 2 bag, MCAP)
Deep understanding of data quality, observability, and lineage in high-volume systems
Track record of building reliable and performant infrastructure that supports both ad-hoc exploration and repeatable production workflows
Preferred
Experience in AD/ADAS, robotics, or autonomous systems, especially handling perception or planning datasets
Familiarity with ML pipeline orchestration frameworks (e.g., Kubeflow, SageMaker)
Experience working with temporal or spatial data, including geospatial indexing and time-series alignment
Exposure to synthetic data generation, simulation logging, or scenario replay pipelines
Strong software engineering fundamentals, including CI/CD, testing, code review, and service deployment best practices
Experience collaborating with cross-functional, distributed teams across research and production orgs
Benefits
Medical, dental, and vision insurance
Paid time off benefits (including holiday pay and sick time)
Company
Toyota Research Institute
Toyota Research Institute is an R&D enterprise with an initial focus on artificial intelligence and robotics.
H1B Sponsorship
Toyota Research Institute has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart not shown)
Trends of Total Sponsorships: 2025 (10), 2020 (7)
Funding
Current Stage
Growth Stage
Recent News
2025-12-18
Company data provided by Crunchbase