bedrock · 10 hours ago
Cloud - ML Platform Engineer
Bedrock Robotics is transforming the physical world with autonomy and is seeking Senior- or Staff-level ML Platform Engineers to join its team. The role involves bridging infrastructure and machine learning to build systems for high-scale training of end-to-end autonomy, and collaborating with other teams to enhance AI/ML development and deployment processes.
Construction · Real Estate · Software
Responsibilities
Build out robot data labeling and evaluation pipelines
Design and create a data mining solution
Iterate and evolve experiment tracking
Improve performance of our scaled training loop
Work with other ML teams to understand their workflows and their needs
Develop, maintain, and enhance frameworks for AI/ML model development and deployment while establishing and driving best practices in MLOps
Design and implement for usability, reliability, scalability, operational excellence, and cost management, advocating for these qualities while delivering incrementally
Collaborate closely with ML Engineers, Data Scientists, Data Engineers, and Product Managers to understand their needs and identify opportunities to accelerate the AI/ML development and deployment process
Qualifications
Required
5+ years of professional software engineering experience building ML platforms, infrastructure, or internal tooling that accelerates model development and deployment for ML engineers
4+ years of experience (may include graduate research) as an ML, Data, Platform, or Distributed Systems Engineer working on large-scale, complex, or highly distributed systems
2+ years of hands-on experience designing, building, and operating production-grade ML systems, such as training pipelines, feature stores, model serving, evaluation frameworks, or workflow orchestration
Proven experience leading projects end-to-end, from concept and initial design through implementation and deployment, working with at least one other engineer
2+ years of close collaboration with cross-functional ML stakeholders to understand requirements and ship platform capabilities that improve iteration speed and reliability
Preferred
Experience in performance optimization for GPUs
Experience working with multi-modal data
Startup experience
Experience with robotics, simulation, or perception ML pipelines
Familiarity with modern ML infra stacks (Ray, Kubeflow, MLflow, Metaflow, Airflow, Feast, Vertex, SageMaker, etc.)
Hands-on experience in distributed training, model optimization, or experiment tracking systems
Experience building internal developer platforms or self-service tooling