Foundation Robotics
Vision-Language-Action (VLA) Models Engineer
Foundation Robotics is building general-purpose robots to address labor shortages and improve efficiency in labor-intensive industries. The company is seeking a Vision-Language-Action (VLA) Models Engineer to develop and optimize vision-language-action models, integrate LLM-based reasoning with action planning, and collaborate across teams to ensure model outputs are executable and safe.
Responsibilities
Develop and optimize vision-language-action models, including transformers, diffusion models, and multimodal encoders/decoders
Build representations for 2D/3D perception, affordances, scene understanding, and spatial reasoning
Integrate LLM-based reasoning with action planning and control policies
Design datasets for multimodal learning: video-action trajectories, instruction following, teleoperation data, and synthetic data
Interface VLA model outputs with real-time robot control stacks (navigation, manipulation, locomotion)
Implement grounding layers that convert natural language instructions into symbolic, geometric, or skill-level action plans (see the sketch after this list)
Deploy models to onboard or edge compute platforms, optimizing for latency, safety, and reliability
Build scalable pipelines for ingesting, labeling, and generating multimodal training data
Create simulation-to-real (Sim2Real) training workflows using synthetic environments and teleoperated demonstration data
Optimize training pipelines, model parallelism, and evaluation frameworks
Work closely with robotics, hardware, controls, and safety teams to ensure model outputs are executable, safe, and predictable
Collaborate with product teams to define robot capabilities and user-facing behaviors
Participate in user and field testing to iterate on real-world performance
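For illustration, a minimal sketch of the grounding-layer responsibility above: converting a natural-language instruction plus perception output into a skill-level plan. The Skill dataclass, ground_instruction function, and detection format are hypothetical stand-ins, not Foundation Robotics' actual stack; a production grounding layer would be driven by a learned VLA/LLM rather than this rule-based stub.

```python
# Hypothetical sketch: ground a natural-language instruction into a skill-level plan.
# All names here (Skill, ground_instruction) are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Skill:
    name: str     # identifier of a primitive in the robot's skill library
    params: dict  # geometric/symbolic arguments (poses, object IDs, ...)

def ground_instruction(instruction: str, detections: dict[str, list[float]]) -> list[Skill]:
    """Map an instruction and perception detections (object -> 3D position)
    to an ordered plan of skill-level actions. A real system would query a
    VLA/LLM here; this rule-based stub only demonstrates the interface."""
    plan: list[Skill] = []
    tokens = instruction.lower().split()
    for obj, pos in detections.items():
        if obj in tokens and "pick" in tokens:
            plan.append(Skill("navigate_to", {"target": pos}))
            plan.append(Skill("pick", {"object": obj, "grasp_pose": pos}))
    return plan

# Example: "pick the mug" with a detected mug at (1.2, 0.4, 0.8)
for step in ground_instruction("pick the mug", {"mug": [1.2, 0.4, 0.8]}):
    print(step.name, step.params)
```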
Qualifications
Required
Strong experience training multimodal models, including VLAs, VLMs, vision transformers, and LLMs
Ability to build and iterate on large-scale training pipelines
Deep proficiency in PyTorch or JAX, distributed training, and GPU acceleration (see the sketch after this list)
Strong software engineering skills in Python and modern ML tooling
Experience with dataset creation and curation, including synthetic data
Understanding of real-time deployment constraints on embedded hardware
MSc or PhD in Computer Science, Robotics, Machine Learning, or related field—or equivalent industry experience
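As a concrete reference point for the distributed-training item above, here is a minimal PyTorch DistributedDataParallel sketch; the toy Linear model and random batches are assumptions standing in for a VLA backbone and multimodal trajectory data, not the team's actual training stack. Launch one process per GPU with, e.g., `torchrun --nproc_per_node=8 train.py`.

```python
# Minimal DDP training sketch. The Linear model and random batches are toy
# stand-ins for a VLA backbone and multimodal trajectory data.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")              # torchrun sets the rendezvous env vars
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(512, 256).cuda()     # toy stand-in for a VLA backbone
    model = DDP(model, device_ids=[local_rank])  # all-reduce gradients across ranks
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(100):
        x = torch.randn(32, 512, device="cuda")  # toy batch; a real pipeline would
        y = torch.randn(32, 256, device="cuda")  # stream multimodal trajectories
        loss = torch.nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()                          # DDP overlaps comm with backward
        opt.step()
        if dist.get_rank() == 0 and step % 20 == 0:
            print(f"step {step}: loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```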
Preferred
Familiarity with robotics simulation environments (Isaac Lab, MuJoCo, or similar)
Hands-on experience with robotics, embodied AI, or reinforcement/imitation learning
Benefits
Health
Vision
Dental
401k
Company
Foundation Robotics
Foundation Robotics develops general-purpose robots to address labor shortages and improve efficiency in labor-intensive industries.
Funding
Current Stage
Early Stage