Foundation Robotics · 1 month ago

Vision Language Action (VLA) Models Engineer

Foundation Robotics is developing the future of general-purpose robotics to address labor shortages and enhance efficiency in labor-intensive industries. The company is seeking a Vision Language Action (VLA) Models Engineer to develop and optimize vision-language-action models, integrate LLM-based reasoning with action planning, and collaborate across teams to ensure model outputs are executable and safe.

Responsibilities

Develop and optimize vision-language-action models, including transformers, diffusion models, and multimodal encoders/decoders
Build representations for 2D/3D perception, affordances, scene understanding, and spatial reasoning
Integrate LLM-based reasoning with action planning and control policies
Design datasets for multimodal learning: video-action trajectories, instruction following, teleoperation data, and synthetic data
Interface VLA model outputs with real-time robot control stacks (navigation, manipulation, locomotion)
Implement grounding layers that convert natural-language instructions into symbolic, geometric, or skill-level action plans (a minimal sketch follows this list)
Deploy models on onboard or edge compute platforms, optimizing for latency, safety, and reliability
Build scalable pipelines for ingesting, labeling, and generating multimodal training data
Create simulation-to-real (Sim2Real) training workflows using synthetic environments and teleoperated demonstration data
Optimize training pipelines, model parallelism, and evaluation frameworks
Work closely with robotics, hardware, controls, and safety teams to ensure model outputs are executable, safe, and predictable
Collaborate with product teams to define robot capabilities and user-facing behaviors
Participate in user and field testing to iterate on real-world performance
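
To make the grounding-layer responsibility above concrete, here is a minimal illustrative sketch in Python (the language named in the Required list) of mapping a natural-language instruction to a skill-level action plan. This is not Foundation Robotics code: the Skill dataclass, the LEXICON table, and ground_instruction are hypothetical names, and the lookup table stands in for the LLM- or policy-based grounding a production system would use.

    # Hypothetical sketch: natural-language instruction -> skill-level action plan.
    # All names are illustrative; a real grounding layer would call a learned model.
    from dataclasses import dataclass

    @dataclass
    class Skill:
        name: str    # skill primitive, e.g. "pick" or "navigate_to"
        target: str  # object or location the skill acts on

    # Toy verb-phrase lexicon standing in for learned language grounding.
    LEXICON = {
        "pick up": "pick",
        "put down": "place",
        "go to": "navigate_to",
    }

    def ground_instruction(instruction: str) -> list[Skill]:
        """Convert e.g. 'Pick up the red cup.' into a skill-level plan."""
        text = instruction.lower()
        plan = []
        for phrase, skill_name in LEXICON.items():
            if phrase in text:
                # Naively treat the words after the verb phrase as the target.
                target = text.split(phrase, 1)[1].strip(" .")
                plan.append(Skill(skill_name, target))
        return plan

    print(ground_instruction("Pick up the red cup."))
    # -> [Skill(name='pick', target='the red cup')]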

Qualifications

Multimodal models · PyTorch · Training pipelines · Vision transformers · LLMs · Robotics simulation · Python · GPU acceleration · Synthetic dataset creation · Reinforcement learning · Software engineering

Required

Strong experience training multimodal models, including VLAs, VLMs, vision transformers, and LLMs
Ability to build and iterate on large-scale training pipelines
Deep proficiency in PyTorch or JAX, distributed training, and GPU acceleration
Strong software engineering skills in Python and modern ML tooling
Experience with (synthetic) dataset creation and curation
Understanding of real-time deployment constraints on embedded hardware
MSc or PhD in Computer Science, Robotics, Machine Learning, or a related field, or equivalent industry experience

Preferred

Familiarity with robotics simulation environments (Isaac Lab, MuJoCo, or similar)
Hands-on experience with robotics, embodied AI, or reinforcement/imitation learning

Benefits

Health
Vision
Dental
401(k)

Company

Foundation Robotics

Learn Robotics is an upcoming web app for lecturers and instructors of robotics courses at the bachelor's and master's levels.

Funding

Current Stage: Early Stage
Company data provided by Crunchbase