Foundation Robotics · 1 month ago

Vision Language Action (VLA) Models Engineer

Foundation Robotics is developing the future of general-purpose robotics to address labor shortages and enhance efficiency in labor-intensive industries. The company is seeking a Vision Language Action (VLA) Models Engineer to develop and optimize vision-language-action models, integrate LLM-based reasoning with action planning, and collaborate across teams to ensure model outputs are executable and safe.

Responsibilities

Develop and optimize vision-language-action models, including transformers, diffusion models, and multimodal encoders/decoders
Build representations for 2D/3D perception, affordances, scene understanding, and spatial reasoning
Integrate LLM-based reasoning with action planning and control policies
Design datasets for multimodal learning: video-action trajectories, instruction following, teleoperation data, and synthetic data
Interface VLA model outputs with real-time robot control stacks (navigation, manipulation, locomotion)
Implement grounding layers that convert natural-language instructions into symbolic, geometric, or skill-level action plans (a minimal sketch follows this list)
Deploy models on onboard or edge compute platforms, optimizing for latency, safety, and reliability
Build scalable pipelines for ingesting, labeling, and generating multimodal training data
Create simulation-to-real (Sim2Real) training workflows using synthetic environments and teleoperated demonstration data
Optimize training pipelines, model parallelism, and evaluation frameworks
Work closely with robotics, hardware, controls, and safety teams to ensure model outputs are executable, safe, and predictable
Collaborate with product teams to define robot capabilities and user-facing behaviors
Participate in user and field testing to iterate on real-world performance
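
To make the grounding-layer responsibility above concrete, here is a minimal illustrative sketch in Python (the language named in the Required list) of mapping a natural-language instruction to a skill-level action plan. This is not Foundation Robotics code: the Skill dataclass, the LEXICON table, and ground_instruction are hypothetical names, and the lookup table stands in for the LLM- or policy-based grounding a production system would use.

    # Hypothetical sketch: natural-language instruction -> skill-level action plan.
    # All names are illustrative; a real grounding layer would call a learned model.
    from dataclasses import dataclass

    @dataclass
    class Skill:
        name: str    # skill primitive, e.g. "pick" or "navigate_to"
        target: str  # object or location the skill acts on

    # Toy verb-phrase lexicon standing in for learned language grounding.
    LEXICON = {
        "pick up": "pick",
        "put down": "place",
        "go to": "navigate_to",
    }

    def ground_instruction(instruction: str) -> list[Skill]:
        """Convert e.g. 'Pick up the red cup.' into a skill-level plan."""
        text = instruction.lower()
        plan = []
        for phrase, skill_name in LEXICON.items():
            if phrase in text:
                # Naively treat the words after the verb phrase as the target.
                target = text.split(phrase, 1)[1].strip(" .")
                plan.append(Skill(skill_name, target))
        return plan

    print(ground_instruction("Pick up the red cup."))
    # -> [Skill(name='pick', target='the red cup')]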

Qualifications

Multimodal models · PyTorch · Training pipelines · Vision transformers · LLMs · Robotics simulation · Python · GPU acceleration · Synthetic dataset creation · Reinforcement learning · Software engineering

Required

Strong experience training multimodal models, including VLAs, VLMs, vision transformers, and LLMs
Ability to build and iterate on large-scale training pipelines
Deep proficiency in PyTorch or JAX, distributed training, and GPU acceleration
Strong software engineering skills in Python and modern ML tooling
Experience with (synthetic) dataset creation and curation
Understanding of real-time deployment constraints on embedded hardware
MSc or PhD in Computer Science, Robotics, Machine Learning, or a related field, or equivalent industry experience

Preferred

Familiarity with robotics simulation environments (Isaac Lab, MuJoCo, or similar)
Hands-on experience with robotics, embodied AI, or reinforcement/imitation learning

Benefits

Health
Vision
Dental
401(k)

Company

Foundation Robotics

Learn Robotics is an upcoming web app for lecturers and instructors of robotics courses at the bachelor's and master's levels.

Funding

Current Stage: Early Stage
Company data provided by Crunchbase