Vision Language Action (VLA) Models Engineer
Foundation is developing the future of general-purpose robotics with the goal of addressing the labor shortage. They are seeking a Vision Language Action (VLA) Models Engineer to develop and optimize vision-language-action models and integrate them with real-time robot control stacks.
Responsibilities
Develop and optimize vision-language-action models, including transformers, diffusion models, and multimodal encoders/decoders
Build representations for 2D/3D perception, affordances, scene understanding, and spatial reasoning
Integrate LLM-based reasoning with action planning and control policies
Design datasets for multimodal learning: video-action trajectories, instruction following, teleoperation data, and synthetic data
Interface VLA model outputs with real-time robot control stacks (navigation, manipulation, locomotion)
Implement grounding layers that convert natural language instructions into symbolic, geometric, or skill-level action plans
Deploy models on onboard or edge compute platforms, optimizing for latency, safety, and reliability
Build scalable pipelines for ingesting, labeling, and generating multimodal training data
Create simulation-to-real (Sim2Real) training workflows using synthetic environments and teleoperated demonstration data
Optimize training pipelines, model parallelism, and evaluation frameworks
Work closely with robotics, hardware, controls, and safety teams to ensure model outputs are executable, safe, and predictable
Collaborate with product teams to define robot capabilities and user-facing behaviors
Participate in user and field testing to iterate on real-world performance
Qualifications
Required
Strong experience with training multimodal models, including VLAs, VLMs, vision transformers, and LLMs
Ability to build and iterate on large-scale training pipelines
Deep proficiency in PyTorch or JAX, distributed training, and GPU acceleration
Strong software engineering skills in Python and modern ML tooling
Experience with dataset creation and curation, including synthetic data
Understanding of real-time deployment constraints on embedded hardware
MSc or PhD in Computer Science, Robotics, Machine Learning, or a related field, or equivalent industry experience
Preferred
Familiarity with robotics simulation environments (Isaac Lab, MuJoCo, or similar)
Hands-on experience with robotics, embodied AI, or reinforcement/imitation learning
Benefits
Health
Vision
Dental
401(k)
Company
Foundation
Foundation is developing the future of general-purpose robotics with the goal of addressing the labor shortage.
H-1B Sponsorship
Foundation has a track record of offering H-1B sponsorship. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
[Charts: Distribution of Job Fields Receiving Sponsorship (fields similar to this job highlighted); Trends of Total Sponsorships: 2025 (1)]
Funding
Current Stage: Growth Stage
Total Funding: unknown
Last Funding: Pre Seed (2024-08-22)
Company data provided by Crunchbase