Foundation · 1 day ago

Vision Language Action (VLA) Models Engineer

Foundation is developing the future of general-purpose robotics to address the labor shortage. The company is seeking a Vision Language Action (VLA) Models Engineer to develop and optimize vision-language-action models and integrate them with real-time robot control stacks.

Artificial Intelligence (AI) · Machine Learning · Robotics
H1B Sponsor Likely

Hiring Manager: Jordi Vidal

Responsibilities

Develop and optimize vision-language-action models, including transformers, diffusion models, and multimodal encoders/decoders
Build representations for 2D/3D perception, affordances, scene understanding, and spatial reasoning
Integrate LLM-based reasoning with action planning and control policies
Design datasets for multimodal learning: video-action trajectories, instruction following, teleoperation data, and synthetic data
Interface VLA model outputs with real-time robot control stacks (navigation, manipulation, locomotion)
Implement grounding layers that convert natural language instructions into symbolic, geometric, or skill-level action plans (a minimal grounding sketch follows this list)
Deploy models on on-board or edge compute platforms, optimizing for latency, safety, and reliability
Build scalable pipelines for ingesting, labeling, and generating multimodal training data
Create simulation-to-real (Sim2Real) training workflows using synthetic environments and teleoperated demonstration data
Optimize training pipelines, model parallelism, and evaluation frameworks
Work closely with robotics, hardware, controls, and safety teams to ensure model outputs are executable, safe, and predictable
Collaborate with product teams to define robot capabilities and user-facing behaviors
Participate in user and field testing to iterate on real-world performance
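
As a rough illustration of the grounding-layer item above, here is a minimal sketch in Python. Everything in it is assumed for illustration: the skill names, the JSON plan schema the LLM is prompted to emit, and the SkillCall structure are hypothetical, not part of Foundation's stack.

    import json
    from dataclasses import dataclass, field

    @dataclass
    class SkillCall:
        # Hypothetical skill-level action: a named robot skill plus arguments.
        name: str
        args: dict = field(default_factory=dict)

    # Illustrative skill registry; a real stack would map these to control policies.
    KNOWN_SKILLS = {"navigate_to", "pick", "place"}

    def ground_instruction(llm_output: str) -> list[SkillCall]:
        # Parse the LLM's JSON plan and reject steps naming unknown skills,
        # so only validated, executable actions reach the controller.
        plan = json.loads(llm_output)
        calls = []
        for step in plan:
            if step["skill"] not in KNOWN_SKILLS:
                raise ValueError(f"unknown skill: {step['skill']}")
            calls.append(SkillCall(step["skill"], step.get("args", {})))
        return calls

    if __name__ == "__main__":
        raw = ('[{"skill": "navigate_to", "args": {"x": 1.0, "y": 2.5}},'
               ' {"skill": "pick", "args": {"object": "red cup"}}]')
        for call in ground_instruction(raw):
            print(call)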

Qualifications

Multimodal models · PyTorch · Training pipelines · Vision transformers · GPU acceleration · Dataset creation · Robotics simulation · Python · Embedded hardware · Reinforcement learning · MSc · PhD

Required

Strong experience training multimodal models, including VLAs, VLMs, vision transformers, and LLMs
Ability to build and iterate on large-scale training pipelines
Deep proficiency in PyTorch or JAX, distributed training, and GPU acceleration (see the training sketch after this list)
Strong software engineering skills in Python and modern ML tooling
Experience with (synthetic) dataset creation and curation
Understanding of real-time deployment constraints on embedded hardware
MSc or PhD in Computer Science, Robotics, Machine Learning, or related field—or equivalent industry experience
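
As a reference point for the distributed-training requirement, below is a minimal PyTorch DistributedDataParallel (DDP) loop. It is a sketch under stated assumptions: the linear layer and random batches stand in for a real multimodal model and dataset, and it assumes launch via torchrun (e.g. torchrun --nproc_per_node=2 train.py) on NVIDIA GPUs with NCCL.

    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE for each process.
        dist.init_process_group("nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        # Placeholder model; a real job would wrap a multimodal VLA backbone.
        model = DDP(torch.nn.Linear(512, 256).cuda(), device_ids=[local_rank])
        opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

        for step in range(100):
            x = torch.randn(32, 512, device="cuda")  # stand-in batch
            loss = model(x).pow(2).mean()            # stand-in loss
            opt.zero_grad()
            loss.backward()  # DDP all-reduces gradients across ranks here
            opt.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()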

Preferred

Familiarity with robotics simulation environments such as Isaac Lab, MuJoCo, or similar (a minimal example follows this list)
Hands-on experience with robotics, embodied AI, or reinforcement/imitation learning
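
For the simulation item above, a minimal MuJoCo example using the official mujoco Python bindings: it builds a toy one-box scene from an inline XML string and steps the physics. The scene is an illustrative assumption, not a Foundation environment.

    import mujoco

    WORLD_XML = """
    <mujoco>
      <worldbody>
        <body name="box" pos="0 0 1">
          <freejoint/>
          <geom type="box" size="0.1 0.1 0.1"/>
        </body>
      </worldbody>
    </mujoco>
    """

    # Build the scene and advance the physics for two simulated seconds
    # (1000 steps at the default 2 ms timestep).
    model = mujoco.MjModel.from_xml_string(WORLD_XML)
    data = mujoco.MjData(model)
    for _ in range(1000):
        mujoco.mj_step(model, data)

    # qpos for a free joint is [x, y, z, qw, qx, qy, qz]; index 2 is height.
    print(f"box height after 1000 steps: {data.qpos[2]:.3f}")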

Benefits

Health
Vision
Dental
401(k)

Company

Foundation

Foundation is developing the future of general-purpose robotics to address the labor shortage.

H1B Sponsorship

Foundation has a track record of offering H1B sponsorship. Note that this does not guarantee sponsorship for this specific role; the information below is provided for reference. (Data powered by the US Department of Labor)
[Charts omitted: distribution of sponsored job fields, with fields similar to this job highlighted, and total sponsorships by year; 2025: 1 sponsorship]

Funding

Current Stage: Growth Stage
Total Funding: unknown
2024-08-22: Pre Seed
Company data provided by Crunchbase