
After Reading Nearly 50 VLA+RL Papers......


Homepage: http://qingkeai.online/

Original post: https://mp.weixin.qq.com/s/lfkwxQ-7N2jdVaOFAN5GmQ

With the remarkable progress of vision-language-action (VLA) models built on large-scale imitation learning, combining VLA with reinforcement learning (RL) has emerged as a highly promising new paradigm. This paradigm uses trial-and-error interaction with the environment, or pre-collected suboptimal data, to further improve robots' decision-making and execution capabilities.

This post organizes the key papers in this area by category, covering offline RL, online RL, world models, test-time RL, and alignment techniques.

I. Offline RL

Offline RL lets pretrained VLA models learn from human demonstrations and autonomously collected data, with no real-time interaction with the environment.
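To make the idea concrete, here is a minimal sketch of an offline TD update on a fixed batch, with a generic conservative penalty of the kind often used in offline RL. It is not taken from any of the papers below; all shapes, names, and the discrete action head are illustrative assumptions.

```python
# Illustrative offline RL update: TD learning on a fixed batch, no environment interaction.
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, num_actions, gamma = 32, 8, 0.99

q_net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, num_actions))
target_q = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, num_actions))
target_q.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=3e-4)

# A pre-collected (offline) batch; random tensors stand in for real robot data.
batch = {
    "obs": torch.randn(64, obs_dim),
    "action": torch.randint(0, num_actions, (64,)),
    "reward": torch.rand(64),
    "next_obs": torch.randn(64, obs_dim),
    "done": torch.zeros(64),
}

# Q(s, a) for the actions actually taken in the dataset.
q = q_net(batch["obs"]).gather(1, batch["action"].unsqueeze(1)).squeeze(1)
with torch.no_grad():
    target = batch["reward"] + gamma * (1 - batch["done"]) * target_q(batch["next_obs"]).max(dim=1).values
td_loss = F.mse_loss(q, target)

# Conservative penalty (CQL-style, an assumption here): keep Q-values of
# out-of-distribution actions from being overestimated.
cql_penalty = (torch.logsumexp(q_net(batch["obs"]), dim=1) - q).mean()
loss = td_loss + 1.0 * cql_penalty

opt.zero_grad(); loss.backward(); opt.step()
```

Real VLA backbones replace the tiny MLPs here, but the structure — TD targets plus some mechanism to stay close to the data distribution — is the common thread.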

Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
Paper: https://arxiv.org/abs/2309.10150 Code: https://github.com/google-deepmind/q_transformer

Offline Actor-Critic Reinforcement Learning Scales to Large Models (Perceiver-Actor-Critic)
Paper: https://arxiv.org/abs/2402.05546 Code: https://offline-actor-critic.github.io/

GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot
Paper: https://arxiv.org/abs/2403.13358 Code: https://github.com/Improbable-AI/germ

ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning
Paper: https://arxiv.org/abs/2505.07395

MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models
Paper: https://arxiv.org/abs/2503.08007

CO-RFT: Efficient Fine-Tuning of Vision-Language-Action Models through Chunked Offline Reinforcement Learning
Paper: https://arxiv.org/pdf/2508.02219

Balancing Signal and Variance: Adaptive Offline RL Post-Training for VLA Flow Models (ARFM)
Paper: https://arxiv.org/pdf/2509.04063


II. Online RL

Online RL further improves VLA model performance through trial-and-error interaction with the environment.
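As a rough sketch of the kind of objective many of the methods below build on, here is an illustrative PPO-style clipped policy-gradient loss. The tensors are toy stand-ins for the log-probabilities of VLA action tokens and their advantages; this is not any specific paper's implementation.

```python
# Illustrative PPO-clip surrogate loss for online fine-tuning of a policy.
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective over a batch of sampled actions."""
    ratio = torch.exp(logp_new - logp_old)                        # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()                  # maximize surrogate

# Toy numbers standing in for rollout data collected by interacting with the environment.
logp_old = torch.randn(256)
logp_new = logp_old + 0.05 * torch.randn(256)
advantages = torch.randn(256)
print(ppo_clip_loss(logp_new, logp_old, advantages))
```

Group-relative variants (e.g. GRPO-style objectives) mostly change how the advantages are estimated, not the clipped-ratio structure above.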

1. In Simulation

FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning
Paper: https://arxiv.org/abs/2409.16578 Code: https://github.com/flare-vla/flare

Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone (PA-RL)
Paper: https://arxiv.org/abs/2412.06685 Code: https://pa-rl.github.io/

Improving Vision-Language-Action Model with Online Reinforcement Learning (iRe-VLA)
Paper: https://arxiv.org/abs/2501.16664

Interactive Post-Training for Vision-Language-Action Models (RIPT-VLA)
Paper: https://arxiv.org/abs/2505.17016 Code: https://github.com/OpenHelix-Team/RIPT

VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning
Paper: https://arxiv.org/abs/2505.18719 Code: https://github.com/vla-rl/vla-rl

What Can RL Bring to VLA Generalization? An Empirical Study (RLVLA)
Paper: https://arxiv.org/abs/2505.19789 Code: https://github.com/S-S-X/RLVLA

RFTF: Reinforcement Fine-tuning for Embodied Agents with Temporal Feedback
Paper: https://arxiv.org/abs/2505.19767

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
(Related talk: Saturday at 10 a.m., a discussion of the VLA RL training framework SimpleVLA-RL)
Paper: https://arxiv.org/pdf/2509.09674 Code: https://github.com/SimpleVLA/SimpleVLA

TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization
Paper: https://arxiv.org/abs/2506.08440 Code: https://github.com/TGRPO/TGRPO

OctoNav: Towards Generalist Embodied Navigation
Paper: https://arxiv.org/abs/2506.09839 Code: https://octonav.github.io/

RLRC: Reinforcement Learning-based Recovery for Compressed Vision-Language-Action Models
Paper: https://arxiv.org/pdf/2506.17639 Code: https://rlrc-vla.github.io/

RLinf: Reinforcement Learning Infrastructure for Agentic AI
(Related talk: next Tuesday at 8 p.m., a discussion with 无问芯穹 chief researcher Lin Hao on the system design of RLinf, an RL training framework for embodied AI)
Paper: https://arxiv.org/pdf/2509.15965 Code: https://rlinf.github.io/

RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training
(Related post: How should VLA+RL algorithms be designed? Hands-on RL fine-tuning of OpenVLA from scratch)
Paper: https://arxiv.org/pdf/2510.06710v1

2. In the Real World

RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning
Paper: https://arxiv.org/abs/2412.09858 Code: https://rldg.github.io/

Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone
Paper: https://arxiv.org/abs/2412.06685 Code: https://github.com/MaxSobolMark/PolicyAgnosticRL

Improving Vision-Language-Action Model with Online Reinforcement Learning
Paper: https://arxiv.org/abs/2501.16664

ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy
Paper: https://arxiv.org/abs/2502.05450 Code: https://github.com/ConRFT/ConRFT

VLAC: A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning
Paper: https://arxiv.org/abs/2509.15937 Code: https://github.com/VLAC-VLA/VLAC

Self-Improving Embodied Foundation Models (Generalist)
Paper: https://arxiv.org/pdf/2509.15155


III. World Models (Model-Based RL)

These works use a world model as a virtual environment, enabling low-cost, safe post-training of VLA policies.
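A minimal sketch of the idea, assuming a generic learned dynamics model: the policy is rolled out "in imagination" inside the world model and updated with an ordinary policy-gradient step, so no real robot is needed. All modules and shapes here are illustrative assumptions, not any listed paper's architecture.

```python
# Illustrative model-based post-training: roll out the policy inside a frozen world model.
import torch
import torch.nn as nn

obs_dim, act_dim, horizon = 64, 7, 10

world_model = nn.Sequential(nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
                            nn.Linear(256, obs_dim + 1))        # predicts next obs + reward
policy = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, 2 * act_dim))
for p in world_model.parameters():                               # world model stays frozen
    p.requires_grad_(False)

obs = torch.randn(16, obs_dim)                                   # imagined start states
log_probs, rewards = [], []
for _ in range(horizon):
    mean, log_std = policy(obs).chunk(2, dim=-1)
    dist = torch.distributions.Normal(mean, log_std.exp())
    act = dist.sample()                                          # stochastic action
    log_probs.append(dist.log_prob(act).sum(-1))
    with torch.no_grad():                                        # "environment" step in the model
        pred = world_model(torch.cat([obs, act], dim=-1))
        obs, reward = pred[:, :-1], pred[:, -1]
    rewards.append(reward)

returns = torch.stack(rewards).sum(0)                            # total imagined return
loss = -(torch.stack(log_probs).sum(0) * returns).mean()         # REINFORCE on imagined rollouts
loss.backward()
```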

World-Env: Leveraging World Model as a Virtual Environment for VLA Post-Training
Paper: https://arxiv.org/abs/2509.24948 Code: https://github.com/amap-cvlab/world-env

VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators
Paper: https://arxiv.org/pdf/2510.00406 Code: https://github.com/VLA-RFT/VLA-RFT


IV. Test-Time RL

At deployment time, a pretrained value function is used for real-time optimization or error correction.
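A hedged sketch of one common form of this idea, in the spirit of value-guided policy steering (e.g. V-GPS): sample several candidate actions from the frozen VLA policy and execute the one a pretrained critic scores highest. `policy_sample` and `critic` below are toy placeholders, not any paper's API.

```python
# Illustrative test-time value guidance: re-rank sampled actions with a pretrained critic.
import torch

def value_guided_action(policy_sample, critic, obs, num_candidates=8):
    """Pick the highest-value action among candidates proposed by the policy."""
    candidates = [policy_sample(obs) for _ in range(num_candidates)]   # K proposals
    scores = torch.stack([critic(obs, a) for a in candidates])         # Q(s, a) per proposal
    return candidates[int(torch.argmax(scores))]

# Toy stand-ins for the frozen policy sampler and the pretrained critic.
policy_sample = lambda obs: torch.randn(7)
critic = lambda obs, act: -torch.sum(act ** 2)   # prefers small actions, purely illustrative
obs = torch.randn(64)
print(value_guided_action(policy_sample, critic, obs))
```

The policy itself is never updated; all the extra computation happens at inference.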

To Err is Robotic: Rapid Value-Based Trial-and-Error during Deployment (Bellman-Guided Retrials)
Paper: https://arxiv.org/abs/2406.15917 Code: https://github.com/notmahi/bellman-guided-retrials

Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance (V-GPS)
Paper: https://arxiv.org/abs/2410.13816 Code: https://v-gps.github.io/

Hume: Introducing System-2 Thinking in Visual-Language-Action Model
Paper: https://arxiv.org/abs/2505.21432 Code: https://github.com/Hume-VLA/Hume

VLA-Reasoner: Empowering Vision-Language-Action Models with Reasoning via Online Monte Carlo Tree Search
Paper: https://arxiv.org/abs/2509.22643


V. RL Alignment

These methods aim to align VLA policies with human preferences or safety constraints.
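As an illustration of what preference alignment typically optimizes, here is a minimal DPO-style loss over preferred versus rejected trajectories. The tensors are toy stand-ins for trajectory log-probabilities under the policy and a frozen reference model; this is a generic sketch, not the objective of any specific paper listed below.

```python
# Illustrative DPO-style preference-alignment loss for trajectory pairs.
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Encourage the policy to rank preferred trajectories above rejected ones."""
    chosen_margin = logp_chosen - ref_logp_chosen        # how much the policy upweights the winner
    rejected_margin = logp_rejected - ref_logp_rejected  # ... and the loser, vs. the reference
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy log-probabilities for 32 preference pairs.
logp_c = torch.randn(32, requires_grad=True)
logp_r = torch.randn(32, requires_grad=True)
ref_c, ref_r = torch.randn(32), torch.randn(32)
print(dpo_loss(logp_c, logp_r, ref_c, ref_r))
```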

GRAPE: Generalizing Robot Policy via Preference Alignment
Paper: https://arxiv.org/abs/2411.19309 Code: https://github.com/GRAPE-VLA/GRAPE

SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning
Paper: https://arxiv.org/abs/2503.03480 Code: https://safevla.github.io/


VI. Other (Unclassified)

RPD: Refined Policy Distillation: From VLA Generalists to RL Experts
Paper: https://arxiv.org/abs/2503.05833

Summary

The combination of VLA and RL is in a phase of rapid growth. Pairing the large-scale priors of imitation learning with the self-improvement ability of reinforcement learning is a key path toward embodied general intelligence.

If you've read this far, consider giving a follow before you go 🧐~
