news 2026/6/12 1:44:53

Breaking with Traditional AI Training! USTC Proposes Role-Agent, a Dual-Role Co-Evolution Mechanism

张小明

Front-end developer

Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution

Authors: Xucong Wang, Ziyu Ma, Shidong Yang, Tongwen Huang, Pengkun Wang, Yong Wang, Xiangxiang Chu (USTC & AMAP, Alibaba) | Year: 2026 | arXiv: 2606.10917

II. Research Background

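The learning of LLM agents is limited by two problems: (1) inefficient interaction feedback, since conventional reinforcement learning usually provides only a sparse final reward; and (2) static training environments, since the training data is fixed and cannot be steered toward targeted practice on the agent's failure modes.

The core insight of Role-Agent: an LLM already has enough world knowledge to simulate environment dynamics, and it is also able to analyze its own failures, so it can actively choose its own "practice problems".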
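
The paper's abstract describes the first of these ideas, World-In-Agent (WIA), as follows: after each action the LLM predicts the future state, and the alignment between the predicted and the actual state is used as a process reward. The sketch below illustrates that idea only. The string-similarity measure, the `weight` parameter, and both function names are assumptions made for illustration; this summary does not say how the paper actually computes alignment or combines it with the final reward.

```python
from difflib import SequenceMatcher


def state_alignment(predicted: str, actual: str) -> float:
    """Similarity in [0, 1] between a predicted and an observed state.

    Assumption: states are plain text and character-level similarity is an
    acceptable stand-in for the paper's (unspecified here) alignment measure.
    """
    return SequenceMatcher(None, predicted, actual).ratio()


def shaped_rewards(predicted_states, actual_states, final_reward, weight=0.5):
    """Return one reward per step of a trajectory.

    Every step receives `weight * alignment`; the sparse final reward is
    added to the last step. `weight` is a hypothetical mixing coefficient.
    """
    if len(predicted_states) != len(actual_states):
        raise ValueError("need one predicted state per actual state")
    rewards = [
        weight * state_alignment(p, a)
        for p, a in zip(predicted_states, actual_states)
    ]
    if rewards:
        rewards[-1] += final_reward
    return rewards
```

With a sparse final reward alone, every intermediate step of a trajectory would receive zero signal. Here each step gets a reward that depends on how well the agent anticipated the environment's response, which is the kind of dense feedback the first problem above calls for.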

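IV. Experimental Results

Evaluated on multiple agent benchmarks, including programming, navigation, and knowledge question answering:

  • Average improvement of more than 4% over strong baselines
  • The process reward from WIA (World-In-Agent) is especially effective on long-horizon tasks
  • The failure-mode retrieval in AIW (Agent-In-World) effectively concentrates practice on known weaknesses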
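
According to the paper's abstract, AIW analyzes failure modes from failed trajectories and retrieves tasks with similar failure patterns, reshaping the training data distribution for targeted practice. The sketch below shows one way such reshaping could look. It assumes each failed trajectory already carries a `failure_mode` label (in the paper this analysis is done by the LLM; no LLM call is made here) and that each task in the pool is tagged with the failure modes it exercises. The dictionary layout, `base_weight`, `boost`, and the function name are all hypothetical.

```python
from collections import Counter


def sampling_weights(task_pool, failed_trajectories, base_weight=1.0, boost=1.0):
    """Return a normalized sampling probability for each task id.

    Tasks whose tagged failure modes match frequently observed failures get
    more probability mass. Both the labels and the tags are assumed inputs.
    """
    counts = Counter(t["failure_mode"] for t in failed_trajectories)
    total = sum(counts.values())
    weights = {}
    for task in task_pool:
        # Share of observed failures that this task's failure modes cover.
        matched = sum(counts[m] for m in set(task["failure_modes"]))
        share = matched / total if total else 0.0
        weights[task["id"]] = base_weight + boost * share
    norm = sum(weights.values())
    if norm == 0:
        return {}
    return {task_id: w / norm for task_id, w in weights.items()}
```

The resulting probabilities can be passed as `weights` to `random.choices` to draw the next training batch. Keeping `base_weight` above zero means tasks unrelated to current failures are still sampled occasionally, which is a design choice of this sketch rather than something stated in the source.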

Report generated: 2026-06-11 | Paper source: arXiv:2606.10917

Original abstract: Although Large Language Model (LLM) agents have demonstrated strong performance on complex tasks, their learning is often limited by inefficient interaction feedback and static training environments, which hinder broader generalization. To address these limitations, this paper introduces Role-Agent, a framework that harnesses a single LLM to function concurrently as both the agent and the environment, enabling a bootstrapped co-evolution. Role-Agent comprises two synergistic components: World-In-Agent (WIA) and Agent-In-World (AIW). In WIA, the LLM acts as the agent and predicts future states after each action; the alignment between predicted and actual states is then used as a process reward, encouraging environment-aware reasoning. In AIW, the LLM analyzes failure modes from failed trajectories and retrieves tasks with similar failure patterns, thereby reshaping the training data distribution for targeted practice. Experiments on multiple benchmarks show that Role-Agent consistently improves performance, yielding an average gain of over 4% over strong baselines.

PDF link: https://arxiv.org/pdf/2606.10917v1
