Beyond Conventional AI Training: USTC Proposes Role-Agent, a Dual-Role Co-Evolution Mechanism

Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution

Authors: Xucong Wang, Ziyu Ma, Shidong Yang, Tongwen Huang, Pengkun Wang, Yong Wang, Xiangxiang Chu (USTC & AMAP, Alibaba) | Year: 2026 | arXiv: 2606.10917

2. Research Background

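LLM agent learning is limited by two problems: (1) inefficient interaction feedback, since conventional reinforcement learning typically provides only a sparse final reward; and (2) static training environments, since the training data is fixed and the agent cannot practice specifically against its own failure modes.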

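The core insight of Role-Agent: an LLM already has enough world knowledge to simulate environment dynamics, and it is also able to analyze its own failures, so it can actively choose its own "practice problems".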
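
To make the first half of this insight concrete, here is a minimal sketch of how agreement between a predicted and an actual next state could be turned into a per-step process reward, which the paper's abstract attributes to World-In-Agent (WIA). Everything specific here is an assumption for illustration and not the paper's implementation: states are treated as plain text, alignment is measured with token-level Jaccard similarity, and `state_alignment`, `trajectory_rewards`, and the weight `beta` are hypothetical names.

```python
from typing import List, Tuple


def state_alignment(predicted: str, actual: str) -> float:
    """Token-level Jaccard similarity in [0, 1].

    Assumption: a stand-in alignment metric, not the one used in the paper.
    """
    pred_tokens = set(predicted.lower().split())
    actual_tokens = set(actual.lower().split())
    if not pred_tokens and not actual_tokens:
        return 1.0  # two empty states are trivially aligned
    return len(pred_tokens & actual_tokens) / len(pred_tokens | actual_tokens)


def trajectory_rewards(
    steps: List[Tuple[str, str]],
    final_reward: float,
    beta: float = 0.5,
) -> List[float]:
    """Return one reward per step.

    `steps` holds (predicted_next_state, actual_next_state) pairs.
    Each step earns beta * alignment; the sparse final reward is added
    to the last step only. `beta` is a hypothetical weighting coefficient.
    """
    rewards = [beta * state_alignment(p, a) for p, a in steps]
    if rewards:
        rewards[-1] += final_reward
    return rewards
```

With this shaping, every step carries a learning signal even when the task-level outcome is only known at the end, which is the contrast with the sparse final reward described in the background above.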

4. Experimental Results

Role-Agent is evaluated on multiple agent benchmarks covering programming, navigation, and knowledge question answering:

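An average improvement of more than 4% over strong baselines
The WIA process reward is especially effective on long-horizon tasks
AIW's failure-mode retrieval effectively concentrates practice on known weaknesses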

Report generated: 2026-06-11 | Paper source: arXiv:2606.10917

Original abstract: Although Large Language Model (LLM) agents have demonstrated strong performance on complex tasks, their learning is often limited by inefficient interaction feedback and static training environments, which hinder broader generalization. To address these limitations, this paper introduces Role-Agent, a framework that harnesses a single LLM to function concurrently as both the agent and the environment, enabling a bootstrapped co-evolution. Role-Agent comprises two synergistic components: World-In-Agent (WIA) and Agent-In-World (AIW). In WIA, the LLM acts as the agent and predicts future states after each action; the alignment between predicted and actual states is then used as a process reward, encouraging environment-aware reasoning. In AIW, the LLM analyzes failure modes from failed trajectories and retrieves tasks with similar failure patterns, thereby reshaping the training data distribution for targeted practice. Experiments on multiple benchmarks show that Role-Agent consistently improves performance, yielding an average gain of over 4% over strong baselines.
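
The AIW step described in the abstract (retrieving tasks whose failure patterns resemble those seen in failed trajectories, and thereby reshaping the training distribution) can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's method: failure modes are assumed to be short text labels that the LLM analysis has already produced, each task in the pool is assumed to be tagged with the failure modes it tends to trigger, and `reweight_tasks` and `base_weight` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List


def reweight_tasks(
    task_modes: Dict[str, List[str]],
    failed_modes: List[str],
    base_weight: float = 1.0,
) -> Dict[str, float]:
    """Return sampling probabilities over tasks.

    `task_modes` maps a task id to its failure-mode labels (assumed given).
    `failed_modes` lists the labels extracted from failed trajectories.
    A task's weight is base_weight plus the number of observed failures
    that match its labels, so tasks resembling recent failures are
    sampled more often. `base_weight` keeps unrelated tasks in the mix.
    """
    counts = Counter(failed_modes)  # missing labels count as 0
    weights = {
        task: base_weight + sum(counts[mode] for mode in modes)
        for task, modes in task_modes.items()
    }
    total = sum(weights.values())
    if total == 0:
        return {}  # empty pool, or base_weight of 0 with no matches
    return {task: weight / total for task, weight in weights.items()}
```

Sampling the next training batch from these probabilities concentrates practice on observed weaknesses while still revisiting other tasks occasionally. Exact label matching is the simplest possible notion of "similar failure patterns"; a similarity measure over free-text failure analyses would be a natural substitute.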

PDF link: https://arxiv.org/pdf/2606.10917v1

