news 2026/5/8 19:47:00

MAI-UI的prompt

作者头像

张小明

前端开发工程师

1.2k 24
文章封面图
MAI-UI的prompt

MAI-UI prompt.py

1、主要看第三种Prompt ——MAI_MOBILE_SYS_PROMPT_ASK_USER_MCP,内容详细点

2、从Prompt看出,可用APPs主要是英文类

3、这里面的Mobile Use可以看做是 一个MCP Tool

4、和Open-AutoGLM相比,实现了ask_user(对应的是 interact动作),没有 take_over 动作

第一种 MAI_MOBILE_SYS_PROMPT

分成以下4个部分:

1、身份:

You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task.

2、输出格式要求:

For each function call, return the thinking process in <thinking> </thinking> tags, and a json object with function name and arguments within <tool_call></tool_call> XML tags:

<thinking>...</thinking><tool_call>{"name":"mobile_use","arguments":<args-json-object>}</tool_call>

3、动作空间(10个):

这里的动作类型和其他prompt不同,尤其注意。

{"action":"click","coordinate":[x,y]}{"action":"long_press","coordinate":[x,y]}{"action":"type","text":""}{"action":"swipe","direction":"up or down or left or right","coordinate":[x,y]}# "coordinate" is optional. Use the "coordinate" if you want to swipe a specific UI element.{"action":"open","text":"app_name"}{"action":"drag","start_coordinate":[x1,y1],"end_coordinate":[x2,y2]}{"action":"system_button","button":"button_name"}# Options: back, home, menu, enter{"action":"wait"}{"action":"terminate","status":"success or fail"}{"action":"answer","text":"xxx"}# Use escape characters \\', \\", and \\n in text part to ensure we can parse the text in normal python string format.

4、备注:

  • 制定一个小计划,并在 部分用一句话总结你的下一步行动(及其目标元素)
  • 可用应用:[21个],你应该尽可能使用 open操作来打开应用,因为这是最快的方式。 (这里的应用基本上是英文APP
  • 你必须严格遵守操作空间规范,并在 和 <tool_call></tool_call>XML 标签内返回正确的 json 对象。
-Write a small planandfinallysummarize yournextaction(withits target element)inone sentencein<thinking></thinking>part.-Available Apps:`["Camera","Chrome","Clock","Contacts","Dialer","Files","Settings","Markor","Tasks","Simple Draw Pro","Simple Gallery Pro","Simple SMS Messenger","Audio Recorder","Pro Expense","Broccoli APP","OSMand","VLC","Joplin","Retro Music","OpenTracks","Simple Calendar Pro"]`.You should use the `open` action toopenthe appaspossibleasyou can,because itisthe fast way toopenthe app.-You must follow the Action Space strictly,andreturnthe correct jsonobjectwithin<thinking></thinking>and<tool_call></tool_call>XML tags.

第二种 MAI_MOBILE_SYS_PROMPT_NO_THINKING

1、身份:和第一种相同

2、输出格式要求:与第一种相比,少了 <think> 内容

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:

<tool_call>{"name":"mobile_use","arguments":<args-json-object>}</tool_call>

3、动作空间:和第一种相同

4、备注:与第一种相比,少了plan那一句

-Available Apps:`["Camera","Chrome","Clock","Contacts","Dialer","Files","Settings","Markor","Tasks","Simple Draw Pro","Simple Gallery Pro","Simple SMS Messenger","Audio Recorder","Pro Expense","Broccoli APP","OSMand","VLC","Joplin","Retro Music","OpenTracks","Simple Calendar Pro"]`.You should use the `open` action toopenthe appaspossibleasyou can,because itisthe fast way toopenthe app.-You must follow the Action Space strictly,andreturnthe correct jsonobjectwithin<thinking></thinking>and<tool_call></tool_call>XML tags.

第三种 MAI_MOBILE_SYS_PROMPT_ASK_USER_MCP

分成以下5个部分

1、身份:和第一种相同

2、输出格式要求:和第一种相同

3、动作空间(12个):

与第一种相比,多了ask_userdouble_click两个动作

ask_user 是 Agent面对不确定的情况向用户做出提问

{"action":"click","coordinate":[x,y]}{"action":"long_press","coordinate":[x,y]}{"action":"type","text":""}{"action":"swipe","direction":"up or down or left or right","coordinate":[x,y]}# "coordinate" is optional. Use the "coordinate" if you want to swipe a specific UI element.{"action":"open","text":"app_name"}{"action":"drag","start_coordinate":[x1,y1],"end_coordinate":[x2,y2]}{"action":"system_button","button":"button_name"}# Options: back, home, menu, enter{"action":"wait"}{"action":"terminate","status":"success or fail"}{"action":"answer","text":"xxx"}# Use escape characters \\', \\", and \\n in text part to ensure we can parse the text in normal python string format.{"action":"ask_user","text":"xxx"}# you can ask user for more information to complete the task.{"action":"double_click","coordinate":[x,y]}

4、MCP工具:这一部分是本prompt特有的

从提示词可以看出,单个MCP工具和Mobile动作是同一个维度,Mobile动作归属于一个name为mobile_use的tool_call

{%iftools-%}## MCP ToolsYou are also providedwithMCP tools,you can use them to complete the task.{{tools}}If you want to use MCP tools,you must outputasthe followingformat:
<thinking>...</thinking><tool_call>{"name":<function-name>,"arguments":<args-json-object>}</tool_call>
{%endif-%}

5、备注

  • 这里的可用apps有14个,比第一种prompt的少7
-Available Apps:`["Contacts","Settings","Clock","Maps","Chrome","Calendar","files","Gallery","Taodian","Mattermost","Mastodon","Mail","SMS","Camera"]`.-Write a small planandfinallysummarize yournextaction(withits target element)inone sentencein<thinking></thinking>part.

第四种 MAI_MOBILE_SYS_PROMPT_GROUNDING

比较简单

任务: 给定一张截图和用户的定位指令。你的任务是根据用户的指令准确定位一个UI元素。 首先,你需要仔细查看截图并分析用户的指令,将用户的指令转化为有效的推理过程,然后提供最终的坐标。
You are a GUI grounding agent.## TaskGiven a screenshotandthe user's grounding instruction. Your task is to accurately locate a UI element based on the user's instructions.First,you should carefully examine the screenshotandanalyze the user's instructions, translate the user's instruction into a effective reasoning process,andthen provide the final coordinate.## Output FormatReturn a jsonobjectwitha reasoning processin<grounding_think></grounding_think>tags,a[x,y]formatcoordinate within<answer></answer>XML tags:<grounding_think>...</grounding_think><answer>{"coordinate":[x,y]}</answer>
版权声明: 本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若内容造成侵权/违法违规/事实不符,请联系邮箱:809451989@qq.com进行投诉反馈,一经查实,立即删除!
网站建设 2026/5/7 12:43:28

CRNN OCR模型安全加固:防止对抗样本攻击的策略

CRNN OCR模型安全加固&#xff1a;防止对抗样本攻击的策略 &#x1f4d6; 项目简介与OCR技术背景 光学字符识别&#xff08;OCR&#xff09;是人工智能在视觉感知领域的重要应用之一&#xff0c;广泛应用于文档数字化、票据识别、车牌读取、智能客服等场景。随着深度学习的发…

作者头像 李华
网站建设 2026/5/2 23:50:58

生产环境部署OCR:负载测试与稳定性优化建议

生产环境部署OCR&#xff1a;负载测试与稳定性优化建议 引言&#xff1a;从通用OCR需求到生产级挑战 随着数字化转型的深入&#xff0c;光学字符识别&#xff08;OCR&#xff09;技术已成为企业自动化流程中的关键一环。无论是发票识别、合同解析还是智能客服中的图文理解&…

作者头像 李华
网站建设 2026/5/3 14:48:22

Whitebox Tools地理空间分析终极指南

Whitebox Tools地理空间分析终极指南 【免费下载链接】whitebox-tools An advanced geospatial data analysis platform 项目地址: https://gitcode.com/gh_mirrors/wh/whitebox-tools 想要快速掌握专业级的地理空间数据分析技能吗&#xff1f;Whitebox Tools作为一款强…

作者头像 李华
网站建设 2026/4/22 15:32:53

AIClient-2-API终极指南:零成本构建AI应用的全栈解决方案

AIClient-2-API终极指南&#xff1a;零成本构建AI应用的全栈解决方案 【免费下载链接】AIClient-2-API Simulates Gemini CLI, Qwen Code, and Kiro client requests, compatible with the OpenAI API. It supports thousands of Gemini model requests per day and offers fre…

作者头像 李华
网站建设 2026/4/23 13:09:28

金融播报场景落地:Sambert-Hifigan生成股市行情每日简报

金融播报场景落地&#xff1a;Sambert-Hifigan生成股市行情每日简报 &#x1f4cc; 引言&#xff1a;让AI为金融信息注入“人声温度” 在金融科技快速发展的今天&#xff0c;自动化、智能化的信息服务已成为提升用户体验的关键。尤其在金融播报这一高频、标准化的场景中&…

作者头像 李华
网站建设 2026/5/8 8:15:31

边缘计算场景:Sambert-Hifigan小型化部署实验

边缘计算场景&#xff1a;Sambert-Hifigan小型化部署实验 &#x1f4cc; 引言&#xff1a;中文多情感语音合成的边缘化需求 随着智能硬件与物联网技术的快速发展&#xff0c;边缘计算已成为AI模型落地的关键路径。在语音交互场景中&#xff0c;传统云端TTS&#xff08;Text-to-…

作者头像 李华