GPU_GUARD_MONOREPO/docs/superpowers/plans/2026-05-14-neta-desktop-op.md
2026-05-20 21:39:12 +08:00

# Neta Desktop Op Implementation Plan v4
> For agentic workers: REQUIRED SUB-SKILL: superpowers:subagent-driven-development (recommended) or superpowers:executing-plans
**Status:** Draft **v4**, 2026-05-14 (dual-agent architecture + desktop operations exposed as tools + removal of the v3 helper / replyToGroup)
**Goal:** Ship dual-agent desktop GUI automation end to end. `channel.agentId` (the reply agent) uses `delegate_task` to delegate to `channel.config.weixinReply.desktopAgentId` (the desktop agent, executed as a NetaClaw subagent). The desktop agent calls `DesktopOpService.runAndWait` through the `weixin_send_text` tool (toolset='weixin_desktop') to synchronously complete the keyboard/mouse operation plus VLM verification. The MVP implements WeixinAdapter + weixin_send_text; Layer 2 adds Excel/Browser adapters and new toolsets without reworking the architecture.
**Architecture:** Embedded in the Node backend process. `modules/desktop_op/` provides a generic Runtime + registry-based AppAdapter + global DesktopMutex + SafetyGuard. **WeixinReplyHelper is not introduced** (it was planned in v3, but v4 drops the extra abstraction; the tool calls the service directly). The model comes from the desktop agent's own `agent.modelChannelId` (the desktop agent is an ordinary NetaClaw Agent).
**Tech Stack:** Midway.js 3.20 + TypeORM (MySQL; reads model_channel, writes desktop_op_action_log and the global desktop_op_config) + `node-screenshots` + `@nut-tree-fork/nut-js` + `koffi` + `clip.exe` (via child_process; **clipboardy v5 is not used**) + the `openai` npm package (OpenAI-compatible).
**Spec:** `docs/superpowers/specs/2026-05-14-neta-desktop-op-design.md` v4
**Main changes v3 → v4 (8 items, from the dual-agent architecture decision; see spec §0, v4 row):**
- **H1 Each channel binds 2 agents**: `channel.agentId` (reply) + `channel.config.weixinReply.desktopAgentId` (desktop, ★ new)
- **H2 Desktop operations are exposed as tools**: new toolset='weixin_desktop'; the tool `weixin_send_text` is registered under `modules/netaclaw/tools/builtin/`
- **H3 Delete the entire `weixin_db.replyToGroup` method** (a placeholder) + **delete the auto-send block at `agent_channel.ts:585-608`**; the reply agent must actively call `delegate_task` for a message to be sent
- **H4 Tools wait synchronously**: add the `DesktopOpService.runAndWait(task, 60s)` interface (replaces v3's fire-and-forget enqueue path; enqueue is kept for Layer 2 background tasks)
- **H5 `WeixinReplyHelper` is not introduced**: v3 Plan Task 14 is cancelled; the tool's execute assembles a DesktopTask and calls the service directly
- **H6 The desktop agent must be created explicitly by an administrator**: toolset=`weixin_desktop` + `interaction`, a multimodal modelChannel, and the default prompt template
- **H7 Loop prevention**: the desktop agent's toolsets **must not include `crew`** (validated by the backend); the reply agent's toolsets **must include `crew`** (`delegate_task` is registered in `crew`)
- **H8 Remove `channel.config.weixinReply.modelChannelId`**: the VLM model comes from the desktop agent's own `agent.modelChannelId`
- **bizContext pass-through**: extend `NetaToolRuntimeContext` with `bizContext.channelId/roomName`; `agent_channel` injects it → the subagent inherits it → tools can read it automatically
- **New Phase 0.5**: Subagent IPC PoC (verifies which process a tool runs in, which decides whether DesktopMutex must work across processes)
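**Decisions carried over from v3 (unchanged):**
- ✅ Phase 0 PoC passed (1 of 1 run succeeded; gate lifted)
- The module lives in `modules/desktop_op/` (top level)
- Task schema `DesktopTask{appId, target, actionType, params}`
- Registry-based AppAdapter; WeixinAdapter is the only MVP implementation
- Global DesktopMutex (single-process vs. cross-process is decided after the Phase 0.5 PoC verifies the IPC behaviour)
- SafetyGuard (allowlist + blocklist)
- Global `desktop_op_config` table (★ the default_model_channel_id column is removed)
- clip.exe instead of clipboardy
- Windows are re-enumerated for every screenshot
- node-screenshots: filter appName='Weixin' and take the largest window
- Default Parser: JsonActionParser
**Key constraints:**
- One commit per Task (★ v4 adjustment: the user explicitly does not need git commits; subagents skip the commit step during implementation)
- TDD: write the test first, then the implementation
- Unit tests can run on Linux/Mac (native modules take the real code path only when platform=win32)
- **The weixin-archive listening path and the weixin_db read path (bindChannel/WalWatcher/IncrementalReader/health) are left completely untouched** (explicit user requirement)
- **The entire weixin_db.replyToGroup method is deleted**; **the auto-send block at agent_channel.ts:585-608 is deleted** (new in v4)
- ★ A message is sent only when the reply agent actively calls `delegate_task` (the system never sends automatically)
- The model comes from the desktop agent's own `agent.modelChannelId`; it is not hard-coded and is no longer configured through channel.config
- The design is generic, but the MVP implements only WeixinAdapter + the `weixin_send_text` tool; other adapters and tools are left to Layer 2
- **Every screenshot must re-enumerate windows** (the PoC exposed caching in node-screenshots)
- **Locate the WeChat window with node-screenshots `appName==='Weixin'` and take the largest** (the PoC showed that FindWindow returns a child window)
- **The Phase 0.5 PoC (subagent IPC verification) is a prerequisite gate for Phase A Task 5**
**Prerequisites:**
- weixin-archive sync is merged (the read path already exists)
- `netaclaw_model_channel` already has a Volcengine multimodal configuration (id=2 is ready and was verified in Phase 0)
- **The test account has been warmed up for ≥ 7 days and the test group has ≥ 5 members** (start this as a parallel timeline on the day the project is approved)
- The project can run `pnpm --filter @neta/backend test`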
---
## File structure
### New
#### `modules/desktop_op/` (generic desktop agent, a sibling of netaclaw)
| File | Responsibility |
|---|---|
| `runtime/types.ts` | DesktopTask / ActionStep / TaskResult / AdapterContext |
| `runtime/dpi.ts` | DPI Aware bootstrap |
| `runtime/screenshot.ts` | Screen capture (★ re-enumerates every time, no caching) |
| `runtime/window_locator.ts` | Generic window lookup (node-screenshots `appName` + bounds, **does not depend on FindWindow**) |
| `runtime/input.ts` | Keyboard/mouse via nut.js + ★ clip.exe for writing Chinese text to the clipboard |
| `runtime/desktop_mutex.ts` | ★ Global keyboard/mouse lock (replaces v2's WeixinChannelMutex) |
| `runtime/safety_guard.ts` | ★ Application allowlist + hard blocklist of dangerous keys/actions |
| `runtime/rate_limiter.ts` | per-app / per-target / daily |
| `runtime/parser/parser.ts` | Parser interface |
| `runtime/parser/json_action_parser.ts` | ★ MVP default (Seed 2.0 Pro outputs JSON) |
| `runtime/parser/registry.ts` | Selects by modelChannel.providerType |
| `runtime/adapters/adapter.ts` | ★ AppAdapter interface |
| `runtime/adapters/weixin_adapter.ts` | ★ The only MVP implementation |
| `runtime/adapters/registry.ts` | ★ Selects by task.appId |
| `runtime/vlm_client.ts` | OpenAI-compatible multimodal client (via model_channel) |
| `runtime/action_executor.ts` | Dispatches ActionStep |
| `runtime/runtime.ts` | DesktopOpRuntime.runTask (adapter-driven) |
| `service/desktop_op.ts` | DesktopOpService: enqueue / per-app worker / abortByFilter |
| `entity/desktop_op_action_log.ts` | Audit entity (generic schema; the WeChat scenario fills channel_id/room_name) |
| `entity/desktop_op_config.ts` | ★ Global config entity (single row) |
| `controller/admin/desktop_op_action_log.ts` | /list /info |
| `controller/admin/desktop_op_config.ts` | get / update |
| Matching tests + fixtures | See each Task |
#### `modules/netaclaw/` (★ v4 change: a tool replaces the helper)
| File | Responsibility |
|---|---|
| `tools/builtin/weixin_send_text.ts` | ★ New in v4: the tool in toolset='weixin_desktop'; execute assembles a DesktopTask and calls `DesktopOpService.runAndWait` directly |
| `tools/runtime_context.ts` | ★ Modified in v4: extend NetaToolRuntimeContext with a `bizContext` field (channelId/roomName) |
| `tools/catalog.ts` | ★ Modified in v4: `import './builtin/weixin_send_text.js'` + add the `TOOLSET_WEIXIN_DESKTOP` constant |
**The v3 plan's `service/weixin_reply_helper.ts` is not created** (v4 decision: the tool calls the service directly, with no helper in between)
#### `tools/` and PoC artifacts
| File | Status |
|---|---|
| `tools/visual_agent_probe/run-once.ts` | ✅ Exists (Phase 0 PoC, passed) |
| `tools/visual_agent_probe/README.md` | ✅ Exists |
| `tools/visual_agent_probe/check-wechat.ps1` | ✅ Exists |
| `tools/visual_agent_probe/debug/*.png` | Phase 0 screenshots, not in git (.gitignore) |
| `docs/superpowers/followups/2026-05-14-visual-agent-poc-raw.json` | ✅ Exists, Phase 0 raw report |
### Modified
| File | Change |
|---|---|
| `packages/backend/package.json` | Add deps (node-screenshots / nut-tree-fork/nut-js / koffi, **clipboardy is not added**); ★ check whether they already exist |
| `packages/backend/src/configuration.ts` | onReady calls `ensureDpiAware()` + loads the SafetyGuard config |
| `packages/backend/src/modules/netaclaw/service/weixin_db.ts` | ★ **Delete the entire `replyToGroup` method** (lines 166-171) + edit the class comment to remove the "5.7 占位" (placeholder) line |
| `packages/backend/src/modules/netaclaw/service/agent_channel.ts` | ★ **Delete the weixin-db auto-send block** (all of lines 585-608) + on delete(ids), cascade to `desktopOpService.abortByFilter(t => t.appId==='weixin' && t.target.channelId === id, 'channel-deleted')` + on update, validate both agents' toolsets + auto-configure the workerRoutingStrategy for weixin_send_text + inject bizContext/currentAgent + detect and warn when the reply agent omits delegate_task |
| `packages/backend/src/modules/netaclaw/service/subagent.ts` | ★ In runPreparedExecution the subagent inherits the parent's runtime.bizContext and replaces currentAgent with the subagent itself; `NetaClawSubagentRunSingleContext` gains an optional `parentRuntime` field; in in_process mode the runtime is passed straight to agentRunner |
| `packages/backend/src/modules/netaclaw/runtime/agent.ts` | ★ `AgentRunParams` gains an optional `runtime?: NetaToolRuntimeContext`, passed through internally to beforeToolCall |
| `packages/backend/src/modules/netaclaw/service/agent_executor.ts` | ★ `beforeToolCall` calls `injectToolRuntimeContext` with params.runtime when present (replaces the hard-coded {sessionCwd, workspaceRoots}) |
| `packages/backend/src/modules/netaclaw/subagent/process_runner.ts` | ★ subprocess mode: the `SubagentRunRequest` envelope gains a `runtime?` field, validated with `JSON.stringify(runtime)` before sending (throws 'biz-context-not-serializable' on failure); the worker attaches it to its internal agent_executor |
| `packages/backend/src/modules/netaclaw/tools/builtin/delegate_task.ts` | ★ The session-subagent branch passes the current parent's runtime when calling `ctx.runSingle` (SessionDelegateToolContext needs a new currentRuntime field) |
| `packages/backend/src/modules/netaclaw/service/tool_resolver.ts` | ★ `SessionDelegateToolContext` gains `currentRuntime?: NetaToolRuntimeContext`, populated by `createSessionDelegateToolContext` at construction time |
| `packages/backend/src/modules/netaclaw/tools/runtime_context.ts` | ★ Extend `NetaToolRuntimeContext` with `bizContext: NetaToolRuntimeBizContext` + `currentAgent: NetaToolRuntimeCurrentAgent`; `injectToolRuntimeContext` validates that the value is JSON-safe |
| `packages/backend/src/modules/netaclaw/tools/catalog.ts` | ★ `import './builtin/weixin_send_text.js'` + add the `TOOLSET_WEIXIN_DESKTOP` constant |
| `packages/backend/src/entities.ts` | Auto-generated; regenerate with `cool entity` after creating new entities (do not edit by hand) |
| `packages/frontend/src/modules/agent/views/channel-edit.vue` | ★ Add a "WeChat auto-reply" section with **2 agent dropdowns** (conversation agent + desktop-operation agent) + watermark + risk control |
| `docs/superpowers/specs/2026-05-09-wechat-uia-channel-design.md` | OBSOLETE banner at the top |
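### ★ Untouched
| File | Reason |
|---|---|
| `modules/netaclaw/service/weixin_archive_sync.ts` | Explicit user requirement: chat-history sync goes through the DB and stays unchanged. The internal `channelLocks` Map is kept (it prevents concurrent archiving of the same channel) and is **not merged with DesktopMutex** |
| `modules/netaclaw/runtime/weixin_db/*` | Same as above: the listening path is left completely untouched |
| Read-path methods in `weixin_db.ts`: bindChannel / unbindChannel / getRuntime / healthCheck / probeAlive / refreshWhitelist etc. | ★ Made explicit in v4: only replyToGroup is deleted; all other read-path methods stay |
| Main logic of routeInboundMessage / handleInboundMessage in `agent_channel.ts` | The main framework is unchanged; only the weixin-db branch loses the auto-send block, delete gains the cascade, and bizContext is injected |
| Main body of `tools/builtin/delegate_task.ts` + `service/subagent.ts` | The existing NetaClaw subagent mechanism is reused as is; **the delegate_task protocol is not extended** (only bizContext inheritance is added) |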
---
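## Phase 0 · PoC validation ✅ Passed
### Task 0.1 ✅ Done
**Status:** The Phase 0 PoC ran successfully on 2026-05-14 (1 of 1 run succeeded; the core path works end to end).
**Artifacts produced:**
- `tools/visual_agent_probe/run-once.ts` (standalone script, no IoC)
- `tools/visual_agent_probe/README.md`
- `tools/visual_agent_probe/check-wechat.ps1`
- `docs/superpowers/followups/2026-05-14-visual-agent-poc-raw.json` (raw report)
- `tools/visual_agent_probe/debug/*.png` (PoC screenshots; excluded from git via .gitignore)
**Pitfalls found in the real run (already reflected in the Phase A Tasks):**
- The WeChat main window's title is the Chinese "微信" (the English "Weixin" is a child window) → `Task 6b` uses node-screenshots `appName==='Weixin'` and takes the largest window instead of `FindWindowW`
- `node-screenshots` caches the Image, so two captures return the same bytes → `Task 6a` must re-enumerate every time
- `clipboardy` v5 is ESM, so a CJS require fails → `Task 6c` uses `child_process.spawnSync('clip.exe', { input: utf16leBomBuf })`
- The first hit of WeChat's Ctrl+F global search is often an Official Account ("公众号") rather than the target conversation → the MVP `WeixinAdapter` requires the user to navigate to the conversation manually
### Task 0.2: N=20 stability run (optional)
**Files:**
- Append: `docs/superpowers/followups/2026-05-14-visual-agent-poc-raw.json`
- [ ] **Step 1:** Run `pnpm exec tsx tools/visual_agent_probe/run-once.ts 20` (N=20) and record the success rate
- [ ] **Step 2:** Extract the raw VLM output of all 20 runs; **the Task 3 (Phase A) fixtures use it directly**
- [ ] **Step 3:** Report a success rate ≥ 80% (≥ 90% expected). If the failure rate is high, classify the failure modes in the followup report (navigation failed / VLM could not see the target / Enter had no effect / etc.)
(This task does not block Phase A and can run in parallel with it.)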
---
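## Phase 0.5 · Subagent IPC verification PoC (★ new in v4, < 30 min)
**Purpose**: verify whether a tool's execute runs in the parent process or in the subagent process when a NetaClaw subagent calls the tool. The answer decides whether DesktopMutex can be a single in-process instance or needs a cross-process design.
### Task 0.5.1: Temporary `_debug_pid` tool
**Files:**
- Create (temporary, delete after verification): `packages/backend/src/modules/netaclaw/tools/builtin/_debug_pid.ts`
- Modify: `tools/catalog.ts` (temporary import)
- [ ] **Step 1:** Write a minimal tool: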
```ts
import { Type } from '@sinclair/typebox';
import { AgentToolWithMeta, textResult } from '../common.js';
import { registerSchema } from '../catalog.js';
export const debugPidTool: AgentToolWithMeta<typeof Type.Object({}), unknown> = {
name: '_debug_pid',
label: 'Debug PID',
description: 'Return current process pid for IPC verification.',
parameters: Type.Object({}),
async execute() {
return textResult(JSON.stringify({ pid: process.pid, argv: process.argv }));
},
};
registerSchema({ name: '_debug_pid', toolset: 'debug', description: 'debug', visibility: 'tool', isCore: false, canDisable: true });
```
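- [ ] **Step 2:** In the backend admin console:
  - Create reply agent A (toolset = `crew` + `interaction`)
  - Create desktop agent B (toolset = `debug` + `interaction`)
  - In chat, have A call `delegate_task({mode:'preset', agentId: B.id, goal:'调用 _debug_pid'})` (the goal text means "call _debug_pid")
- [ ] **Step 3:** Compare the pid returned by _debug_pid with the backend main pid printed to the console.
  - **Equal** → the tool runs in the parent process (IPC proxy mode) → a single DesktopMutex instance is sufficient; proceed to Phase A
  - **Not equal** → the tool runs in the subagent process → DesktopMutex needs a cross-process design (file lock / shared SQLite); revise the Task 5 design
- [ ] **Step 4:** Delete _debug_pid.ts and the temporary import in catalog.ts; record the conclusion in `docs/superpowers/followups/2026-05-14-subagent-ipc-poc.md`
- [ ] **Step 5:** Do not commit (temporary verification; the only outputs are the followup report and the Phase A decision)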
---
## Phase A · Foundation utilities (unit-testable across platforms)
### Task 1: Dependencies + DPI Aware
**Files:**
- Modify: `packages/backend/package.json`
- Create: `modules/desktop_op/runtime/dpi.ts`
- Modify: `packages/backend/src/configuration.ts`
- [ ] **Step 1:** `cd packages/backend && pnpm add node-screenshots @nut-tree-fork/nut-js koffi`
(★ **Do not add clipboardy**: v5 is ESM; spawn clip.exe through child_process instead)
- [ ] **Step 2:** Write `dpi.ts`: `ensureDpiAware()` calls `SetProcessDpiAwarenessContext(-4)` (-4 is `DPI_AWARENESS_CONTEXT_PER_MONITOR_AWARE_V2`); it is a no-op when platform !== 'win32' and never throws on failure
- [ ] **Step 3:** Call it from `onReady` in `configuration.ts`
- [ ] **Step 4:** Commit
```
git commit -m "feat(desktop-op): deps + DPI Aware bootstrap"
```
---
### Task 2: types.ts (★ generalized in v3)
**Files:**
- Create: `modules/desktop_op/runtime/types.ts`
- [ ] **Step 1:** Write:
```ts
export interface WindowHandle {
  hwnd: number;
  pid: number;
  appName: string;
  title: string;
  bounds: { x: number; y: number; width: number; height: number };
  nsWindow: any; // reference to the node-screenshots Window instance, used for capture
}
export interface DesktopTask {
  id: string;
  appId: string;           // 'weixin' / 'excel' / ...
  target: any;             // adapter-defined, e.g. { conversation, channelId, roomName }
  actionType: string;      // e.g. 'send-text'
  params: any;             // e.g. { text, originalText }
  modelChannelId?: number; // falls back to desktop_op_config.default when omitted
  maxSteps?: number;       // default 8
  enqueuedAt: number;
}
export type ActionStep =
  | { type: 'click'; x: number; y: number; thought?: string }
  | { type: 'hotkey'; key: string; thought?: string }
  | { type: 'clipboard-write'; text: string }       // writes the clipboard only, presses no key
  | { type: 'type'; text: string; thought?: string } // = clipboard-write + ctrl+v
  | { type: 'wait'; ms: number }
  | { type: 'mention'; wxid: string }               // reserved for later
  | { type: 'finished'; thought?: string }
  | { type: 'failed'; reason: string; thought?: string };
export interface TaskResult {
  ok: boolean;
  modelCalls: number;
  steps: number;
  durationMs: number;
}
export interface AdapterContext {
  window: WindowHandle;
  screenshot: any; // Screenshooter
  input: any;      // InputController
  vlm: any;        // VlmClient
  parser: any;     // Parser
  logger: any;
  task: DesktopTask;
  modelCalls: number;
}
```
- [ ] **Step 2:** Commit
```
git commit -m "feat(desktop-op): types(DesktopTask / ActionStep / AdapterContext)"
```
---
### Task 3: Parser interface + JsonActionParser (TDD + fixtures)
**Files:**
- Create: `modules/desktop_op/runtime/parser/parser.ts`
- Create: `modules/desktop_op/runtime/parser/json_action_parser.ts`
- Create: `modules/desktop_op/runtime/parser/registry.ts`
- Create: `test/.../parser/json_action_parser.test.ts`
- Create: `test/fixtures/desktop_op/vlm_responses/*.txt` (★ taken from the PoC report `2026-05-14-visual-agent-poc-raw.json` + the Task 0.2 N=20 run)
> v2 defaulted to a UI-TARS DSL parser; v3 switches to **JsonActionParser** (the PoC showed that Seed 2.0 Pro outputs parseable JSON; the UI-TARS DSL parser is deferred to Layer 2).
- [ ] **Step 1:** Extract ≥ 20 VLM outputs from the PoC raw report (success / failed / edge cases) and name them `success-N.txt` / `failed-N.txt` / `ambiguous-N.txt`
- [ ] **Step 2:** Write `parser.ts`:
```ts
import type { ActionStep, DesktopTask, AdapterContext } from '../types.js';
export interface Parser {
  buildSystemPrompt(task: DesktopTask, ctx?: AdapterContext): string;
  parseAction(raw: string): ActionStep;
  buildVerifyPrompt(question: string): string;
  parseVerify(raw: string): boolean;
}
```
- [ ] **Step 3:** Write `json_action_parser.ts`:
  - parseAction: tolerate markdown code fences, JSON at the end of a single line, and shapes such as `{ "type": "click", "x": ..., "y": ..., "reason": ... }`
  - parseVerify: the model may output `{"type":"finished",...}` or natural language containing yes/no; map both to true/false
  - Any parse failure returns `{ type: 'failed', reason: 'parse-error: <raw>' }`
- [ ] **Step 4:** Write the tests, one case per fixture:
```ts
for (const file of fs.readdirSync(fixturesDir)) {
  it(file, () => {
    const a = parser.parseAction(fs.readFileSync(...));
    if (file.startsWith('success-')) expect(['click','hotkey','type','finished','wait']).toContain(a.type);
    else if (file.startsWith('failed-')) expect(a.type).toBe('failed');
  });
}
```
- [ ] **Step 5:** Write `registry.ts`:
```ts
const PARSERS = new Map<string, Parser>([
  ['volcengine', new JsonActionParser()],    // doubao-seed-2-0-pro / Seed-1.6-vision
  ['volces-uitars', new JsonActionParser()], // compatibility entry; a UITarsParser is added in Layer 2
]);
export function getParser(supplierOrProvider: string): Parser {
  return PARSERS.get(supplierOrProvider) ?? new JsonActionParser(); // default fallback
}
```
- [ ] **Step 6:** Run the tests
- [ ] **Step 7:** Commit
```
git commit -m "feat(desktop-op): Parser interface + JsonActionParser + fixtures"
```
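The tolerant parsing described in Step 3 can be sketched as below. This is a minimal illustration, not the plan's implementation: it assumes the model's reply contains at most one JSON object, and it treats a `reason` field on non-failed actions as the `thought`. The local `ActionStep` type is a trimmed copy of the one in `types.ts` so that the block is self-contained.

```typescript
// Trimmed local copy of the plan's ActionStep union (sketch only).
type ActionStep =
  | { type: 'click'; x: number; y: number; thought?: string }
  | { type: 'hotkey'; key: string; thought?: string }
  | { type: 'type'; text: string; thought?: string }
  | { type: 'wait'; ms: number }
  | { type: 'finished'; thought?: string }
  | { type: 'failed'; reason: string; thought?: string };

/** Parse raw model text into an ActionStep; never throws. */
function parseAction(raw: string): ActionStep {
  const fail = (): ActionStep => ({ type: 'failed', reason: `parse-error: ${raw}` });
  // Slice from the first '{' to the last '}' so code fences and leading prose are ignored.
  const start = raw.indexOf('{');
  const end = raw.lastIndexOf('}');
  if (start < 0 || end <= start) return fail();
  let obj: any;
  try {
    obj = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return fail();
  }
  // Assumption: models may call the free-text field either "thought" or "reason".
  const thought: string | undefined =
    typeof obj.thought === 'string' ? obj.thought
    : typeof obj.reason === 'string' ? obj.reason
    : undefined;
  switch (obj.type) {
    case 'click':
      return typeof obj.x === 'number' && typeof obj.y === 'number'
        ? { type: 'click', x: obj.x, y: obj.y, thought } : fail();
    case 'hotkey':
      return typeof obj.key === 'string' ? { type: 'hotkey', key: obj.key, thought } : fail();
    case 'type':
      return typeof obj.text === 'string' ? { type: 'type', text: obj.text, thought } : fail();
    case 'wait':
      return typeof obj.ms === 'number' ? { type: 'wait', ms: obj.ms } : fail();
    case 'finished':
      return { type: 'finished', thought };
    case 'failed':
      return { type: 'failed', reason: typeof obj.reason === 'string' ? obj.reason : 'model-reported-failure' };
    default:
      return fail();
  }
}
```

Slicing between the outermost braces is deliberately crude: a reply with stray braces in the prose ends up as `parse-error`, which matches the rule that every parse failure becomes a `failed` step.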
---
### Task 4: rate_limiter.ts (TDD)
**Files:**
- Create: `modules/desktop_op/runtime/rate_limiter.ts`
- Create: tests
- [ ] **Step 1:** Write tests for three dimensions: per-target / per-app / daily
- [ ] **Step 2:** Implement an in-memory token bucket:
```ts
class RateLimiter {
tryAcquire(appId: string, targetKey: string, opts: {
perTargetPerMin?: number;
perAppPerMin?: number;
perAppPerDay?: number;
}): { allowed: boolean; reason?: string };
}
```
- [ ] **Step 3:** Commit
```
git commit -m "feat(desktop-op): rate_limiter(per-app / per-target / daily)"
```
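One possible shape for the limiter behind the `tryAcquire` signature is sketched below. Note the swap: it uses a sliding-window log of timestamps rather than a true token bucket, because that is shorter to show and gives exact per-minute and per-day counts. The injectable `now` clock and the `reason` strings are assumptions made for testability, not part of the plan.

```typescript
interface RateLimitOpts {
  perTargetPerMin?: number;
  perAppPerMin?: number;
  perAppPerDay?: number;
}

class RateLimiter {
  // key -> timestamps (ms) of accepted calls still inside that key's window
  private hits = new Map<string, number[]>();

  // `now` is injectable so tests can control time (assumption, not in the plan).
  constructor(private now: () => number = () => Date.now()) {}

  tryAcquire(appId: string, targetKey: string, opts: RateLimitOpts): { allowed: boolean; reason?: string } {
    const t = this.now();
    const checks: Array<[string, number, number | undefined, string]> = [
      [`target:${appId}:${targetKey}`, 60_000, opts.perTargetPerMin, 'per-target-per-min'],
      [`app-min:${appId}`, 60_000, opts.perAppPerMin, 'per-app-per-min'],
      [`app-day:${appId}`, 86_400_000, opts.perAppPerDay, 'per-app-per-day'],
    ];
    // Pass 1: reject if any configured dimension is already at its limit.
    for (const [key, windowMs, limit, reason] of checks) {
      if (limit === undefined) continue;
      if (this.recent(key, windowMs, t).length >= limit) return { allowed: false, reason };
    }
    // Pass 2: record the call only after every dimension has passed.
    for (const [key, windowMs, limit] of checks) {
      if (limit === undefined) continue;
      this.recent(key, windowMs, t).push(t);
    }
    return { allowed: true };
  }

  /** Drop timestamps outside the window and return the live (stored) array. */
  private recent(key: string, windowMs: number, t: number): number[] {
    const kept = (this.hits.get(key) ?? []).filter(ts => t - ts < windowMs);
    this.hits.set(key, kept);
    return kept;
  }
}
```

Checking all dimensions before recording anything keeps a rejected call from consuming quota in the dimensions that would have allowed it.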
---
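### Task 5: desktop_mutex.ts (★ global keyboard/mouse lock from v3; v4 defaults to single-process, pending the Phase 0.5 PoC)
**Files:**
- Create: `modules/desktop_op/runtime/desktop_mutex.ts`
- Create: tests
> ⚠️ Unlike v2: **it is no longer called WeixinChannelMutex and is no longer per-channel; it is a global keyboard/mouse lock**. Reason: the system has one keyboard/mouse pair and one screen, so only one task can own the foreground at any moment.
> ⚠️ `weixin_archive_sync.ts` is **untouched**: it keeps using its own channelLocks Map (read-only SQLite access, unrelated to the desktop lock).
- [ ] **Step 1:** Write tests:
  - acquire from multiple tasks → strictly serialized
  - after release, the next waiter acquires immediately
  - acquire followed by abort → later waiters are not blocked
- [ ] **Step 2:** Implement `DesktopMutex` (@Provide + @Scope Singleton):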
```ts
class DesktopMutex {
acquire(taskId: string, appId: string): Promise<() => void>;
}
```
Internally it keeps `busy: {taskId, appId} | null` + `waiters: Array<{resolve, taskId, appId}>`; see spec §3.5.
- [ ] **Step 3:** Commit
```
git commit -m "feat(desktop-op): DesktopMutex 全局键鼠锁"
```
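A minimal single-process sketch of the `busy` + `waiters` design follows. It assumes the Phase 0.5 PoC confirms that tools run in the backend process, omits the Midway decorators, and makes the release function idempotent (an assumption that guards against double release). Removing an aborted waiter from the queue is not covered here.

```typescript
type Release = () => void;
interface Waiter { resolve: (release: Release) => void; taskId: string; appId: string }

class DesktopMutex {
  private busy: { taskId: string; appId: string } | null = null;
  private waiters: Waiter[] = [];

  /** Resolves with a release function once the caller owns the keyboard/mouse. */
  acquire(taskId: string, appId: string): Promise<Release> {
    return new Promise<Release>(resolve => {
      const waiter: Waiter = { resolve, taskId, appId };
      if (this.busy === null) this.grant(waiter);
      else this.waiters.push(waiter); // FIFO queue
    });
  }

  private grant(w: Waiter): void {
    this.busy = { taskId: w.taskId, appId: w.appId };
    let released = false;
    w.resolve(() => {
      if (released) return; // idempotent: a second call must not free the next owner's lock
      released = true;
      this.busy = null;
      const next = this.waiters.shift();
      if (next) this.grant(next);
    });
  }
}
```

A caller would typically wrap the critical section in `try { ... } finally { release(); }` so that a thrown error or an abort still hands the lock to the next waiter.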
---
### Task 5.5: safety_guard.ts (★ new in v3)
**Files:**
- Create: `modules/desktop_op/runtime/safety_guard.ts`
- Create: tests
- [ ] **Step 1:** Write tests:
  - validateAppId('weixin') passes; validateAppId('cmd') throws `app-not-allowed`
  - validateAction({type:'hotkey',key:'delete'}) throws `dangerous-key-blocked`
  - validateAction({type:'hotkey',key:'win+r'}) throws
  - validateAction({type:'hotkey',key:'ctrl+v'}) passes
  - the allowlist is updated after loadConfig
- [ ] **Step 2:** Implement `SafetyGuard` (see spec §3.6):
  - default `allowedApps = ['weixin']`
  - default `dangerousKeys = ['delete','win+r','alt+f4','win+l','ctrl+alt+delete','ctrl+shift+esc']`
  - `validateAppId / validateAction / validateTaskShape / loadConfig`
- [ ] **Step 3:** Commit
```
git commit -m "feat(desktop-op): SafetyGuard(白名单 + 危险按键黑名单)"
```
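The guard's two core checks can be sketched as follows. This is an illustration under stated assumptions: key combos are normalized by lower-casing and trimming around `+` (modifier order is not canonicalized), `loadConfig` may only replace the allowlist so that the hard blocklist cannot be weakened by configuration, and `validateTaskShape` is left out. Spec §3.6 remains the authority.

```typescript
type GuardedAction = { type: string; key?: string };

/** 'Win + R' -> 'win+r'. Assumption: modifier order is not reordered. */
function normalizeCombo(combo: string): string {
  return combo.toLowerCase().split('+').map(part => part.trim()).join('+');
}

class SafetyGuard {
  private allowedApps = new Set<string>(['weixin']);
  private readonly dangerousKeys = new Set<string>([
    'delete', 'win+r', 'alt+f4', 'win+l', 'ctrl+alt+delete', 'ctrl+shift+esc',
  ]);

  /** Replaces the allowlist; the dangerous-key blocklist stays fixed (assumption). */
  loadConfig(cfg: { allowedApps?: string[] }): void {
    if (cfg.allowedApps) this.allowedApps = new Set(cfg.allowedApps);
  }

  validateAppId(appId: string): void {
    if (!this.allowedApps.has(appId)) throw new Error('app-not-allowed');
  }

  validateAction(step: GuardedAction): void {
    if (step.type !== 'hotkey' || step.key === undefined) return;
    if (this.dangerousKeys.has(normalizeCombo(step.key))) {
      throw new Error('dangerous-key-blocked');
    }
  }
}
```

Because the runtime validates every step before executing it, a blocked hotkey produced by the VLM or by an adapter stops the task before any key is pressed.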
---
### Task 6a: screenshot.ts (★ v3 revision: re-enumerate every time)
**Files:**
- Create: `modules/desktop_op/runtime/screenshot.ts`
> The PoC found that consecutive `Window.captureImageSync` calls on the same Window instance return PNGs of **exactly the same byte count** (415269 bytes, identical on both calls), which indicates caching. **Every capture must call `Window.all()` again to obtain a fresh Window instance**.
- [ ] **Step 1:** Implement:
```ts
class NodeScreenshooter {
  /** ★ Re-enumerates on every call; never caches a Window instance */
  captureWindowByAppName(appName: string, opts?: { skipMinimized?: boolean; largest?: boolean }): Buffer;
  captureFullScreen(): Buffer;
}
```
  - Internally, filter `node-screenshots.Window.all()` by `appName === target.appName && !isMinimized`, sort by area descending, and take the first
  - `image.toPngSync()` returns the PNG bytes
  - Throw `unsupported-platform` when platform !== 'win32'
- [ ] **Step 2:** No unit tests (native module; verified by the Phase H E2E run)
- [ ] **Step 3:** Commit
```
git commit -m "feat(desktop-op): NodeScreenshooter(每次 enumerate, 避缓存)"
```
---
### Task 6b: window_locator.ts (★ v3 revision: no FindWindowW)
**Files:**
- Create: `modules/desktop_op/runtime/window_locator.ts`
> The PoC found that `FindWindowW(null, 'Weixin')` returns a child window, hwnd=4131796 (a child window titled 'Weixin'), whereas **the real main window is hwnd=135372 with the Chinese title '微信'**. Use node-screenshots enumeration + appName filtering instead.
> koffi is used mainly for activation (SetForegroundWindow) + ShowWindow + foreground-window detection.
- [ ] **Step 1:** Implement:
```ts
class WindowLocator {
  /** node-screenshots enumeration, filtered by appName + not minimized + largest area */
  findByAppName(appName: string, opts?: { skipMinimized?: boolean; largest?: boolean }): WindowHandle | null;
  /** koffi SetForegroundWindow + ShowWindow(SW_RESTORE), with AttachThreadInput as a fallback */
  activate(handle: WindowHandle): Promise<void>;
  /** koffi GetForegroundWindow + GetWindowThreadProcessId, compared against the pid */
  isForeground(handle: WindowHandle): boolean;
}
```
- [ ] **Step 2:** Commit
```
git commit -m "feat(desktop-op): WindowLocator(node-screenshots enumerate + koffi activate)"
```
---
### Task 6c: input.ts (★ v3 revision: clip.exe instead of clipboardy)
**Files:**
- Create: `modules/desktop_op/runtime/input.ts`
- Create: tests
> The PoC found that clipboardy v5 is an ESM package, so the backend's CJS `require` fails with `ERR_PACKAGE_PATH_NOT_EXPORTED`. Use the `clip.exe` that ships with Windows instead (child_process spawnSync; Chinese text must be written to stdin as UTF-16 LE with a BOM).
- [ ] **Step 1:** Write tests:
  - mock `child_process.spawnSync` and assert that the input buffer contains the UTF-16 BOM followed by the UTF-16LE-encoded text
  - mock nut.js and assert that `typeViaClipboard('你好')` spawns clip.exe first and then calls `hotkey('ctrl+v')`
  - `hotkey('ctrl+v')` calls keyboard.pressKey/releaseKey in the correct order
- [ ] **Step 2:** Implement:
```ts
import { mouse, keyboard, Key, Point } from '@nut-tree-fork/nut-js';
import { spawnSync } from 'node:child_process';
class InputController {
  async click(x, y) { ... }
  async hotkey(combo) { ... } // 'ctrl+v' / 'enter' / 'ctrl+f' / 'ctrl+1'
  /** ★ Writes via clip.exe, not clipboardy. Chinese text requires UTF-16 LE + BOM. */
  writeClipboard(text: string): void {
    // 0xFF 0xFE is the UTF-16 LE byte order mark
    const buf = Buffer.concat([Buffer.from([0xff, 0xfe]), Buffer.from(text, 'utf16le')]);
    const r = spawnSync('clip.exe', [], { input: buf });
    if (r.status !== 0) throw new Error('clip.exe failed: ' + r.stderr?.toString());
  }
  async typeViaClipboard(text: string): Promise<void> {
    this.writeClipboard(text);
    await this.hotkey('ctrl+v');
  }
}
```
- [ ] **Step 3:** Commit
```
git commit -m "feat(desktop-op): InputController(nut.js + clip.exe 中文剪贴板)"
```
---
### Task 7: action_executor.ts
**Files:**
- Create: `modules/desktop_op/runtime/action_executor.ts`
- Create: tests
- [ ] **Step 1:** Write tests with a mocked InputController; each ActionStep type calls the matching method:
  - `click` → input.click(x,y)
  - `hotkey` → input.hotkey(key)
  - `clipboard-write` → input.writeClipboard(text)
  - `type` → input.typeViaClipboard(text)
  - `wait` → sleep(ms)
  - `finished` / `failed` → no side effects
- [ ] **Step 2:** Implement `execute(step: ActionStep): Promise<void>` as a switch over the step types
- [ ] **Step 3:** Commit
```
git commit -m "feat(desktop-op): ActionExecutor"
```
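The dispatch table above maps naturally onto a `switch`, sketched below. The `InputLike` interface and the injectable `sleep` are assumptions introduced so the block is self-contained and testable without nut.js. The plan marks the `mention` step as reserved, so rejecting it with `action-not-supported` is also an assumption.

```typescript
type ActionStep =
  | { type: 'click'; x: number; y: number; thought?: string }
  | { type: 'hotkey'; key: string; thought?: string }
  | { type: 'clipboard-write'; text: string }
  | { type: 'type'; text: string; thought?: string }
  | { type: 'wait'; ms: number }
  | { type: 'mention'; wxid: string }
  | { type: 'finished'; thought?: string }
  | { type: 'failed'; reason: string; thought?: string };

// Hypothetical slice of InputController that the executor depends on.
interface InputLike {
  click(x: number, y: number): Promise<void>;
  hotkey(combo: string): Promise<void>;
  writeClipboard(text: string): void;
  typeViaClipboard(text: string): Promise<void>;
}

class ActionExecutor {
  constructor(
    private input: InputLike,
    private sleep: (ms: number) => Promise<void> =
      ms => new Promise<void>(resolve => setTimeout(resolve, ms)),
  ) {}

  async execute(step: ActionStep): Promise<void> {
    switch (step.type) {
      case 'click': await this.input.click(step.x, step.y); return;
      case 'hotkey': await this.input.hotkey(step.key); return;
      case 'clipboard-write': this.input.writeClipboard(step.text); return;
      case 'type': await this.input.typeViaClipboard(step.text); return;
      case 'wait': await this.sleep(step.ms); return;
      case 'mention': throw new Error('action-not-supported: mention'); // reserved step (assumption)
      case 'finished':
      case 'failed':
        return; // terminal markers, no side effects
    }
  }
}
```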
---
## Phase B · AppAdapter framework + WeixinAdapter (★ new in v3)
### Task 8: AppAdapter interface + registry (TDD)
**Files:**
- Create: `modules/desktop_op/runtime/adapters/adapter.ts`
- Create: `modules/desktop_op/runtime/adapters/registry.ts`
- Create: tests
- [ ] **Step 1:** Write `adapter.ts` (already fixed by spec §5.2):
```ts
export interface AppAdapter {
appId: string;
supportedActions: string[];
findWindow(target: any): Promise<WindowHandle | null>;
preFlightCheck(task: DesktopTask, ctx: AdapterContext): Promise<void>;
buildSteps(task: DesktopTask): Promise<ActionStep[]>;
verifyResult(task: DesktopTask, ctx: AdapterContext): Promise<boolean>;
queueKey(target: any): string;
}
```
- [ ] **Step 2:** Write `registry.ts`:
```ts
@Provide() @Scope(ScopeEnum.Singleton)
class AdapterRegistry {
private map = new Map<string, AppAdapter>();
register(adapter: AppAdapter): void;
get(appId: string): AppAdapter; // throws 'app-not-allowed' when not found
listAppIds(): string[];
}
```
- [ ] **Step 3:** Write tests: register / get / throws when missing / listAppIds
- [ ] **Step 4:** Commit
```
git commit -m "feat(desktop-op): AppAdapter interface + AdapterRegistry"
```
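A decorator-free sketch of the registry is shown below. It assumes that registering the same `appId` twice simply overwrites the earlier adapter; the plan does not specify this. `AppAdapterLike` is a trimmed stand-in for the full `AppAdapter` interface.

```typescript
// Trimmed stand-in for AppAdapter; the real interface has more methods.
interface AppAdapterLike {
  appId: string;
  supportedActions: string[];
}

class AdapterRegistry {
  private map = new Map<string, AppAdapterLike>();

  /** Assumption: a later registration for the same appId replaces the earlier one. */
  register(adapter: AppAdapterLike): void {
    this.map.set(adapter.appId, adapter);
  }

  get(appId: string): AppAdapterLike {
    const adapter = this.map.get(appId);
    if (!adapter) throw new Error('app-not-allowed');
    return adapter;
  }

  listAppIds(): string[] {
    return [...this.map.keys()];
  }
}
```

In the real module the class would carry `@Provide()` and `@Scope(ScopeEnum.Singleton)` as in the skeleton above, so that every consumer shares one registry.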
---
### Task 8.5: Extend NetaToolRuntimeContext + inject bizContext + currentAgent (★ new in v4, includes the D1 revision)
**Files:**
- Modify: `packages/backend/src/modules/netaclaw/tools/runtime_context.ts`
- Modify: `packages/backend/src/modules/netaclaw/runtime/agent.ts` (`AgentRunParams` gains a `runtime?: NetaToolRuntimeContext` field, injected in beforeToolCall)
- Modify: `packages/backend/src/modules/netaclaw/service/agent_executor.ts` (pass runtime through to agentRunner + injectToolRuntimeContext)
- Modify: `packages/backend/src/modules/netaclaw/service/agent_channel.ts` (inject bizContext + currentAgent before the reply agent runs)
- Modify: `packages/backend/src/modules/netaclaw/service/subagent.ts` (runPreparedExecution inherits the parent's runtime.bizContext and replaces currentAgent with the subagent itself)
- Modify: `packages/backend/src/modules/netaclaw/subagent/process_runner.ts` (subprocess mode passes runtime through the IPC envelope; throws 'biz-context-not-serializable' if JSON.stringify fails)
- Create: tests
- [ ] **Step 1:** Change `runtime_context.ts`:
```ts
export interface NetaToolRuntimeBizContext {
  channelId?: number;
  roomName?: string;
  // restricted to JSON-safe values (primitive / array / plain object)
  [k: string]: string | number | boolean | null | undefined | object | any[];
}
export interface NetaToolRuntimeCurrentAgent {
  id: number;
  name: string;
  modelChannelId: number | null;
  toolsets: string[];
}
export interface NetaToolRuntimeContext {
  sessionCwd?: string | null;
  workspaceRoots?: string[];
  bizContext?: NetaToolRuntimeBizContext;     // ★ new in v4
  currentAgent?: NetaToolRuntimeCurrentAgent; // ★ new in v4
}
export function injectToolRuntimeContext<T>(
  args: T,
  runtime: NetaToolRuntimeContext | undefined,
): T & { _netaRuntime?: NetaToolRuntimeContext } {
  // existing logic + pass bizContext / currentAgent through
  // ★ runtime must be JSON-safe (no nested functions, no circular references),
  //   otherwise throw 'biz-context-not-serializable'.
  //   Note: JSON.stringify throws on circular references but silently drops
  //   function-valued properties, so functions need an explicit check.
}
export function readToolRuntimeContext(params): NetaToolRuntimeContext | undefined {
  // existing logic + parse bizContext / currentAgent
}
```
- [ ] **Step 2:** Change `runtime/agent.ts`: `AgentRunParams` gains an optional `runtime?: NetaToolRuntimeContext` field (existing callers are unaffected); runAgent passes runtime through to beforeToolCall internally
- [ ] **Step 3:** Change `agent_executor.ts:beforeToolCall` (lines 290-293): pass the runtime parameter straight to `injectToolRuntimeContext(args, params.runtime ?? {sessionCwd, workspaceRoots})`
- [ ] **Step 4:** Change `agent_channel.ts:handleInboundMessage` to inject before the agent_executor.run call:
```ts
const replyAgent = await this.agentService.info(effectiveAgentId);
const bizContext = { channelId: channel.id, roomName: (group as any).roomName ?? scope.chatId };
// modelChannelId below is a placeholder that always evaluates to null; see the agent.ts entity fields
const currentAgent = { id: replyAgent.id, name: replyAgent.name, modelChannelId: replyAgent.modelConfig?.modelId ? null : null, toolsets: replyAgent.toolsets ?? [] };
// pass to agent_executor.run({..., runtime: { bizContext, currentAgent }})
```
(Note: in NetaClawAgentEntity the modelChannelId value is actually derived from modelConfig / auxiliaryModelChannelId and similar fields; during implementation, resolve the real value with agent_executor's existing logic.)
- [ ] **Step 5:** Change `subagent.ts:runPreparedExecution` to inherit the parent runtime passed in via ctx, **but replace currentAgent with the subagent entity's own data**:
```ts
const subagentRuntime = {
  ...ctx.parentRuntime, // inherit the parent's bizContext / sessionCwd / workspaceRoots as is
  currentAgent: { id: subagentEntity.id, name: subagentEntity.name, modelChannelId: ..., toolsets: subagentEntity.toolsets ?? [] }, // replaced
};
// in_process mode: agentRunner({..., runtime: subagentRuntime})
// subprocess mode: envelope.runtime = subagentRuntime, serialized over IPC
```
`NetaClawSubagentRunSingleContext` gains an optional `parentRuntime?: NetaToolRuntimeContext` field; in the mode='session-subagent' branch, `delegate_task.ts` passes the current parent runtime when it calls `ctx.runSingle` (this requires extending `SessionDelegateToolContext` or the `runSingle` signature)
- [ ] **Step 6:** Change `subagent/process_runner.ts`: the `SubagentRunRequest` envelope gains a `runtime?: NetaToolRuntimeContext` field; validate with `JSON.stringify(runtime)` before sending (throw 'biz-context-not-serializable' on failure); the worker attaches the received runtime to its internal agent_executor
- [ ] **Step 7:** Write tests:
  - inject/read round trip (including bizContext and currentAgent)
  - JSON-unsafe bizContext (nested function / circular reference) → throw 'biz-context-not-serializable'
  - subagent in_process mode inherits the parent's bizContext and replaces currentAgent
  - subagent subprocess mode: the IPC envelope contains runtime
- [ ] **Step 8:** Commit
```
git commit -m "feat(netaclaw): NetaToolRuntimeContext 加 bizContext + currentAgent,reply agent → subagent → tool 透传"
```
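The JSON-safety check deserves an explicit sketch, because `JSON.stringify` on its own does not reject functions: it throws on circular references but silently drops function-valued properties. The version below walks the value instead. `assertJsonSafe` and `deriveSubagentRuntime` are hypothetical helper names, the context types are simplified, and attaching the runtime under `_netaRuntime` follows the signature shown in Step 1.

```typescript
interface CurrentAgent { id: number; name: string; modelChannelId: number | null; toolsets: string[] }
interface RuntimeCtx {
  sessionCwd?: string | null;
  workspaceRoots?: string[];
  bizContext?: Record<string, unknown>;
  currentAgent?: CurrentAgent;
}

/** Throws unless value consists only of primitives, arrays and plain objects, with no cycles. */
function assertJsonSafe(value: unknown, ancestors: Set<object> = new Set()): void {
  const bad = () => new Error('biz-context-not-serializable');
  if (value === null || value === undefined) return;
  const t = typeof value;
  if (t === 'string' || t === 'boolean') return;
  if (t === 'number') { if (!Number.isFinite(value as number)) throw bad(); return; }
  if (t !== 'object') throw bad(); // function, symbol, bigint
  const obj = value as object;
  if (ancestors.has(obj)) throw bad(); // circular reference
  const proto = Object.getPrototypeOf(obj);
  if (!Array.isArray(obj) && proto !== Object.prototype && proto !== null) throw bad(); // Date, Map, class instance
  ancestors.add(obj);
  for (const child of Object.values(obj)) assertJsonSafe(child, ancestors);
  ancestors.delete(obj); // shared (non-cyclic) references stay legal
}

function injectToolRuntimeContext<T extends object>(
  args: T, runtime: RuntimeCtx | undefined,
): T & { _netaRuntime?: RuntimeCtx } {
  if (runtime === undefined) return args as T & { _netaRuntime?: RuntimeCtx };
  assertJsonSafe(runtime);
  return { ...args, _netaRuntime: runtime } as T & { _netaRuntime?: RuntimeCtx };
}

function readToolRuntimeContext(params: unknown): RuntimeCtx | undefined {
  if (params === null || typeof params !== 'object') return undefined;
  return (params as { _netaRuntime?: RuntimeCtx })._netaRuntime;
}

/** Subagent inherits everything from the parent except currentAgent. */
function deriveSubagentRuntime(parent: RuntimeCtx | undefined, self: CurrentAgent): RuntimeCtx {
  return { ...parent, currentAgent: self };
}
```

Running the same check in both `injectToolRuntimeContext` and the subprocess envelope keeps in_process and subprocess modes consistent: a context that would be mangled by IPC serialization is rejected in both.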
---
### Task 9: WeixinAdapter (the only MVP implementation, TDD)
**Files:**
- Create: `modules/desktop_op/runtime/adapters/weixin_adapter.ts`
- Create: tests (mock screenshot / VLM)
- [ ] **Step 1:** Write tests:
  - findWindow: mock windowLocator.findByAppName('Weixin',{largest:true}) to return a handle → the adapter passes it through unchanged
  - preFlightCheck happy path: the VLM answers "当前对话是 文件传输助手" (the current conversation is File Transfer Assistant) → pass
  - preFlightCheck failure: the VLM answers "当前对话是 公众号" (the current conversation is an Official Account) → throw `precondition-failed`
  - buildSteps('send-text', { text: '你好' }): returns the 5 steps `[clipboard-write, hotkey:ctrl+v, wait, hotkey:enter, wait]`
  - verifyResult happy path: the VLM sees that the latest message contains text → true
  - queueKey({conversation:'文件传输助手'}): returns `'文件传输助手'`
- [ ] **Step 2:** Implement (spec §3.3):
```ts
@Provide() @Scope(ScopeEnum.Singleton)
class WeixinAdapter implements AppAdapter {
appId = 'weixin';
supportedActions = ['send-text'];
@Inject() windowLocator: WindowLocator;
// findWindow / preFlightCheck / buildSteps / verifyResult / queueKey
}
```
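preFlightCheck calls ctx.screenshot.captureWindowByAppName and then ctx.vlm.verifyState with the question "当前微信打开的聊天顶部标题是不是 '${task.target.conversation}'?" (is the title at the top of the currently open WeChat chat '${task.target.conversation}'?)
verifyResult calls ctx.vlm.verifyState with the question "右下角最新一条己方消息是否包含 '${text.slice(0,50)}'?" (does the latest own message at the bottom right contain '${text.slice(0,50)}'?)
- [ ] **Step 3:** In configuration.ts onReady, call `adapterRegistry.register(new WeixinAdapter(...))` (or use Midway @AutoLoad)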
- [ ] **Step 4:** Commit
```
git commit -m "feat(desktop-op): WeixinAdapter(MVP 唯一实现,send-text)"
```
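The five-step sequence expected by the `buildSteps` test can be sketched as below. The two wait durations and the `action-not-supported` error text are assumptions, since the plan does not fix them. The sketch is synchronous for brevity, whereas the `AppAdapter` interface returns a Promise.

```typescript
type ActionStep =
  | { type: 'hotkey'; key: string }
  | { type: 'clipboard-write'; text: string }
  | { type: 'wait'; ms: number };

interface SendTextTask {
  actionType: string;
  target: { conversation: string };
  params: { text: string };
}

const PASTE_SETTLE_MS = 300; // assumed delay for the paste to land in the input box
const SEND_SETTLE_MS = 800;  // assumed delay for the message bubble to render before verification

function buildSteps(task: SendTextTask): ActionStep[] {
  if (task.actionType !== 'send-text') {
    throw new Error(`action-not-supported: ${task.actionType}`); // hypothetical error code
  }
  return [
    { type: 'clipboard-write', text: task.params.text }, // via clip.exe, handles Chinese text
    { type: 'hotkey', key: 'ctrl+v' },
    { type: 'wait', ms: PASTE_SETTLE_MS },
    { type: 'hotkey', key: 'enter' },
    { type: 'wait', ms: SEND_SETTLE_MS },
  ];
}

/** Tasks for the same conversation share one queue. */
function queueKey(target: { conversation: string }): string {
  return target.conversation;
}
```

Because the steps are fixed rather than chosen by the model, the VLM is needed only for the precondition check and the final verification in this MVP path.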
---
## Phase C · VLM client + model_channel integration
### Task 10: vlm_client.ts (TDD, credentials come from model_channel)
**Files:**
- Create: `modules/desktop_op/runtime/vlm_client.ts`
- Create: tests (mock fetch + mock modelChannelService)
- [ ] **Step 1:** Write tests:
  - mock modelChannelService.findOne(modelChannelId) to return { baseUrl, apiKey, modelName, providerType }
  - mock fetch to return a fixture response (recorded during the Phase 0 PoC)
  - assert that the request body contains:
    - messages: includes a base64 image_url (data:image/png;base64,...)
    - model: equals modelChannel.modelName
    - sensible max_tokens / temperature
    - history carries only the "previous + latest" screenshots (★ A4 revision)
- [ ] **Step 2:** Implement:
```ts
@Provide()
@Scope(ScopeEnum.Singleton)
export class VlmClient {
@Inject() modelChannelService: NetaClawModelChannelService;
async nextAction(ctx: TaskContext, parser: Parser, screenshot: Buffer, history: HistoryEntry[]): Promise<{ action: ActionStep; tokensUsed: number }>;
async verifyState(ctx: TaskContext, parser: Parser, screenshot: Buffer, question: string): Promise<{ result: boolean; tokensUsed: number }>;
}
```
Internally it uses the `openai` npm package: baseURL comes from modelChannel.baseUrl, the model field from modelChannel.modelName, and the system prompt from parser.buildSystemPrompt.
- [ ] **Step 3:** Run the tests
- [ ] **Step 4:** Commit
```
git commit -m "feat(visual-agent): VlmClient(走 model_channel + history 截断)"
```
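The "previous + latest screenshot only" rule can be sketched as a pure message builder. The `HistoryEntry` shape, the function name and the placeholder wording are hypothetical; the `image_url` content part with a `data:image/png;base64,...` URL follows the OpenAI-compatible chat format that the tests above assert.

```typescript
// Hypothetical history record: the screenshot shown to the model and the action it returned.
interface HistoryEntry { screenshotPngBase64: string; actionJson: string }

type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } };
interface ChatMessage { role: 'system' | 'user' | 'assistant'; content: string | ContentPart[] }

const asImage = (pngBase64: string): ContentPart => ({
  type: 'image_url',
  image_url: { url: `data:image/png;base64,${pngBase64}` },
});

/** Keeps every past action as text, but attaches images only for the previous and the latest step. */
function buildMessages(systemPrompt: string, history: HistoryEntry[], latestPngBase64: string): ChatMessage[] {
  const messages: ChatMessage[] = [{ role: 'system', content: systemPrompt }];
  history.forEach((entry, i) => {
    const isPrevious = i === history.length - 1;
    const parts: ContentPart[] = isPrevious
      ? [{ type: 'text', text: `Screenshot before step ${i + 1}` }, asImage(entry.screenshotPngBase64)]
      : [{ type: 'text', text: `(screenshot before step ${i + 1} omitted)` }];
    messages.push({ role: 'user', content: parts });
    messages.push({ role: 'assistant', content: entry.actionJson });
  });
  messages.push({
    role: 'user',
    content: [{ type: 'text', text: 'Current screenshot. Reply with the next action as JSON.' }, asImage(latestPngBase64)],
  });
  return messages;
}
```

Older turns keep their action text so the model still sees what it has already tried, while the number of images per request stays at two regardless of how many steps have run.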
---
## Phase D · Runtime + Service + persistence (★ renamed in v3)
### Task 11: DesktopOpRuntime.runTask (TDD, adapter-driven)
**Files:**
- Create: `modules/desktop_op/runtime/runtime.ts`
- Create: tests
- [ ] **Step 1:** Write tests covering (spec §3.3):
  - happy path: safetyGuard passes → adapter.findWindow → activate → preFlightCheck → buildSteps returns 5 steps → actionExecutor.execute for each step → verifyResult=true → returns ok
  - safetyGuard.validateAppId throws 'app-not-allowed' → runTask propagates it
  - safetyGuard.validateAction throws 'dangerous-key-blocked' → execution stops and the error propagates
  - adapter.findWindow returns null → throw 'window-not-found'
  - adapter.preFlightCheck throws → propagated
  - AbortSignal fires during step 2 → throw AbortError immediately; no further steps run
  - adapter.verifyResult returns false → throw 'verify-failed'
- [ ] **Step 2:** Implement (skeleton from spec §3.3):
```ts
@Provide() @Scope(ScopeEnum.Singleton)
export class DesktopOpRuntime {
@Inject() safetyGuard: SafetyGuard;
@Inject() adapterRegistry: AdapterRegistry;
@Inject() windowLocator: WindowLocator;
@Inject() screenshot: NodeScreenshooter;
@Inject() input: InputController;
@Inject() vlm: VlmClient;
@Inject() parserRegistry: ParserRegistry;
@Inject() actionExecutor: ActionExecutor;
async runTask(task: DesktopTask, abort: AbortSignal): Promise<TaskResult>;
}
```
- [ ] **Step 3:** Run the tests
- [ ] **Step 4:** Commit
```
git commit -m "feat(desktop-op): DesktopOpRuntime(adapter 主导,safety + abort)"
```
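The abort test case above ("AbortSignal fires during step 2, no further steps run") comes down to checking the signal at each step boundary. A minimal sketch under stated assumptions: `Step` and `runSteps` are hypothetical names, and the real `runTask` also performs the safety, window and verification phases that are omitted here.

```ts
// Hypothetical minimal step type; the real ActionStep is defined in spec §5.2.
type Step = { name: string };

/**
 * Runs steps in order and checks the signal before each one, so an abort
 * raised while step N executes prevents step N+1 from starting.
 * Returns the number of steps that completed.
 */
async function runSteps(
  steps: Step[],
  execute: (step: Step) => Promise<void>,
  signal: AbortSignal,
): Promise<number> {
  let done = 0;
  for (const step of steps) {
    if (signal.aborted) {
      const err = new Error('aborted');
      err.name = 'AbortError';
      throw err;
    }
    await execute(step);
    done++;
  }
  return done;
}
```

This only stops between steps. Interrupting a step that is already in flight would additionally require the executor itself to observe the signal.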
---
### Task 12: DesktopOpService background worker + runAndWait + abortByFilter (★ runAndWait added in v4, TDD)
**Files:**
- Create: `modules/desktop_op/service/desktop_op.ts`
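- Create: tests
- [ ] **Step 1:** Write tests (spec §3.7 + §3.8):
- **runAndWait happy path:** call runAndWait → it enqueues internally → the worker finishes → resolves `{ok:true, taskId, modelCalls, steps, durationMs}`
- **runAndWait failure:** runtime throws verify-failed → rejects with Error('verify-failed')
- **runAndWait timeout:** runtime never returns (simulated hang) → after 60s rejects with Error('task-timeout') and cancels the task via abortByFilter
- **runAndWait with concurrent calls on the same target:** strictly serial (a later call starts only after the earlier one finishes)
- enqueue (fire-and-forget) stays compatible with the old interface
- enqueue of several tasks for the same (app, target) → the worker runs them serially
- enqueue across different (app, target) keys → per-key workers proceed in parallel at the queue level, but execution is physically serialized by DesktopMutex (verified with a mock)
- queue > 20 → drop the oldest task + log queue-overflow + if the dropped task came from runAndWait, reject with Error('queue-overflow')
- abortByFilter(t => t.appId==='weixin' && t.target.channelId === 5, 'channel-deleted')
→ clears that channel's pending tasks, interrupts the running task, and rejects the matching runAndWait Promise (reason='channel-deleted')
- DesktopMutex.acquire is called, and the next task starts only after release
- [ ] **Step 2:** Implement (skeleton from spec §3.7 + §3.8):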
```ts
@Provide() @Scope(ScopeEnum.Singleton)
export class DesktopOpService {
@Inject() runtime: DesktopOpRuntime;
@Inject() desktopMutex: DesktopMutex;
@Inject() adapterRegistry: AdapterRegistry;
@InjectEntityModel(DesktopOpActionLogEntity) logRepo: ...;
private queues = new Map<string, DesktopTask[]>();
private workers = new Map<string, Promise<void>>();
private aborters = new Map<string, AbortController>();
/** ★ New in v4: promise resolvers for runAndWait, keyed by task id */
private waiters = new Map<string, { resolve: (r: any) => void; reject: (e: Error) => void; timer: NodeJS.Timeout }>();
enqueue(task: DesktopTask): { taskId: string; queuePosition: number };
/** ★ New in v4: awaitable variant, the entry point for a tool's execute */
async runAndWait(task: DesktopTask, timeoutMs = 60000): Promise<{
ok: true; taskId: string; modelCalls: number; steps: number; durationMs: number;
}> {
return new Promise((resolve, reject) => {
const timer = setTimeout(() => {
this.waiters.delete(task.id);
this.abortByFilter(t => t.id === task.id, 'task-timeout');
reject(new Error('task-timeout'));
}, timeoutMs);
this.waiters.set(task.id, { resolve, reject, timer });
this.enqueue(task);
});
}
abortByFilter(filter: (t: DesktopTask) => boolean, reason: string): void {
// existing logic + clear the matching waiters and reject them with `reason`
}
private async workerLoop(key: string): Promise<void> {
// existing logic + on a terminal state: clearTimeout, delete the waiter, then resolve / reject it
}
private queueKey(task: DesktopTask): string;
}
```
- [ ] **Step 3:** Commit
```
git commit -m "feat(desktop-op): DesktopOpService (per-key worker + DesktopMutex + runAndWait + abortByFilter)"
```
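The "queue > 20, drop the oldest" rule in the tests above can be isolated into a small pure helper. A minimal sketch, assuming a hypothetical `QueuedTask` shape and a hypothetical key of `appId` plus `target.conversation`; the plan only says the key is derived from (app, target), so the exact composition is an assumption.

```ts
// Hypothetical subset of DesktopTask, just enough to derive a queue key.
interface QueuedTask {
  id: string;
  appId: string;
  target: { conversation: string };
}

const MAX_QUEUE = 20; // limit named in the Task 12 test list

/** One queue (and one worker) per (app, target) pair. Key format is an assumption. */
function queueKey(task: QueuedTask): string {
  return `${task.appId}:${task.target.conversation}`;
}

/**
 * Appends the task to its queue. If the queue then exceeds `max`, the oldest
 * task is evicted and returned so the caller can log 'queue-overflow' and
 * reject that task's runAndWait waiter, if it has one.
 */
function pushBounded(
  queues: Map<string, QueuedTask[]>,
  task: QueuedTask,
  max = MAX_QUEUE,
): QueuedTask | undefined {
  const key = queueKey(task);
  const queue = queues.get(key) ?? [];
  queues.set(key, queue);
  queue.push(task);
  return queue.length > max ? queue.shift() : undefined;
}
```

Returning the evicted task keeps the helper free of side effects, so the overflow path can be unit-tested without a running worker.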
---
### Task 13: desktop_op_action_log entity + persistence
**Files:**
- Create: `modules/desktop_op/entity/desktop_op_action_log.ts`
- Modify: `packages/backend/src/entities.ts`
- Modify: `service/desktop_op.ts` to write a log row at every task's terminal state
- [ ] **Step 1:** Write the entity (schema in spec §4.3): task_id / app_id / target_json / action_type / final_text / channel_id (WeChat scenario) / model_calls / status / aborted_reason / etc.
- [ ] **Step 2:** Inject logRepo into DesktopOpService; workerLoop writes a log row whenever a task ends (success or any error token)
- [ ] **Step 3:** Add tests asserting that failure paths also write a log row
- [ ] **Step 4:** Commit
```
git commit -m "feat(desktop-op): DesktopOpActionLog entity + service 落库"
```
---
### Task 13.5: desktop_op_config entity + default row loading (★ new in v3)
**Files:**
- Create: `modules/desktop_op/entity/desktop_op_config.ts`
- Modify: `packages/backend/src/entities.ts`
- Modify: `packages/backend/src/configuration.ts` (in onReady, insert the default row if the table is empty, then call safetyGuard.loadConfig)
- [ ] **Step 1:** Write the entity (spec §4.1): allowed_apps (JSON) / extra_dangerous_keys / global_per_min / global_per_day / default_watermark. v4 removes default_model_channel_id, because the model now comes from the desktop agent's own modelChannelId
- [ ] **Step 2:** In configuration.ts onReady:
```ts
const repo = dataSource.getRepository(DesktopOpConfigEntity);
let cfg = await repo.findOne({ where: { id: 1 } });
if (!cfg) {
// First boot: seed the single config row so SafetyGuard always loads a persisted config.
cfg = await repo.save({ id: 1, allowedApps: ['weixin'], defaultWatermark: 'suffix', globalPerMin: 30, globalPerDay: 1000 });
}
await safetyGuard.loadConfig(cfg);
```
- [ ] **Step 3:** Commit
```
git commit -m "feat(desktop-op): DesktopOpConfig entity + 默认行 + safetyGuard 加载"
```
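The `globalPerMin` and `globalPerDay` fields seeded above imply a rate check somewhere on the task path. The plan does not say how that check is implemented, so the following is only a sketch of one common approach, a sliding-window counter. The class name and the injected `now` parameter are hypothetical.

```ts
/**
 * Sliding-window limiter: allows at most `limit` calls within any
 * `windowMs` span. `now` is passed in so tests do not depend on the clock.
 */
class SlidingWindowLimiter {
  private readonly stamps: number[] = [];
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  /** Records the call and returns true if under the limit; returns false otherwise. */
  tryAcquire(now: number): boolean {
    const cutoff = now - this.windowMs;
    // Drop timestamps that have fallen out of the window.
    while (this.stamps.length > 0 && this.stamps[0] <= cutoff) this.stamps.shift();
    if (this.stamps.length >= this.limit) return false;
    this.stamps.push(now);
    return true;
  }
}
```

With the default row, `new SlidingWindowLimiter(30, 60_000)` would model `globalPerMin` and `new SlidingWindowLimiter(1000, 86_400_000)` would model `globalPerDay`. State kept in memory like this resets on restart, which may or may not be acceptable for the daily limit.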
---
## Phase E · Business integration (★ rewritten in v4: tool-based, with the helper / replyToGroup / auto-send block removed)
### Task 14: Implement the weixin_send_text tool (★ v4 replacement for the v3 helper, TDD)
**Files:**
- Create: `packages/backend/src/modules/netaclaw/tools/builtin/weixin_send_text.ts`
- Modify: `packages/backend/src/modules/netaclaw/tools/catalog.ts`(注册 + TOOLSET 常量)
- Create: tests
- [ ] **Step 1:** Write tests (mock DesktopOpService + channelRepo):
- Missing input: empty text → throws 'text-empty'
- text.length > 2000 → throws 'text-too-long'
- channelId neither passed explicitly nor present in bizContext → throws 'invalid-params: channelId 缺失'
- currentAgent.modelChannelId not injected → throws 'current-agent-model-channel-missing'
- channelId is read from `_netaRuntime.bizContext`, which takes priority over explicit params (test both paths)
- modelChannelId is read from `_netaRuntime.currentAgent.modelChannelId`
- watermark='suffix' → finalText gets the suffix ' —AI'
- watermark='zero-width' → finalText is prefixed with U+200B
- channel.config.weixinReply.enabled=false → throws 'weixin-reply-not-enabled'
- `desktopOpService.runAndWait` is called with the correct arguments (appId='weixin' / target.channelId/roomName / actionType='send-text' / modelChannelId taken from the desktop agent's currentAgent.modelChannelId)
- runAndWait succeeds → the tool returns a textResult containing "已在群" and the taskId
- runAndWait fails (verify-failed / window-not-found / precondition-failed / task-timeout) → the tool throws so the agent sees the failure
- [ ] **Step 2:** Implement (★ v4 change: modelChannelId is read from the runtime context's currentAgent, no longer captured by a factory-function closure):
```ts
import { Type } from '@sinclair/typebox';
import { AgentToolWithMeta, textResult } from '../common.js';
import { registerSchema, TOOLSET_WEIXIN_DESKTOP } from '../catalog.js';
import { readToolRuntimeContext } from '../runtime_context.js';
import { randomUUID } from 'node:crypto';
const Params = Type.Object({
roomName: Type.String({ description: '目标群名(必填)' }),
text: Type.String({ description: '要发送的文本(必填,长度 1-2000)' }),
channelId: Type.Optional(Type.Number({ description: 'NetaClaw channel id,优先级低于 runtime bizContext' })),
_netaRuntime: Type.Optional(Type.Any()), // 内部字段,beforeToolCall 注入
});
/**
* 微信发文字 tool。
* 通过 NetaToolRuntimeContext 自动拿 channelId/roomName(bizContext)和当前 agent 的 modelChannelId(currentAgent)
*/
export function createWeixinSendTextTool(deps: {
desktopOpService: any; // DesktopOpService
channelRepo: any;
}): AgentToolWithMeta<typeof Params, unknown> {
return {
name: 'weixin_send_text',
label: '微信发送文字',
description: '在指定微信群里发送一段文字',
parameters: Params,
async execute(_id, params) {
const runtime = readToolRuntimeContext(params as any);
const channelId = runtime?.bizContext?.channelId ?? params.channelId;
const modelChannelId = runtime?.currentAgent?.modelChannelId;
if (!channelId) throw new Error('invalid-params: channelId 缺失(既无 bizContext 也无显式参数)');
if (!modelChannelId) throw new Error('current-agent-model-channel-missing: desktop agent 未配置 modelChannel');
if (!params.text || params.text.length === 0) throw new Error('text-empty');
if (params.text.length > 2000) throw new Error('text-too-long');
const channel = await deps.channelRepo.findOne({ where: { id: channelId } });
const cfg = (channel?.config as any)?.weixinReply;
if (!cfg?.enabled) throw new Error('weixin-reply-not-enabled');
const watermark = cfg.watermark ?? 'suffix';
let finalText = params.text;
if (watermark === 'suffix') finalText = params.text + ' —AI';
else if (watermark === 'zero-width') finalText = '\u200B' + params.text; // explicit escape: an invisible literal is easy to lose when editing
const result = await deps.desktopOpService.runAndWait({
id: `cid-${channelId}-${randomUUID()}`,
appId: 'weixin',
target: { conversation: params.roomName, channelId, roomName: params.roomName },
actionType: 'send-text',
params: { text: finalText, originalText: params.text },
modelChannelId,
maxSteps: 8,
enqueuedAt: Date.now(),
}, 60000);
return textResult(`已在群 "${params.roomName}" 发送: ${params.text.slice(0, 60)}${params.text.length > 60 ? '...' : ''} (taskId=${result.taskId}, ${result.durationMs}ms)`);
},
};
}
registerSchema({
name: 'weixin_send_text',
toolset: TOOLSET_WEIXIN_DESKTOP,
description: '在指定微信群里发送一段文字',
visibility: 'tool',
capability: 'text',
isCore: false,
canDisable: true,
});
```
- [ ] **Step 3:** Update `catalog.ts`:
```ts
export const TOOLSET_WEIXIN_DESKTOP = 'weixin_desktop' as const;
// ...
import './builtin/weixin_send_text.js';
```
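- [ ] **Step 4:** Register the tool in tool_resolver (so it can be constructed when the desktop agent calls it, with modelChannelId coming from the current agent). Follow the registration pattern of `clarify.ts` or `delegate_task.ts` and edit `service/tool_resolver.ts`
- [ ] **Step 5:** Run the tests + type check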
- [ ] **Step 6:** Commit
```
git commit -m "feat(netaclaw): weixin_send_text tool(toolset=weixin_desktop,调 DesktopOpService.runAndWait)"
```
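The watermark branch of the tool is easiest to test as a pure function. A minimal sketch, assuming a hypothetical helper name `applyWatermark`; the three modes and the two marker strings are taken from the test list and frontend options in this plan. Both special characters are written as escapes so that neither the U+2014 dash in the suffix nor the zero-width space can be silently altered by an editor.

```ts
type WatermarkMode = 'none' | 'suffix' | 'zero-width';

/**
 * Applies the channel's watermark setting to the outgoing text.
 * 'suffix' appends a visible marker; 'zero-width' prepends an invisible one.
 */
function applyWatermark(text: string, mode: WatermarkMode): string {
  switch (mode) {
    case 'suffix':
      return text + ' \u2014AI';   // space + U+2014 + "AI"
    case 'zero-width':
      return '\u200B' + text;      // U+200B ZERO WIDTH SPACE
    default:
      return text;                 // 'none'
  }
}
```

Extracting this would let the tool's `execute` call `applyWatermark(params.text, cfg.watermark ?? 'suffix')` and keep the watermark tests independent of the DesktopOpService mock.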
---
### Task 15: Remove weixin_db.replyToGroup + the agent_channel auto-send block (★ v4 cleanup)
**Files:**
- Modify: `packages/backend/src/modules/netaclaw/service/weixin_db.ts`
- Modify: `packages/backend/src/modules/netaclaw/service/agent_channel.ts`
- Modify: the corresponding tests (remove the replyToGroup mock, add the new assertions)
- [ ] **Step 1:** `weixin_db.ts`:
- **Delete** the `replyToGroup` method (lines 166-171)
- Update the class comment: delete the entry at line 32, ` - replyToGroup: 占位 throw NotImplementedError(等待 spec 5.7 实施)`
- Keep every read-path method (bindChannel / unbindChannel / getRuntime / healthCheck / probeAlive / refreshWhitelist / currentWhitelistSync / ensureWhitelistLoaded)
- [ ] **Step 2:** `agent_channel.ts`:
- **Delete** the entire weixin-db auto-send branch (lines 584-608 in the current code: `if (channel.type === 'weixin-db') { ... return; }`)
- While deleting, **do not touch** the `iLink (weixin ClawBot)` branch (from line 610 on; it is a separate channel.type path)
- Add the comment `// v4: weixin-db no longer auto-sends; the reply agent must call delegate_task to hand off to the desktop agent`
- [ ] **Step 3:** In `agent_channel.ts:handleInboundMessage`, inject bizContext + currentAgent before the reply agent runs (ties in with Task 8.5):
```ts
const replyAgent = await this.agentService.info(effectiveAgentId);
const runtimeBizContext = {
channelId: channel.id,
roomName: (group as any).roomName || this.extractGroupName(rawMessage) || scope.chatId,
};
const runtimeCurrentAgent = {
id: replyAgent.id,
name: replyAgent.name,
modelChannelId: null, // placeholder: fill in using agent_executor's existing model-channel resolution
toolsets: replyAgent.toolsets ?? [],
};
// Pass to agent_executor.run({..., runtime: { bizContext: runtimeBizContext, currentAgent: runtimeCurrentAgent }})
```
- [ ] **Step 4:** ★ R3 safety net: **after the reply agent run completes, check whether it called delegate_task** (guards against a misconfigured admin prompt silently swallowing messages):
```ts
const runResult = await this.agentExecutorService.run({..., runtime});
// For weixin-db with enabled=true, log a warning if 'delegate_task' never appears in toolExecutions
const isWeixinReply = channel.type === 'weixin-db' && channel.config?.weixinReply?.enabled === true;
if (isWeixinReply) {
const calledDelegate = runResult.toolExecutions?.some(t => t.name === 'delegate_task');
if (!calledDelegate) {
this.logger.warn(
'[AgentChannel] WARN reply agent did not call delegate_task (message dropped silently). channelId=%s roomName=%s finalContent="%s..."',
channel.id, runtimeBizContext.roomName, String(runResult.finalContent || '').slice(0, 80),
);
}
}
```
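This does not block the flow (the reply agent deciding not to reply is legitimate); it only logs a hint for the administrator.
- [ ] **Step 5:** Fix tests:
- Delete the tests built around the old `weixinDbService.replyToGroup` mock
- Add a test covering "a weixin-db channel receives a group message and replyToGroup is no longer called automatically after the reply agent runs"
- Add tests for bizContext / currentAgent injection
- Add a test for "reply agent did not call delegate_task → log warning" (mock the logger)
- [ ] **Step 6:** Run the full suite with `pnpm --filter @neta/backend test` and confirm nothing regresses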
- [ ] **Step 7:** Commit
```
git commit -m "refactor(netaclaw): 删除 weixin_db.replyToGroup 占位 + agent_channel 自动发送块 + reply agent 漏 delegate 检测(v4 双 agent)"
```
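The bizContext injected in Step 3 has to survive being handed to a subagent, and the E2E checklist expects a 'biz-context-not-serializable' error for a context that contains a function. A minimal sketch of such a guard, assuming a hypothetical helper name `assertJsonSafe`; where the real check lives (Task 8.5's runtime_context, presumably) is not stated in this section.

```ts
/**
 * Throws 'biz-context-not-serializable: <path>' unless `value` is built only
 * from null, strings, booleans, finite numbers, arrays and plain objects.
 * Deliberately conservative: undefined, functions, Dates, class instances
 * and repeated or cyclic object references are all rejected.
 */
function assertJsonSafe(
  value: unknown,
  path = 'bizContext',
  seen: Set<object> = new Set<object>(),
): void {
  const fail = (): never => {
    throw new Error(`biz-context-not-serializable: ${path}`);
  };
  if (value === null || typeof value === 'string' || typeof value === 'boolean') return;
  if (typeof value === 'number') {
    if (!Number.isFinite(value)) fail(); // NaN / Infinity become null in JSON
    return;
  }
  if (typeof value !== 'object') fail(); // undefined, function, symbol, bigint
  const obj = value as object;
  if (seen.has(obj)) fail();             // cycle or shared reference
  seen.add(obj);
  if (Array.isArray(obj)) {
    obj.forEach((v, i) => assertJsonSafe(v, `${path}[${i}]`, seen));
    return;
  }
  if (Object.getPrototypeOf(obj) !== Object.prototype) fail();
  for (const [k, v] of Object.entries(obj)) assertJsonSafe(v, `${path}.${k}`, seen);
}
```

Calling `assertJsonSafe(runtimeBizContext)` before `agentExecutorService.run` would surface a bad context at the injection site rather than deep inside a tool call.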
---
### Task 15.5: agent_channel.update validates the dual-agent toolsets + auto-configures workerRoutingStrategy (★ new in v4)
**Files:**
- Modify: `packages/backend/src/modules/netaclaw/service/agent_channel.ts` (the update method)
- Modify: `packages/backend/src/modules/netaclaw/service/agent.ts` (may need a helper for updating agent.tools.perTool)
- [ ] **Step 1:** In `update(data)`, when `data.type==='weixin-db'` and `data.config?.weixinReply?.enabled===true`, **run all validations in sequence** (any failure throws and blocks the save):
- Load the reply agent (`data.agentId`); its `agent.toolsets: string[]` **must include** `'crew'`, otherwise throw 'reply-agent-missing-crew-toolset'
- `data.config.weixinReply.desktopAgentId` is required and must reference an existing agent (throw 'desktop-agent-not-found')
- The desktop agent's **`agent.toolsets` must include `'weixin_desktop'`** (throw 'desktop-agent-missing-weixin-desktop-toolset')
- The desktop agent's **`agent.toolsets` must not include `'crew'`** (throw 'desktop-agent-must-not-have-crew-toolset')
- reply agent.id !== desktop agent.id (throw 'reply-and-desktop-cannot-be-same')
- [ ] **Step 2:** **After validation passes, auto-patch the desktop agent** (fill in whatever is not yet configured):
```ts
const tools = desktopAgent.tools ?? {};
const perTool = tools.perTool ?? {};
const wxTool = perTool['weixin_send_text'] ?? {};
let dirty = false;
if (wxTool.allowInSubagent !== true) { wxTool.allowInSubagent = true; dirty = true; }
if (wxTool.workerRoutingStrategy !== 'force-main-process-proxy') {
wxTool.workerRoutingStrategy = 'force-main-process-proxy';
dirty = true;
}
if (dirty) {
perTool['weixin_send_text'] = wxTool;
await this.agentService.update({ id: desktopAgent.id, tools: { ...tools, perTool } });
this.logger.info('[agent-channel] auto-patched desktop agent %s tools.perTool.weixin_send_text', desktopAgent.id);
}
```
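This guarantees:
- Even if the administrator forgets to enable `allowInSubagent`, the subagent can still call weixin_send_text
- Even if the subagent later switches to subprocess mode, the tool proxies back to the main process, so the single DesktopMutex instance stays effective
- [ ] **Step 3:** **Recommended configuration (not enforced in the MVP)**: if `replyAgent.subagentConfig.allowedPresetAgentIds` is empty, log an info message suggesting "set allowedPresetAgentIds=[desktopAgentId] on the reply agent to restrict delegate targets". Do not auto-patch, to avoid overriding the user's intent.
- [ ] **Step 4:** Write tests covering every failure path plus the auto-patch path (mock agentService.update and assert its arguments)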
- [ ] **Step 5:** Commit
```
git commit -m "feat(netaclaw): channel.update 校验双 agent toolset + 自动配 weixin_send_text 的 routing 策略"
```
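The Step 1 rules can be expressed as one pure function, which keeps the failure-path tests free of repository mocks. A minimal sketch, assuming a hypothetical `AgentLike` shape and function name. One deliberate deviation from the listed order: if the identity check ran last, it could never fire, because a single agent cannot both include `'crew'` (required of the reply agent) and exclude it (required of the desktop agent). The sketch therefore checks identity right after the existence check.

```ts
// Hypothetical minimal view of an agent; the real entity has more fields.
interface AgentLike {
  id: number;
  toolsets: string[];
}

/** Throws the error token of the first violated rule; returns normally if all pass. */
function validateDualAgents(reply: AgentLike, desktop: AgentLike | undefined): void {
  if (!reply.toolsets.includes('crew')) {
    throw new Error('reply-agent-missing-crew-toolset');
  }
  if (!desktop) {
    throw new Error('desktop-agent-not-found');
  }
  // Checked early so this token is reachable (see lead-in).
  if (reply.id === desktop.id) {
    throw new Error('reply-and-desktop-cannot-be-same');
  }
  if (!desktop.toolsets.includes('weixin_desktop')) {
    throw new Error('desktop-agent-missing-weixin-desktop-toolset');
  }
  if (desktop.toolsets.includes('crew')) {
    throw new Error('desktop-agent-must-not-have-crew-toolset');
  }
}
```

If the spec requires the original order, the same function works with the identity check moved to the end, but the test for 'reply-and-desktop-cannot-be-same' would then need to be dropped or rethought.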
---
### Task 16: agent_channel.delete cascade abortByFilter
**Files:**
- Modify: `packages/backend/src/modules/netaclaw/service/agent_channel.ts`
- [ ] **Step 1:** Add `@Inject() desktopOpService: DesktopOpService;`
- [ ] **Step 2:** In the `delete(ids)` loop, add:
```ts
for (const id of ids) {
this.stopRunner(id);
await this.groupService.cascadeDeleteByChannel(id);
this.weixinDbService.unbindChannel(id);
await this.archiveSyncService.deleteChannelArchive(id);
this.desktopOpService.abortByFilter(
t => t.appId === 'weixin' && t.target?.channelId === id,
'channel-deleted',
);
}
```
- [ ] **Step 3:** Update the corresponding mock tests and add a `desktopOpService` mock
- [ ] **Step 4:** Commit
```
git commit -m "feat(netaclaw): channel.delete cascade abort desktop_op"
```
---
### Task 17: Cascade abort when weixinReply.enabled is turned off
**Files:**
- Modify: the `update` method in `modules/netaclaw/service/agent_channel.ts`
- [ ] **Step 1:** In `update(data)`, detect channel.config.weixinReply.enabled changing **from true to false**, then call:
```ts
this.desktopOpService.abortByFilter(
t => t.appId === 'weixin' && t.target?.channelId === existing.id,
'weixin-reply-disabled',
);
```
- [ ] **Step 2:** Commit
```
git commit -m "feat(netaclaw): channel.config.weixinReply.enabled 变 false 时 cascade abort"
```
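The trigger for this cascade is a state transition, not a state, so `update` needs the previous config as well as the incoming one. A minimal sketch, assuming a hypothetical helper name and assuming that a missing `weixinReply` block on the new config counts as disabled; the plan only names the true-to-false case explicitly.

```ts
// Hypothetical subset of channel.config.weixinReply.
interface WeixinReplyConfig {
  enabled?: boolean;
}

/**
 * True only when auto-reply was on before the update and is not on after it.
 * Treating an absent new config as "disabled" is an assumption.
 */
function replyJustDisabled(
  prev: WeixinReplyConfig | undefined,
  next: WeixinReplyConfig | undefined,
): boolean {
  return prev?.enabled === true && next?.enabled !== true;
}
```

In `update`, this would be evaluated with `existing.config?.weixinReply` and `data.config?.weixinReply` before saving, and `abortByFilter` called only when it returns true, so that repeated saves of an already disabled channel do not trigger redundant aborts.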
---
## Phase F · 审计 + 配置 controller
### Task 18: desktop_op_action_log controller
**Files:**
- Create: `modules/desktop_op/controller/admin/desktop_op_action_log.ts`
- [ ] **Step 1:** `@CoolController` + `POST /list` + `GET /info`,过滤字段: appId / channelId / status / 时间范围
- [ ] **Step 2:** Commit
```
git commit -m "feat(desktop-op): desktop_op_action_log admin API"
```
---
### Task 19: desktop_op_config controller(MVP 仅 get/update)
**Files:**
- Create: `modules/desktop_op/controller/admin/desktop_op_config.ts`
- [ ] **Step 1:** `@Get('/info')` 取 id=1 的单行 + `@Post('/update')` 更新
- [ ] **Step 2:** update 后 reload SafetyGuard config
- [ ] **Step 3:** Commit
```
git commit -m "feat(desktop-op): desktop_op_config admin API + SafetyGuard reload"
```
---
## Phase G · 前端
### Task 20: channel-edit.vue 加微信自动回复区块(★ v4 双 agent 下拉)
**Files:**
- Modify: `packages/frontend/src/modules/agent/views/channel-edit.vue`(或对应 weixin-db 编辑组件)
- [ ] **Step 1:** type=weixin-db 时新增:
- 自动回复:radio (`disabled` / `enabled`),默认 disabled
- **对话 Agent**(★ v4):下拉绑定 `channel.agentId`(沿用现有字段,这是表单顶部的字段,无需新增控件,但要加 hint "必须启用 crew toolset")
- **桌面操作 Agent**(★ v4 新):下拉绑定 `channel.config.weixinReply.desktopAgentId`,数据源 `service.netaclaw.agent.list({})` 前端过滤 toolset 含 `weixin_desktop`
- 校验:enabled=true 时 desktopAgentId 必填 + 不能等于 channel.agentId(前端提示)
- 小号安全模式:开关(默认开)
- 每天上限:数字(默认 100)
- 每群每分钟:数字(默认 3)
- 消息水印:radio (`none` / `suffix` / `zero-width`),默认 suffix
- 风险提示文案 + ★ "桌面操作 Agent 的模型 / prompt / toolset 请在 Agent 管理页配置"
- ❌ **不再有 "使用模型" 下拉**(v4 移除 modelChannelId 字段)
- [ ] **Step 2:** 提交时塞入 `channel.config.weixinReply = { enabled, desktopAgentId, dailyLimit, perGroupPerMinute, safeMode, watermark }`
- [ ] **Step 3:** `pnpm --filter @neta/frontend type-check` 确认我改的文件无新 ts error
- [ ] **Step 4:** Commit
```
git commit -m "feat(agent-fe): channel-edit 加微信自动回复区块(双 agent 下拉,channel.config.weixinReply.desktopAgentId)"
```
---
### Task 20.5: desktop_op_config 设置页(可选,首版用默认值即可)
后续 spec — `Layer 2`。
---
## Phase H · E2E + 老 spec 收尾
### Task 21: E2E checklist + 验证报告
**Files:**
- Create: `docs/superpowers/followups/2026-05-14-desktop-op-e2e.md`
**前置:**
- Windows + 微信 4.x 登录 + 测试群 + 测试小号(已养 ≥ 7 天)
- backend + frontend 启动
- ★ v4 配置:
- 管理后台创建 reply agent A(toolset=`base`+`interaction`+`crew`,modelChannel 选普通 LLM)
- 管理后台创建 desktop agent B(toolset=`weixin_desktop`+`interaction`,modelChannel 选 multimodal 火山 Seed-2.0-pro,prompt 用默认模板)
- 编辑 weixin-db channel:enabled=true,对话 Agent=A,桌面操作 Agent=B,watermark=suffix
- `desktop_op_config` 表已有默认行(`allowed_apps:['weixin']`)
- Phase 0.5 IPC PoC 已完成且结论已应用到 Task 5 实现
**Checklist:**
- [ ] **E2E-1:** Phase 0 PoC ✅(已通过 100% 1 次)
- [ ] **E2E-1.5:** Phase 0.5 Subagent IPC PoC 已完成,结论记录在 followup
- [ ] **E2E-2:** 跑 N=20 Task 0.2 收集 fixtures + 验证 ≥ 80% 成功率(若未做)
- [ ] **E2E-3:** 配置:reply agent A(crew toolset)+ desktop agent B(weixin_desktop toolset,不含 crew)+ channel 绑两个 agent + enabled=true + watermark=suffix
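- [ ] **E2E-3.5:** ★ Verify the backend channel.update validation:
- Remove the crew toolset from the reply agent and save → error "reply-agent-missing-crew-toolset"
- Add the crew toolset to the desktop agent and save → error "desktop-agent-must-not-have-crew-toolset" (the token thrown by Task 15.5)
- Select the same agent as both reply agent and desktop agent → the frontend reports an error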
- [ ] **E2E-4:** ★ **核心双 agent 链路**:在测试群发问题(如"今天天气如何")→
- reply agent 收到 db 触发的 onInbound 后 ReAct
- reply agent 调 `delegate_task({mode:'preset', agentId:B.id, goal:'在群 X 发送: 阴 12-18 度'})`
- desktop agent 启 subagent process,ReAct 后调 `weixin_send_text({roomName:'X', text:'阴 12-18 度'})`
- tool 调用 DesktopOpService.runAndWait,desktop op 完成桌面键鼠 + VLM 验证
- 5-40s 内群里收到 "阴 12-18 度 —AI"
- subagent_session 表新增一条 desktop agent 会话 + desktop_op_action_log 新增一条
- [ ] **E2E-4.1:** ★ reply agent 决定不回复:发"[请忽略]xxx",reply agent prompt 教它跳过 → 群里**无任何回复**,desktop_op_action_log **无新增**(因为 tool 没被调)
- [ ] **E2E-4.2:** ★ bizContext 透传验证:`weixin_send_text` tool 内部 log 出 `channelId` 来源是 `bizContext`(而非 LLM 在 params 显式传)
- [ ] **E2E-5:** 重复 5 次幂等(都成功)
- [ ] **E2E-6:** 故意把微信最小化 → desktop_op activate 自动恢复并发送
- [ ] **E2E-7:** 每分钟连发 5 次 → 第 4/5 次 rate-limited,desktop_op_action_log 显示 status=rate-limited
- [ ] **E2E-8:** 在管理后台把 channel.config.weixinReply.enabled 切 false → 队列中的 pending task 显示 aborted_reason=weixin-reply-disabled(via Task 17);新消息进来,reply agent 仍能 run 但 `weixin_send_text` tool 抛 'weixin-reply-not-enabled'
- [ ] **E2E-9:** 删除 channel → 该 channel 的 pending task 全 aborted_reason=channel-deleted
- [ ] **E2E-10:** 关闭微信进程 → desktop agent 收到 tool 抛 'window-not-found',按 prompt 决定不重试,reply agent 拿到失败结果
- [ ] **E2E-11:** 查 desktop_op_action_log 表:每条调用 1 row,final_text 全文落库,target_json 含 channelId/roomName
- [ ] **E2E-12:** 模拟模型异常(临时改 desktop agent 的 modelChannel.baseUrl 错误 URL)→ tool 抛 model-failed,desktop agent 重试 1 次,reply agent 收到失败
- [ ] **E2E-13:** archive sync 与 desktop_op 同时触发 → archive sync 走自己 channelLocks(不阻),desktop_op 走 DesktopMutex 串行 — 两者互不抢前台
- [ ] **E2E-14:** SafetyGuard:reply agent prompt 故意诱导发"删除文件"操作,desktop agent 调 weixin_send_text 后,safety guard 拦截 hotkey 'delete' → status=dangerous-action-blocked
- [ ] **E2E-15:** SafetyGuard:故意写一个 fake tool 发 task appId='excel' → status=app-not-allowed
- [ ] **E2E-16:** Loop 防护:**给 desktop agent 错误地配 `crew` toolset 试图保存** → 后端校验拦下;如果绕过校验直接改 DB,desktop agent 调 delegate_task → 应该被 tool 层面或 subagent 层面拦截(深度限制)
- [ ] **E2E-17:** 用户在用电脑(鼠标移动到非微信窗口)→ 让位机制生效(后续可加),MVP 至少不 crash
- [ ] **E2E-18:** ★ **MVP 单对话假设验证**(R1 风险):
- 配置 channel 监听 2 个群 A 和 B
- 让微信停留在群 A 对话页面
- 同时往 A 和 B 发消息
- 期望:A 收到回复 + B 收到 'precondition-failed' 错误,desktop_op_action_log 显示 B 任务 status=precondition-failed
- 若 B 错误地收到了回复(说明跑去群 B 发了) → 是个 bug,需要 fix
- [ ] **E2E-19:** ★ **reply agent 漏 delegate_task 检测**(R3 风险):
- 配置 reply agent 的 prompt **故意不教它调 delegate_task**
- 在群里发问题
- 期望:群里**无回复**,但 backend log 出现 `WARN reply agent did not call delegate_task (message dropped silently)` 行
- [ ] **E2E-20:** ★ **bizContext JSON-safe 校验**:
- 写一个临时测试 inject 一个含 `function` 字段的 bizContext
- 期望:抛 'biz-context-not-serializable',不影响正常流程
- [ ] **E2E-21:** ★ **desktop agent 自动 patch workerRoutingStrategy**:
- 创建 desktop agent **不配置** `tools.perTool['weixin_send_text']`
- 在 channel 编辑页保存(enabled=true)
- 重新 list desktop agent,断言 `tools.perTool['weixin_send_text'].allowInSubagent === true` 且 `workerRoutingStrategy === 'force-main-process-proxy'`
- [ ] **Step 1:** 逐条手工跑,关键场景留 screenshot / log
- [ ] **Step 2:** 写报告,内容:
- 环境(微信版本、模型版本、Node 版本、backend 版本)
- Checklist 结果
- 已知问题 / followup
- 平均单条耗时 + token / 成本(校验 spec §7.3 估算)
- 成本对照:Seed 2.0 Pro 实测 / 估算
- [ ] **Step 3:** Commit
```
git commit -m "docs(desktop-op): E2E 验证报告"
```
---
### Task 22: 老 weixin-uia spec 标 OBSOLETE
**Files:**
- Modify: `docs/superpowers/specs/2026-05-09-wechat-uia-channel-design.md`
- [ ] **Step 1:** 文件顶部加(在 frontmatter 之后):
```markdown
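> **⚠️ OBSOLETE 2026-05-14**: A PoC (`tools/uia_probe/probe.ps1`) showed that the UIA approach is completely non-viable on WeChat 4.1.9.54 (Qt custom rendering + the `MMUIRenderSubWindowHW` hardware-accelerated render layer → the UIA tree has only 3 nodes and 0 interactive controls; Narrator / the AccessibilityTemp registry key / the `QT_ACCESSIBILITY=1` environment variable / a fake `StructureChangedEventHandler` client all had no effect).
> See the new design in `2026-05-14-neta-desktop-op-design.md` v4 (a general desktop GUI agent; WeixinAdapter is the first application adapter).
> This file is kept for historical reference.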
```
- [ ] **Step 2:** Commit
```
git commit -m "docs(spec): weixin-uia spec 标 OBSOLETE(v3 desktop_op 取代)"
```
---
## 自检 (Self-Review)
### 0. v4 双 Agent 架构覆盖(★ 新增)
| Spec v4 章节 | 覆盖 Task |
|---|---|
| §0 v4 主要变更 H1-H8 | 整个 v4 plan |
| §1.4 双 Agent 模型(职责分工 / 防 loop) | Task 15.5(toolset 校验) + Task 20(前端两个下拉) |
| §2.1 微信场景接入(新链路图) | Task 8.5 + Task 14 + Task 15 |
| §3.2 模块分层(weixin_send_text tool / 删除 helper) | Task 14 + 不创建 helper |
| §3.3 ReAct 拓扑(adapter 主导) | Task 11(沿用) |
| §3.7 DesktopOpService.runAndWait | Task 12(★ v4 修订) |
| §4.1 desktop_op_config(★ 移除 default_model_channel_id) | Task 13.5(修订 entity 字段) |
| §4.2 channel.config.weixinReply(加 desktopAgentId,删 modelChannelId) | Task 20(前端) + Task 15.5(校验) |
| §4.2.1 reply / desktop agent toolset 校验 | Task 15.5 |
| §4.2.2 bizContext 透传机制 | Task 8.5(扩 runtime_context)+ Task 15(agent_channel 注入)+ Task 14(tool 读取) |
| §5.3 runAndWait 接口 | Task 12 |
| §5.4 weixin_desktop toolset + weixin_send_text | Task 14 |
| §5.5 已删除项(WeixinReplyHelper / replyToGroup / 自动发送块) | Task 15 |
| §6.1 前端双 agent 下拉 | Task 20 |
| §7.1 model 走 desktop agent.modelChannelId | Task 14(tool execute 注入 modelChannelId) |
| §7.2.2 desktop agent 默认 prompt 模板 | Task 14(README / 文档建议)+ E2E-3 配置 |
| §7.2.3 reply agent prompt 增量提示 | E2E-3 配置 |
| §8.0 Phase 0.5 Subagent IPC PoC | Phase 0.5 Task 0.5.1 |
| §8.4 E2E(双 agent 链路验证) | Phase H Task 21 全部 checklist |
### 1. Spec v3 覆盖
| Spec 章节 | 覆盖 Task |
|---|---|
| §0 v3 变更(G1-G9)| 整个 plan |
| §1 背景 + UIA 失败 | Task 22 |
| §2.1 senderQueue 解耦 + fire-and-forget | Task 12 (DesktopOpService enqueue) + Task 15 (replyToGroup) |
| §2.2 用户感知 + watermark + model 下拉 | Task 14 / Task 20 |
| §2.3 PoC 暴露的导航问题 | Task 9 (WeixinAdapter preFlightCheck 要求手动定位) |
| §3.1 进程模型 | Task 1 (DPI) + 整体内嵌 |
| §3.2 模块分层 desktop_op/ | Phase A-F 全在 desktop_op,Phase E 在 netaclaw |
| §3.3 ReAct adapter 主导 | Task 11 |
| §3.4 中文输入 clip.exe | Task 6c |
| §3.5 全局 DesktopMutex | Task 5 |
| §3.6 SafetyGuard | Task 5.5 |
| §3.7 后台 worker + per-app queue | Task 12 |
| §4.1 desktop_op_config | Task 13.5 |
| §4.2 channel.config.weixinReply | Task 20 (前端) + Task 14 (服务 read) |
| §4.3 desktop_op_action_log | Task 13 |
| §5.1 文件清单 | Task 1-13 |
| §5.2 接口 DesktopTask / ActionStep / AdapterContext | Task 2 / Task 8 / Task 9 / Task 11 |
| §5.3 AppAdapter interface | Task 8 |
| §5.4 DesktopOpService | Task 12 |
| §5.5 WeixinReplyHelper | Task 14 |
| §6 前端 | Task 20 |
| §7.1 模型 model_channel | Task 10 (vlm_client) + Task 14 / Task 20 |
| §7.2 Parser + Adapter prompt | Task 3 / Task 9 |
| §7.3 成本估算 | Phase 0 PoC 校验 + Task 21 E2E 校验 |
| §7.4 deps(不含 clipboardy)| Task 1 |
| §7.5 DPI Aware | Task 1 |
| §8.1 Phase 0 PoC ✅ | Task 0.1 / 0.2 |
| §8.2 单元测试 + fixtures | Task 3 + 其他各 task |
| §8.3 CI 政策 | plan 顶部声明 |
| §8.4 E2E | Task 21 |
### 2. v3 review 9 项覆盖
| # | 问题 | 覆盖 Task |
|---|---|---|
| G1 | 模块迁出 netaclaw | 整个 Phase A-D 在 modules/desktop_op/ |
| G2 | TaskContext 通用化 schema | Task 2 |
| G3 | AppAdapter 注册式 | Task 8 + Task 9 |
| G4 | agent_executor tool 注册 | (Layer 2,留口) |
| G5 | 全局 desktop_op_config | Task 13.5 + Task 19 |
| G6 | 改名 desktop_op | 整个 plan |
| G7 | SafetyGuard | Task 5.5 + Task 11 (runtime 校验) + Task 21 E2E-14/15 |
| G8 | 全局 DesktopMutex | Task 5 + Task 12 |
| G9 | admin HTTP `/run-task` | (Layer 2,留口) |
### 3. v2 review 14 项覆盖(全部沿用)
详见 v2 plan 末尾自检,这里不重复(Phase 0 PoC / fire-and-forget / AbortSignal / fixtures / CI / 养号 / final_text / watermark / etc 全部在 v3 沿用 + 强化)。
### 4. Placeholder 扫描
- 无 TBD
- 每 Step 都有具体代码 / 命令 / 文件路径
### 5. 类型一致性
- `DesktopTask` / `ActionStep` / `TaskResult` / `AdapterContext` 在 Task 2 / 8 / 9 / 11 / 12 / 13 / 14 一致
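- Error tokens are consistent: window-not-found / precondition-failed / app-not-allowed / dangerous-key-blocked / dangerous-action-blocked / model-failed / model-hallucinated / verify-failed / queue-overflow / task-timeout / rate-limited / aborted / text-empty / text-too-long / weixin-reply-not-enabled / current-agent-model-channel-missing / model-channel-not-configured / unsupported-platform / channel-not-found / channel-not-bound
- `modelChannelId` is consistent between the task and the log entity; in v4 it comes from the desktop agent's `currentAgent.modelChannelId` and is no longer stored in channel.config.weixinReply or desktop_op_config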
### 6. 跨 Phase 衔接
- Phase 0 PoC ✅ → 输出 raw responses 给 Phase A Task 3 fixtures 用
- Phase A 工具 → Phase B Adapter 依赖 + Phase C VLM 依赖 + Phase D Runtime/Service 依赖
- Phase D runtime/service → Phase E integration (weixin_send_text tool + replyToGroup removal + toolset validation + cascade)
- Phase F 审计 → Phase H E2E-11 校验
- Phase G 前端 → Phase H E2E-3/4 入口
### 7. DEV 可行性
- Phase 0 ✅ 已在 Windows 跑通
- Phase A-G 全部可在 Linux/Mac 跑单测(原生模块 platform=win32 才走;原生工具 task 6a/6b/6c 不写单测但有 e2e 兜底)
- Phase H E2E 必须 Windows + 微信登录 + 测试群 + 测试小号 + 火山 API key
### 8. Timeline estimate (for reference, assuming one full-time engineer)
| Phase | Estimate | Notes |
|---|---|---|
| Phase 0 PoC | ✅ Done | |
| Phase A (Task 1-7) | 4-6 days | Tasks 5/6a/6b/6c involve native modules; risk of newcomers getting stuck |
| Phase B (Task 8-9) | 1-2 days | AppAdapter interface + WeixinAdapter |
| Phase C (Task 10) | 1-2 days | vlm_client |
| Phase D (Task 11-13.5) | 2-3 days | runtime + service + log entity + config entity |
| Phase E (Task 14-17) | 1.5 days | weixin_send_text tool + replyToGroup removal + toolset validation + 2 cascades |
| Phase F (Task 18-19) | 0.5 days | standard Cool CRUD |
| Phase G (Task 20) | 1 day | frontend |
| Phase H (Task 21-22) | 1-2 days | manual E2E run + obsolete marking |
| **Total** | **12-18 days** | |
Adding the test account warm-up (7 days, in parallel) plus buffer gives roughly **3 weeks to deliver the MVP**.
---
## Execution Handoff
The complete plan is saved at `docs/superpowers/plans/2026-05-14-neta-desktop-op.md` (v4).
**立项当天并行启动:**
1. ✅ Phase 0 PoC 已通过(2026-05-14 实跑,100% 1 次)
2. 运营启动测试小号养号(7 天)— 立即
3. 工程师可立即开 Phase A Task 1(装 deps + DPI)
**进入 Phase A 前的最后 sanity check:**
- Phase 0 PoC raw 报告里至少 20 条 VLM 输出已 commit(若没有,先做 Task 0.2)— Phase A Task 3 fixtures 需要它
---
## ★ 与 weixin-archive sync 的边界(再强调)
- **不动**:`weixin_archive_sync.ts`、`runtime/weixin_db/*` 所有监听/解密/WAL watcher 链路
- **不合并锁**:archive sync 仍用自己内部的 `channelLocks` Map(纯读 SQLite 操作,不与桌面键鼠抢)
- **唯一交集**:`agent_channel.delete(ids)` 现在同时调 `archiveSyncService.deleteChannelArchive(id)`(已有,不动)+ `desktopOpService.abortByFilter(...)`(Task 16 新加)
- archive sync ↔ desktop_op **物理隔离**:archive sync 在后台读文件,desktop_op 在前台动键鼠,互相不感知,互不影响。