# Neta Desktop Op Implementation Plan v4

> For agentic workers: REQUIRED SUB-SKILL: superpowers:subagent-driven-development (recommended) or superpowers:executing-plans

**Status:** Draft **v4** 2026-05-14 (dual-agent architecture + desktop operations exposed as tools + removal of the v3 helper / replyToGroup)

**Goal:** Land dual-agent desktop GUI automation end to end. `channel.agentId` (the reply agent) uses `delegate_task` to delegate to `channel.config.weixinReply.desktopAgentId` (the desktop agent, executed as a NetaClaw subagent); the desktop agent calls `DesktopOpService.runAndWait` through the `weixin_send_text` tool (toolset='weixin_desktop') to synchronously perform the desktop keyboard/mouse actions plus VLM verification. The MVP implements WeixinAdapter + weixin_send_text; Layer 2 adds Excel/Browser adapters and new toolsets without tearing up the architecture.

**Architecture:** Embedded in the Node backend process. `modules/desktop_op/` provides a generic Runtime + registry-based AppAdapter + global DesktopMutex + SafetyGuard. **WeixinReplyHelper is not introduced** (planned in v3, but v4 drops the redundant abstraction; the tool calls the service directly). The model comes from the desktop agent's own `agent.modelChannelId` (the desktop agent is an ordinary NetaClaw Agent).

**Tech Stack:** Midway.js 3.20 + TypeORM (MySQL; reads model_channel, writes desktop_op_action_log and the global desktop_op_config) + `node-screenshots` + `@nut-tree-fork/nut-js` + `koffi` + `clip.exe` (via child_process; **clipboardy v5 is not used**) + the `openai` npm package (OpenAI-compatible).

**Spec:** `docs/superpowers/specs/2026-05-14-neta-desktop-op-design.md` v4

**Main v3 → v4 changes (8 decisions, H1 to H8, plus two supporting changes; they come from the dual-agent architecture decision, see spec §0, v4 row):**

- ★ **H1 A channel binds 2 agents**: `channel.agentId` (reply) + `channel.config.weixinReply.desktopAgentId` (desktop, ★ new)
- ★ **H2 Desktop operations are exposed as tools**: new toolset='weixin_desktop'; the tool `weixin_send_text` is registered under `modules/netaclaw/tools/builtin/`
- ★ **H3 Delete the entire `weixin_db.replyToGroup` method** (a placeholder) + **delete the auto-send block at `agent_channel.ts:585-608`**; the reply agent must explicitly call `delegate_task` for a message to go out
- ★ **H4 Tools wait synchronously**: add the `DesktopOpService.runAndWait(task, 60s)` interface (replaces v3's fire-and-forget enqueue path; enqueue is kept for Layer 2 background tasks)
- ★ **H5 `WeixinReplyHelper` is not introduced**: v3 Plan Task 14 is cancelled; the tool's execute assembles the DesktopTask and calls the service directly
- ★ **H6 The desktop agent must be created explicitly by an administrator**: toolset=`weixin_desktop` + `interaction`, a multimodal modelChannel, and the default prompt template
- ★ **H7 Loop prevention**: desktop agent
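toolset **must not include `crew`** (validated by the backend); the reply agent toolset **must include `crew`** (`delegate_task` is registered in `crew`)
- ★ **H8 Remove `channel.config.weixinReply.modelChannelId`**: the VLM model is taken from the desktop agent's own `agent.modelChannelId`
- ★ **bizContext pass-through**: extend `NetaToolRuntimeContext` with `bizContext.channelId/roomName`; `agent_channel` injects it → the subagent inherits it → the tool reads it automatically
- ★ **New Phase 0.5**: Subagent IPC PoC (verifies which process the tool runs in, which decides whether DesktopMutex must work across processes)

**Carried over from v3 (unchanged):**

- ✅ Phase 0 PoC passed (1 of 1 run succeeded; the gate is lifted)
- The module lives in `modules/desktop_op/` (top level)
- Task schema `DesktopTask{appId, target, actionType, params}`
- Registry-based AppAdapter; WeixinAdapter is the only MVP implementation
- Global DesktopMutex (single-process vs cross-process is settled once the Phase 0.5 PoC has verified IPC)
- SafetyGuard (allowlist + blocklist)
- desktop_op_config global table (★ the default_model_channel_id field is removed)
- clip.exe replaces clipboardy
- Every screenshot re-enumerates windows
- node-screenshots with appName='Weixin', taking the largest window
- Default parser: JsonActionParser

**Key constraints:**

- One commit per Task (★ v4 adjustment: the user stated explicitly that no git commits are needed, so subagents skip the commit steps during implementation)
- TDD: tests first, then implementation
- Unit tests can run on Linux/Mac (native modules take the real path only when platform=win32)
- ★ **The weixin-archive listener pipeline + the weixin_db read path (bindChannel/WalWatcher/IncrementalReader/health) stay completely untouched** (explicit user requirement)
- ★ **The entire weixin_db.replyToGroup method is deleted**; **the auto-send block at agent_channel.ts:585-608 is deleted** (new in v4)
- ★ A message is sent only when the reply agent explicitly calls `delegate_task` (the system never sends automatically)
- The model comes from the desktop agent's own `agent.modelChannelId`; it is not hard-coded and is no longer configured through channel.config
- Generic by design, but the MVP implements only WeixinAdapter + the `weixin_send_text` tool; other adapters / tools are left for Layer 2
- **Every screenshot must re-enumerate windows** (the PoC exposed node-screenshots caching)
- **Find the WeChat window with node-screenshots `appName==='Weixin'` and take the largest** (the PoC showed FindWindow returning a child window)
- ★ **The Phase 0.5 PoC (subagent IPC verification) gates Phase A Task 5**

**Prerequisites:**

- weixin-archive sync is merged (the read path already exists)
- `netaclaw_model_channel` already has the Volcengine multimodal configuration (id=2 is ready, verified in Phase 0)
- **The test WeChat account has been warmed up for ≥ 7 days and the test group has ≥ 5 members** (start this parallel timeline on the day the project is approved)
- The project can run `pnpm --filter @neta/backend test`

---

## File Structure

### New

#### `modules/desktop_op/` (generic desktop agent, a sibling of netaclaw)

| File | Responsibility |
|---|---|
| `runtime/types.ts` | DesktopTask / ActionStep / TaskResult / AdapterContext |
| `runtime/dpi.ts` | DPI Aware bootstrap |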
| `runtime/screenshot.ts` | Screen capture (★ re-enumerates on every call, no caching) |
| `runtime/window_locator.ts` | Generic window lookup (node-screenshots `appName` + bounds; **does not rely on FindWindow**) |
| `runtime/input.ts` | Keyboard/mouse via nut.js + ★ clip.exe for writing Chinese text to the clipboard |
| `runtime/desktop_mutex.ts` | ★ Global keyboard/mouse lock (replaces v2's WeixinChannelMutex) |
| `runtime/safety_guard.ts` | ★ App allowlist + hard blocklist of dangerous keys/actions |
| `runtime/rate_limiter.ts` | per-app / per-target / daily |
| `runtime/parser/parser.ts` | Parser interface |
| `runtime/parser/json_action_parser.ts` | ★ MVP default (Seed 2.0 Pro outputs JSON) |
| `runtime/parser/registry.ts` | Selects by modelChannel.providerType |
| `runtime/adapters/adapter.ts` | ★ AppAdapter interface |
| `runtime/adapters/weixin_adapter.ts` | ★ The only MVP implementation |
| `runtime/adapters/registry.ts` | ★ Selects by task.appId |
| `runtime/vlm_client.ts` | OpenAI-compatible multimodal client (via model_channel) |
| `runtime/action_executor.ts` | Dispatches ActionStep |
| `runtime/runtime.ts` | DesktopOpRuntime.runTask (adapter-driven) |
| `service/desktop_op.ts` | DesktopOpService: enqueue / per-app worker / abortByFilter |
| `entity/desktop_op_action_log.ts` | Audit entity (generic schema; the WeChat scenario fills channel_id/room_name) |
| `entity/desktop_op_config.ts` | ★ Global config entity (single row) |
| `controller/admin/desktop_op_action_log.ts` | /list /info |
| `controller/admin/desktop_op_config.ts` | get / update |
| Corresponding tests + fixtures | See each Task |

#### `modules/netaclaw/` (★ v4 change: a tool replaces the helper)

| File | Responsibility |
|---|---|
| `tools/builtin/weixin_send_text.ts` | ★ New in v4: the tool for toolset='weixin_desktop'; execute assembles the DesktopTask and calls `DesktopOpService.runAndWait` directly |
| `tools/runtime_context.ts` | ★ Modified in v4: extends NetaToolRuntimeContext with a `bizContext` field (channelId/roomName) |
| `tools/catalog.ts` | ★ Modified in v4: `import './builtin/weixin_send_text.js'` + adds the `TOOLSET_WEIXIN_DESKTOP` constant |

❌ **The `service/weixin_reply_helper.ts` from the v3 plan is not created** (v4 decision: the tool calls the service directly, with no intermediate helper)

#### `tools/` and PoC artifacts

| File | Status |
|---|---|
| `tools/visual_agent_probe/run-once.ts` | ✅ Exists (Phase 0 PoC, passed) |
| `tools/visual_agent_probe/README.md` | ✅ Exists |
| `tools/visual_agent_probe/check-wechat.ps1` | ✅ Exists |
| `tools/visual_agent_probe/debug/*.png` | Phase 0 screenshots, not in git (.gitignore) |
| `docs/superpowers/followups/2026-05-14-visual-agent-poc-raw.json` | ✅ Exists; Phase 0 raw report |

### Modified

| File | Change |
|---|---|
| `packages/backend/package.json` | Add deps (node-screenshots / nut-tree-fork/nut-js / koffi; **no clipboardy**); ★ first check whether they are already present |
| `packages/backend/src/configuration.ts` | onReady calls `ensureDpiAware()` + loads the SafetyGuard config |
| `packages/backend/src/modules/netaclaw/service/weixin_db.ts` | ★ **Delete the entire `replyToGroup` method** (lines 166-171) + edit the class comment to drop the "5.7 占位" ("5.7 placeholder") line |
| `packages/backend/src/modules/netaclaw/service/agent_channel.ts` | ★ **Delete the weixin-db auto-send block** (lines 585-608, the whole section) + on delete(ids), cascade by calling `desktopOpService.abortByFilter(t => t.appId==='weixin' && t.target.channelId === id, 'channel-deleted')` + on update, validate both agents' toolsets + auto-configure the workerRoutingStrategy for weixin_send_text + inject bizContext/currentAgent + detect and warn when the reply agent omits delegate_task |
| `packages/backend/src/modules/netaclaw/service/subagent.ts` | ★ In runPreparedExecution the subagent inherits the parent's runtime.bizContext and replaces currentAgent with the subagent itself; `NetaClawSubagentRunSingleContext` gains an optional `parentRuntime` field; in_process mode passes it straight through to agentRunner |
| `packages/backend/src/modules/netaclaw/runtime/agent.ts` | ★ `AgentRunParams` gains an optional `runtime?: NetaToolRuntimeContext`, passed through internally to beforeToolCall |
| `packages/backend/src/modules/netaclaw/service/agent_executor.ts` | ★ `beforeToolCall` calls `injectToolRuntimeContext` with params.runtime when present (replacing the hard-coded {sessionCwd, workspaceRoots}) |
| `packages/backend/src/modules/netaclaw/subagent/process_runner.ts` | ★ subprocess mode: the `SubagentRunRequest` envelope gains a `runtime?` field, validated with `JSON.stringify(runtime)` before sending (throws 'biz-context-not-serializable' on failure); the worker side attaches it to its internal agent_executor |
| `packages/backend/src/modules/netaclaw/tools/builtin/delegate_task.ts` | ★ The session-subagent branch passes the current parent's runtime when calling `ctx.runSingle` (SessionDelegateToolContext needs a new currentRuntime field) |
| `packages/backend/src/modules/netaclaw/service/tool_resolver.ts` | ★ `SessionDelegateToolContext` gains `currentRuntime?: NetaToolRuntimeContext`, populated when `createSessionDelegateToolContext` constructs it |
| `packages/backend/src/modules/netaclaw/tools/runtime_context.ts` | ★ Extend `NetaToolRuntimeContext` with `bizContext: NetaToolRuntimeBizContext` + `currentAgent: NetaToolRuntimeCurrentAgent`; `injectToolRuntimeContext` validates JSON safety |
| `packages/backend/src/modules/netaclaw/tools/catalog.ts` | ★ `import './builtin/weixin_send_text.js'` + adds the `TOOLSET_WEIXIN_DESKTOP` constant |
| `packages/backend/src/entities.ts` | Auto-generated; regenerate with `cool entity` after creating new entities (do not edit by hand) |
| `packages/frontend/src/modules/agent/views/channel-edit.vue` | ★ Add a "WeChat auto-reply" section with **2 agent dropdowns** (conversation agent + desktop-operation agent) + watermark + risk control |
| `docs/superpowers/specs/2026-05-09-wechat-uia-channel-design.md` | OBSOLETE banner at the top |

### ★ Untouched

| File | Reason |
|---|---|
| `modules/netaclaw/service/weixin_archive_sync.ts` | Explicit user requirement: chat-history sync goes through the DB and stays untouched. The internal `channelLocks` Map is kept (it prevents concurrent archiving of the same channel) and is **not merged with DesktopMutex** |
| `modules/netaclaw/runtime/weixin_db/*` | Same as above: the listener pipeline stays completely untouched |
| Read-path methods in `weixin_db.ts` such as bindChannel / unbindChannel / getRuntime / healthCheck / probeAlive / refreshWhitelist | ★ v4 clarification: only replyToGroup is deleted; the other read-path methods stay |
| Main logic of routeInboundMessage / handleInboundMessage in `agent_channel.ts` | The main framework is untouched; the only changes are deleting the auto-send block in the weixin-db branch, adding the cascade on delete, and injecting bizContext |
| Main body of `tools/builtin/delegate_task.ts` + `service/subagent.ts` | The existing NetaClaw subagent mechanism is reused as is; **the delegate_task protocol is not extended** (only bizContext inheritance is added) |

---

## Phase 0 · PoC Validation ✅ Passed

### Task 0.1 ✅ Done

**Status:** The Phase 0 PoC ran successfully on 2026-05-14 (1 of 1 run succeeded; the core path works end to end).

**Committed artifacts:**

- `tools/visual_agent_probe/run-once.ts` (standalone script, no IoC)
- `tools/visual_agent_probe/README.md`
- `tools/visual_agent_probe/check-wechat.ps1`
- `docs/superpowers/followups/2026-05-14-visual-agent-poc-raw.json` (raw report)
- `tools/visual_agent_probe/debug/*.png` (PoC screenshots)

**Pitfalls found in the real run (already reflected in the Phase A Tasks):**

- The WeChat main window's title is the Chinese "微信" (the English "Weixin" is a child window) → `Task 6b` switches to
node-screenshots `appName==='Weixin'`, taking the largest window, instead of `FindWindowW`
- `node-screenshots` caches the Image, so two captures return identical bytes → `Task 6a` must re-enumerate on every call
- `clipboardy` v5 is ESM, so a CJS require fails → `Task 6c` switches to `child_process.spawnSync('clip.exe', { input: utf16leBomBuf })`
- The first hit of WeChat's Ctrl+F global search is often a "公众号" (Official Account) rather than the target conversation → the MVP `WeixinAdapter` requires the user to navigate to the conversation manually

### Task 0.2: Run the N=20 stability check (optional)

**Files:**

- Append: `docs/superpowers/followups/2026-05-14-visual-agent-poc-raw.json`

- [ ] **Step 1:** Run `pnpm exec tsx tools/visual_agent_probe/run-once.ts 20` (N=20) and record the success rate
- [ ] **Step 2:** Extract the raw VLM outputs of the 20 runs; **the Phase A Task 3 fixtures use them directly**
- [ ] **Step 3:** Report a success rate ≥ 80% (≥ 90% expected). If the failure rate is high, classify the failure modes in the followup report (navigation failed / VLM could not see the target / Enter had no effect / etc.)

(This task does not block Phase A and can run in parallel with the Phase A implementation.)

---

## Phase 0.5 · Subagent IPC Verification PoC (★ new in v4, < 30 min)

**Purpose:** Verify whether a tool's execute runs in the parent process or in the subagent process when a NetaClaw subagent calls it. The answer decides whether DesktopMutex can be a single in-process instance or needs a cross-process design.

### Task 0.5.1: Temporary `_debug_pid` tool

**Files:**

- Create (temporary, delete after verification): `packages/backend/src/modules/netaclaw/tools/builtin/_debug_pid.ts`
- Modify: `tools/catalog.ts`, temporary import

- [ ] **Step 1:** Write a minimal tool:

```ts
import { Type } from '@sinclair/typebox';
import { AgentToolWithMeta, textResult } from '../common.js';
import { registerSchema } from '../catalog.js';

export const debugPidTool: AgentToolWithMeta = {
  name: '_debug_pid',
  label: 'Debug PID',
  description: 'Return current process pid for IPC verification.',
  parameters: Type.Object({}),
  async execute() {
    return textResult(JSON.stringify({ pid: process.pid, argv: process.argv }));
  },
};

registerSchema({ name: '_debug_pid', toolset: 'debug', description: 'debug', visibility: 'tool', isCore: false, canDisable: true });
```

- [ ] **Step 2:** In the backend admin console:
  - Create reply agent A (toolset = `crew` + `interaction`)
  - Create desktop agent B (toolset = `debug` + `interaction`)
  - In chat, have A call `delegate_task({mode:'preset', agentId: B.id, goal:'call _debug_pid'})`
- [ ] **Step 3:** Check the pid returned by _debug_pid, and print the backend main pid to the console at the same time.
  - **Equal** → the tool runs in the
parent process (IPC proxy mode) → a single DesktopMutex instance is fine; proceed straight to Phase A
  - **Not equal** → the tool runs in the subagent process → DesktopMutex needs a cross-process design (file lock / shared SQLite); revise the Task 5 design
- [ ] **Step 4:** Delete _debug_pid.ts + the temporary import in catalog.ts, and record the conclusion in `docs/superpowers/followups/2026-05-14-subagent-ipc-poc.md`
- [ ] **Step 5:** Do not commit (temporary verification; the only artifacts are the followup report + the Phase A choice)

---

## Phase A · Base Utilities (unit-testable cross-platform)

### Task 1: Dependencies + DPI Aware

**Files:**

- Modify: `packages/backend/package.json`
- Create: `modules/desktop_op/runtime/dpi.ts`
- Modify: `packages/backend/src/configuration.ts`

- [ ] **Step 1:** `cd packages/backend && pnpm add node-screenshots @nut-tree-fork/nut-js koffi` (★ **do not add clipboardy**: v5 is ESM; spawn clip.exe through child_process instead)
- [ ] **Step 2:** Write `dpi.ts`: `ensureDpiAware()` calls `SetProcessDpiAwarenessContext(-4)`, is a no-op when platform !== 'win32', and does not throw on failure
- [ ] **Step 3:** Call it from `configuration.ts onReady`
- [ ] **Step 4:** Commit

```
git commit -m "feat(desktop-op): deps + DPI Aware bootstrap"
```

---

### Task 2: types.ts (★ generalized in v3)

**Files:**

- Create: `modules/desktop_op/runtime/types.ts`

- [ ] **Step 1:** Write:

```ts
export interface WindowHandle {
  hwnd: number;
  pid: number;
  appName: string;
  title: string;
  bounds: { x: number; y: number; width: number; height: number };
  nsWindow: any; // reference to the node-screenshots Window instance, used for capture
}

export interface DesktopTask {
  id: string;
  appId: string;      // 'weixin' / 'excel' / ...
  target: any;        // adapter-defined, e.g. { conversation, channelId, roomName }
  actionType: string; // e.g. 'send-text'
  params: any;        // e.g.
  //   { text, originalText }
  modelChannelId?: number; // v4: set from the desktop agent's agent.modelChannelId (the desktop_op_config default is removed)
  maxSteps?: number;       // default 8
  enqueuedAt: number;
}

export type ActionStep =
  | { type: 'click'; x: number; y: number; thought?: string }
  | { type: 'hotkey'; key: string; thought?: string }
  | { type: 'clipboard-write'; text: string }         // writes the clipboard only, no key press
  | { type: 'type'; text: string; thought?: string }  // = clipboard-write + ctrl+v
  | { type: 'wait'; ms: number }
  | { type: 'mention'; wxid: string }                 // reserved for later
  | { type: 'finished'; thought?: string }
  | { type: 'failed'; reason: string; thought?: string };

export interface TaskResult {
  ok: boolean;
  modelCalls: number;
  steps: number;
  durationMs: number;
}

export interface AdapterContext {
  window: WindowHandle;
  screenshot: any; // Screenshooter
  input: any;      // InputController
  vlm: any;        // VlmClient
  parser: any;     // Parser
  logger: any;
  task: DesktopTask;
  modelCalls: number;
}
```

- [ ] **Step 2:** Commit

```
git commit -m "feat(desktop-op): types (DesktopTask / ActionStep / AdapterContext)"
```

---

### Task 3: Parser interface + JsonActionParser (TDD + fixtures)

**Files:**

- Create: `modules/desktop_op/runtime/parser/parser.ts`
- Create: `modules/desktop_op/runtime/parser/json_action_parser.ts`
- Create: `modules/desktop_op/runtime/parser/registry.ts`
- Create: `test/.../parser/json_action_parser.test.ts`
- Create: `test/fixtures/desktop_op/vlm_responses/*.txt` (★ from the PoC `2026-05-14-visual-agent-poc-raw.json` + Task 0.2 N=20)

> v2 defaulted to a UI-TARS DSL parser; v3 switches to **JsonActionParser** (the PoC showed that Seed 2.0 Pro's JSON output is parseable; the UI-TARS DSL parser is deferred to Layer 2).

- [ ] **Step 1:** Extract ≥ 20 VLM outputs from the PoC raw report (success / failed / edge cases) and name them `success-N.txt` / `failed-N.txt` / `ambiguous-N.txt`
- [ ] **Step 2:** Write `parser.ts`:

```ts
import type { ActionStep, DesktopTask, AdapterContext } from '../types.js';

export interface Parser {
  buildSystemPrompt(task: DesktopTask, ctx?: AdapterContext): string;
  parseAction(raw: string): ActionStep;
  buildVerifyPrompt(question: string): string;
  parseVerify(raw: string): boolean;
}
```

- [ ] **Step 3:** Write `json_action_parser.ts`:
  - parseAction: tolerates markdown code fences, a trailing single-line JSON object, and shapes such as `{ "type": "click", "x": ..., "y": ..., "reason": ... }`
  - parseVerify: the model may output `{"type":"finished",...}` or natural language containing yes/no → both are recognized as true/false
  - Any parse failure returns `{ type: 'failed', reason: 'parse-error: ' }`
- [ ] **Step 4:** Write the tests, one case per fixture:

```ts
for (const file of fs.readdirSync(fixturesDir)) {
  it(file, () => {
    const a = parser.parseAction(fs.readFileSync(...));
    if (file.startsWith('success-')) expect(['click','hotkey','type','finished','wait']).toContain(a.type);
    else if (file.startsWith('failed-')) expect(a.type).toBe('failed');
  });
}
```

- [ ] **Step 5:** Write `registry.ts`:

```ts
const PARSERS = new Map([
  ['volcengine', new JsonActionParser()],     // doubao-seed-2-0-pro / Seed-1.6-vision
  ['volces-uitars', new JsonActionParser()],  // compatibility entry; UITarsParser is added in Layer 2
]);

export function getParser(supplierOrProvider: string): Parser {
  return PARSERS.get(supplierOrProvider) ??
    new JsonActionParser(); // default fallback
}
```

- [ ] **Step 6:** Run the tests
- [ ] **Step 7:** Commit

```
git commit -m "feat(desktop-op): Parser interface + JsonActionParser + fixtures"
```

---

### Task 4: rate_limiter.ts (TDD)

**Files:**

- Create: `modules/desktop_op/runtime/rate_limiter.ts`
- Create: tests

- [ ] **Step 1:** Write tests covering three dimensions: per-target / per-app / daily
- [ ] **Step 2:** Implement an in-memory token bucket:

```ts
class RateLimiter {
  tryAcquire(appId: string, targetKey: string, opts: {
    perTargetPerMin?: number;
    perAppPerMin?: number;
    perAppPerDay?: number;
  }): { allowed: boolean; reason?: string };
}
```

- [ ] **Step 3:** Commit

```
git commit -m "feat(desktop-op): rate_limiter (per-app / per-target / daily)"
```

---

### Task 5: desktop_mutex.ts (★ v3 global keyboard/mouse lock; v4 defaults to single-process, pending Phase 0.5 PoC confirmation)

**Files:**

- Create: `modules/desktop_op/runtime/desktop_mutex.ts`
- Create: tests

> ⚠️ Unlike v2: **it is no longer called WeixinChannelMutex and is no longer per-channel; it is a global keyboard/mouse lock**. Rationale: the system has a single keyboard/mouse pair and a single screen, so only one task can hold the foreground at any moment.

> ⚠️ `weixin_archive_sync.ts` **stays untouched**: it keeps using its own channelLocks Map (read-only SQLite, unrelated to the desktop lock).

- [ ] **Step 1:** Write tests:
  - several tasks acquire → strictly serialized
  - after release, the next waiter gets the lock immediately
  - abort after acquire → does not block later waiters
- [ ] **Step 2:** Implement `DesktopMutex` (@Provide + @Scope Singleton):

```ts
class DesktopMutex {
  acquire(taskId: string, appId: string): Promise<() => void>;
}
```

  Internally `busy: {taskId, appId} | null` + `waiters: Array<{resolve, taskId, appId}>`; see spec §3.5.

- [ ] **Step 3:** Commit

```
git commit -m "feat(desktop-op): DesktopMutex global keyboard/mouse lock"
```

---

### Task 5.5: safety_guard.ts (★ new in v3)

**Files:**

- Create: `modules/desktop_op/runtime/safety_guard.ts`
- Create: tests

- [ ] **Step 1:** Write tests:
  - validateAppId('weixin') passes; validateAppId('cmd') throws `app-not-allowed`
  - validateAction({type:'hotkey',key:'delete'}) throws `dangerous-key-blocked`
  - validateAction({type:'hotkey',key:'win+r'}) throws
  - validateAction({type:'hotkey',key:'ctrl+v'}) passes
  - the allowlist updates after loadConfig
- [ ] **Step 2:** Implement `SafetyGuard` (see spec §3.6):
  - default `allowedApps = ['weixin']`
  - default `dangerousKeys =
['delete','win+r','alt+f4','win+l','ctrl+alt+delete','ctrl+shift+esc']`
  - `validateAppId / validateAction / validateTaskShape / loadConfig`
- [ ] **Step 3:** Commit

```
git commit -m "feat(desktop-op): SafetyGuard (allowlist + dangerous-key blocklist)"
```

---

### Task 6a: screenshot.ts (★ v3 revision: re-enumerate on every call)

**Files:**

- Create: `modules/desktop_op/runtime/screenshot.ts`

> The PoC found that calling `Window.captureImageSync` repeatedly on the same Window instance returns PNGs with **exactly the same byte count** (415269 bytes, identical both times), which indicates caching. **Each capture must call `Window.all()` again to get fresh Window instances.**

- [ ] **Step 1:** Implement:

```ts
class NodeScreenshooter {
  /** ★ Re-enumerates on every call; Window instances are never cached */
  captureWindowByAppName(appName: string, opts?: { skipMinimized?: boolean; largest?: boolean }): Buffer;
  captureFullScreen(): Buffer;
}
```

  - Internally filters `node-screenshots.Window.all()` by `appName === target.appName && !isMinimized` and takes the first entry sorted by area DESC
  - `image.toPngSync()` returns the PNG bytes
  - throws `unsupported-platform` when platform !== 'win32'
- [ ] **Step 2:** No unit tests (native module; verified by the Phase H E2E)
- [ ] **Step 3:** Commit

```
git commit -m "feat(desktop-op): NodeScreenshooter (re-enumerate every call to avoid caching)"
```

---

### Task 6b: window_locator.ts (★ v3 revision: no FindWindowW)

**Files:**

- Create: `modules/desktop_op/runtime/window_locator.ts`

> The PoC found that `FindWindowW(null, 'Weixin')` returns a child window, hwnd=4131796 (the child window titled 'Weixin'), whereas **the real main window is hwnd=135372 with the Chinese title '微信'**. Use node-screenshots enumeration + appName filtering instead.

> koffi is now used mainly for activate (SetForegroundWindow) + ShowWindow + foreground-window detection.

- [ ] **Step 1:** Implement:

```ts
class WindowLocator {
  /** node-screenshots enumeration; filters by appName + not minimized + largest area */
  findByAppName(appName: string, opts?: { skipMinimized?: boolean; largest?: boolean }): WindowHandle | null;
  /** koffi SetForegroundWindow + ShowWindow(SW_RESTORE), with AttachThreadInput as a fallback */
  activate(handle: WindowHandle): Promise<void>;
  /** koffi GetForegroundWindow + GetWindowThreadProcessId, comparing the pid */
  isForeground(handle: WindowHandle): boolean;
}
```

- [ ] **Step 2:** Commit

```
git commit -m "feat(desktop-op): WindowLocator (node-screenshots enumerate + koffi activate)"
```

---

### Task 6c: input.ts (★ v3 revision: clip.exe replaces clipboardy)

**Files:**

- Create: `modules/desktop_op/runtime/input.ts`
- Create: tests

> The PoC found that clipboardy v5 is an ESM package and the backend's CJS `require` fails with `ERR_PACKAGE_PATH_NOT_EXPORTED`. Use the `clip.exe` that ships with Windows instead (child_process spawnSync; Chinese text must be written to stdin as UTF-16 LE with a BOM).

- [ ] **Step 1:** Write tests:
  - mock `child_process.spawnSync` and assert that the input buffer contains the UTF-16 BOM + the UTF-16LE-encoded text
  - mock nut.js and assert that `typeViaClipboard('你好')` spawns clip.exe first and then calls `hotkey('ctrl+v')`
  - `hotkey('ctrl+v')` calls keyboard.pressKey/releaseKey in the correct order
- [ ] **Step 2:** Implement:

```ts
import { mouse, keyboard, Key, Point } from '@nut-tree-fork/nut-js';
import { spawnSync } from 'node:child_process';

class InputController {
  async click(x, y) { ... }
  async hotkey(combo) { ... } // 'ctrl+v' / 'enter' / 'ctrl+f' / 'ctrl+1'

  /** ★ Writes via clip.exe rather than clipboardy. Chinese text requires UTF-16 LE + BOM. */
  writeClipboard(text: string): void {
    const buf = Buffer.concat([Buffer.from([0xff, 0xfe]), Buffer.from(text, 'utf16le')]);
    const r = spawnSync('clip.exe', [], { input: buf });
    if (r.status !== 0) throw new Error('clip.exe failed: ' + r.stderr?.toString());
  }

  async typeViaClipboard(text: string): Promise<void> {
    this.writeClipboard(text);
    await this.hotkey('ctrl+v');
  }
}
```

- [ ] **Step 3:** Commit

```
git commit -m "feat(desktop-op): InputController (nut.js + clip.exe clipboard for Chinese text)"
```

---

### Task 7: action_executor.ts

**Files:**

- Create: `modules/desktop_op/runtime/action_executor.ts`
- Create: tests

- [ ] **Step 1:** Write tests: mock InputController and check that each ActionStep type calls the matching method:
  - `click` → input.click(x,y)
  - `hotkey` → input.hotkey(key)
  - `clipboard-write` → input.writeClipboard(text)
  - `type` → input.typeViaClipboard(text)
  - `wait` → sleep(ms)
  - `finished` / `failed` → no side effects
- [ ] **Step 2:** Implement `execute(step: ActionStep): Promise<void>`, switching on each type
- [ ] **Step 3:** Commit

```
git commit -m "feat(desktop-op): ActionExecutor"
```

---

## Phase B · AppAdapter Framework + WeixinAdapter (★ new in v3)

### Task 8: AppAdapter interface + registry (TDD)

**Files:**

- Create:
`modules/desktop_op/runtime/adapters/adapter.ts`
- Create: `modules/desktop_op/runtime/adapters/registry.ts`
- Create: tests

- [ ] **Step 1:** Write `adapter.ts` (already defined in spec §5.2):

```ts
import type { ActionStep, AdapterContext, DesktopTask, WindowHandle } from '../types.js';

export interface AppAdapter {
  appId: string;
  supportedActions: string[];
  findWindow(target: any): Promise<WindowHandle | null>;
  preFlightCheck(task: DesktopTask, ctx: AdapterContext): Promise<void>;
  buildSteps(task: DesktopTask): Promise<ActionStep[]>;
  verifyResult(task: DesktopTask, ctx: AdapterContext): Promise<boolean>;
  queueKey(target: any): string;
}
```

- [ ] **Step 2:** Write `registry.ts`:

```ts
@Provide()
@Scope(ScopeEnum.Singleton)
class AdapterRegistry {
  private map = new Map<string, AppAdapter>();
  register(adapter: AppAdapter): void;
  get(appId: string): AppAdapter; // throws 'app-not-allowed' when not found
  listAppIds(): string[];
}
```

- [ ] **Step 3:** Write tests: register / get / throws when missing / listAppIds
- [ ] **Step 4:** Commit

```
git commit -m "feat(desktop-op): AppAdapter interface + AdapterRegistry"
```

---

### Task 8.5: Extend NetaToolRuntimeContext + inject bizContext + currentAgent (★ new in v4, includes the D1 revision)

**Files:**

- Modify: `packages/backend/src/modules/netaclaw/tools/runtime_context.ts`
- Modify: `packages/backend/src/modules/netaclaw/runtime/agent.ts` (`AgentRunParams` gains a `runtime?: NetaToolRuntimeContext` field, injected in beforeToolCall)
- Modify: `packages/backend/src/modules/netaclaw/service/agent_executor.ts` (passes runtime through to agentRunner + injectToolRuntimeContext)
- Modify: `packages/backend/src/modules/netaclaw/service/agent_channel.ts` (injects bizContext + currentAgent before the reply agent runs)
- Modify: `packages/backend/src/modules/netaclaw/service/subagent.ts` (runPreparedExecution inherits the parent's runtime.bizContext + replaces currentAgent with the subagent itself)
- Modify: `packages/backend/src/modules/netaclaw/subagent/process_runner.ts` (subprocess mode passes runtime through the IPC envelope; throws 'biz-context-not-serializable' if JSON.stringify fails)
- Create: tests

- [ ] **Step 1:** Edit `runtime_context.ts`:

```ts
export interface NetaToolRuntimeBizContext {
  channelId?: number;
  roomName?: string;
  // JSON-safe values only (primitive / array / plain
  // object)
  [k: string]: string | number | boolean | null | undefined | object | any[];
}

export interface NetaToolRuntimeCurrentAgent {
  id: number;
  name: string;
  modelChannelId: number | null;
  toolsets: string[];
}

export interface NetaToolRuntimeContext {
  sessionCwd?: string | null;
  workspaceRoots?: string[];
  bizContext?: NetaToolRuntimeBizContext;       // ★ new in v4
  currentAgent?: NetaToolRuntimeCurrentAgent;   // ★ new in v4
}

export function injectToolRuntimeContext<T>(
  args: T,
  runtime: NetaToolRuntimeContext | undefined,
): T & { _netaRuntime?: NetaToolRuntimeContext } {
  // existing logic + pass bizContext / currentAgent through
  // ★ JSON.stringify(runtime) must succeed (guards against embedded functions / circular references); otherwise throw 'biz-context-not-serializable'
  // Note: JSON.stringify throws on circular references but silently drops function values, so functions need an explicit check
}

export function readToolRuntimeContext(params): NetaToolRuntimeContext | undefined {
  // existing logic + parse bizContext / currentAgent
}
```

- [ ] **Step 2:** Edit `runtime/agent.ts`: `AgentRunParams` gains an optional `runtime?: NetaToolRuntimeContext` field (existing callers are unaffected); runAgent passes runtime through to beforeToolCall internally
- [ ] **Step 3:** Edit `agent_executor.ts:beforeToolCall` (lines 290-293): pass the runtime parameter straight to `injectToolRuntimeContext(args, params.runtime ?? {sessionCwd, workspaceRoots})`
- [ ] **Step 4:** In `agent_channel.ts:handleInboundMessage`, inject before the agent_executor.run call:

```ts
const replyAgent = await this.agentService.info(effectiveAgentId);
const bizContext = { channelId: channel.id, roomName: (group as any).roomName ?? scope.chatId };
const currentAgent = {
  id: replyAgent.id,
  name: replyAgent.name,
  modelChannelId: replyAgent.modelConfig?.modelId ? null : null, // placeholder: always null as written
  toolsets: replyAgent.toolsets ??
    [] }; // see the agent.ts entity fields
// pass to agent_executor.run({..., runtime: { bizContext, currentAgent }})
```

  (Note: in NetaClawAgentEntity the modelChannelId value is actually derived from modelConfig / auxiliaryModelChannelId and similar fields; during implementation, resolve the real value with agent_executor's existing resolution logic)

- [ ] **Step 5:** Edit `subagent.ts:runPreparedExecution` to hand down the parent runtime passed in via ctx, **but replace currentAgent with the subagent entity's own**:

```ts
const subagentRuntime = {
  ...ctx.parentRuntime, // inherit the parent's bizContext / sessionCwd / workspaceRoots in full
  currentAgent: { id: subagentEntity.id, name: subagentEntity.name, modelChannelId: ..., toolsets: subagentEntity.toolsets ?? [] }, // replaced
};
// in_process mode: agentRunner({..., runtime: subagentRuntime})
// subprocess mode: envelope.runtime = subagentRuntime, serialized over IPC
```

  `NetaClawSubagentRunSingleContext` gains an optional `parentRuntime?: NetaToolRuntimeContext` field; in its mode='session-subagent' branch, `delegate_task.ts` passes the current parent runtime into `ctx.runSingle` (this requires extending `SessionDelegateToolContext` or the `runSingle` signature)

- [ ] **Step 6:** Edit `subagent/process_runner.ts`: the `SubagentRunRequest` envelope gains a `runtime?: NetaToolRuntimeContext` field; send only after `JSON.stringify(runtime)` validates (throw 'biz-context-not-serializable' on failure); the worker side attaches the received value to the runtime inside agent_executor
- [ ] **Step 7:** Write tests:
  - inject/read round trip (with bizContext and currentAgent)
  - JSON-unsafe bizContext (embedded function / circular reference) → throw 'biz-context-not-serializable'
  - subagent in_process mode inherits the parent bizContext + replaces currentAgent
  - subagent subprocess mode: the IPC envelope contains runtime
- [ ] **Step 8:** Commit

```
git commit -m "feat(netaclaw): add bizContext + currentAgent to NetaToolRuntimeContext; pass through reply agent → subagent → tool"
```

---

### Task 9: WeixinAdapter (the only MVP implementation, TDD)

**Files:**

- Create: `modules/desktop_op/runtime/adapters/weixin_adapter.ts`
- Create: tests (mock screenshot / VLM)

- [ ] **Step 1:** Write tests:
  - findWindow: mock windowLocator.findByAppName('Weixin',{largest:true}) to return a handle → the adapter passes it straight through
  - preFlightCheck happy path: the VLM returns "当前对话是 文件传输助手" (the current conversation is 文件传输助手, File Transfer Assistant) → pass
  - preFlightCheck failure: the VLM returns "当前对话是 公众号" (the current conversation is an Official Account) → throws
`precondition-failed`
  - buildSteps('send-text', { text: '你好' }): returns `[clipboard-write, hotkey:ctrl+v, wait, hotkey:enter, wait]`, 5 steps in total
  - verifyResult happy path: the VLM sees that the latest message contains text → true
  - queueKey({conversation:'文件传输助手'}): returns `'文件传输助手'`
- [ ] **Step 2:** Implement (spec §3.3):

```ts
@Provide()
@Scope(ScopeEnum.Singleton)
class WeixinAdapter implements AppAdapter {
  appId = 'weixin';
  supportedActions = ['send-text'];
  @Inject() windowLocator: WindowLocator;
  // findWindow / preFlightCheck / buildSteps / verifyResult / queueKey
}
```

  preFlightCheck calls ctx.screenshot.captureWindowByAppName + ctx.vlm.verifyState with the question "当前微信打开的聊天顶部标题是不是 '${task.target.conversation}'?" (in English: is the title at the top of the chat currently open in WeChat '${task.target.conversation}'?)

  verifyResult calls ctx.vlm.verifyState with the question "右下角最新一条己方消息是否包含 '${text.slice(0,50)}'?" (in English: does the latest message sent by us, at the bottom right, contain '${text.slice(0,50)}'?)

- [ ] **Step 3:** In configuration.ts onReady, call `adapterRegistry.register(new WeixinAdapter(...))` (or use Midway @AutoLoad)
- [ ] **Step 4:** Commit

```
git commit -m "feat(desktop-op): WeixinAdapter (the only MVP implementation, send-text)"
```

---

## Phase C · VLM Client + model_channel Integration

### Task 10: vlm_client.ts (TDD, credentials via model_channel)

**Files:**

- Create: `modules/desktop_op/runtime/vlm_client.ts`
- Create: tests (mock fetch + mock modelChannelService)

- [ ] **Step 1:** Write tests:
  - mock modelChannelService.findOne(modelChannelId) to return { baseUrl, apiKey, modelName, providerType }
  - mock fetch to return a fixture response (recorded from the Phase 0 PoC)
  - the request body's messages include a base64 image_url (data:image/png;base64,...)
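
The message shape that the `messages` assertion refers to can be sketched as follows, assuming the standard OpenAI-compatible chat-completions format in which a user message carries an array of `text` and `image_url` parts. The helper names `buildVisionMessages` and `toDataUrl`, and their parameters, are hypothetical illustrations rather than part of the plan. The sketch also keeps only the previous and the latest screenshot, which is what the A4 revision asks of the history.

```typescript
// Hypothetical helper (not from the plan): builds OpenAI-compatible
// multimodal chat messages from PNG screenshots.
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } };

interface ChatMessage {
  role: 'system' | 'user';
  content: string | ContentPart[];
}

// Encode PNG bytes as a data URL, the form accepted in image_url.url.
function toDataUrl(png: Uint8Array): string {
  return 'data:image/png;base64,' + Buffer.from(png).toString('base64');
}

function buildVisionMessages(
  systemPrompt: string,
  question: string,
  screenshots: Uint8Array[],
): ChatMessage[] {
  // Keep only the previous and the latest screenshot to bound request size.
  const recent = screenshots.slice(-2);
  const parts: ContentPart[] = [
    { type: 'text', text: question },
    ...recent.map((png): ContentPart => ({
      type: 'image_url',
      image_url: { url: toDataUrl(png) },
    })),
  ];
  return [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: parts },
  ];
}
```

A unit test can then assert on the returned array directly instead of on a serialized request body, which keeps the fetch mock small.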
  - the request body's model equals modelChannel.modelName
  - the request body's max_tokens / temperature are reasonable
  - the request body's history carries only the "previous + latest" screenshots (★ A4 revision)
- [ ] **Step 2:** Implement:

```ts
@Provide()
@Scope(ScopeEnum.Singleton)
export class VlmClient {
  @Inject() modelChannelService: NetaClawModelChannelService;

  async nextAction(ctx: TaskContext, parser: Parser, screenshot: Buffer, history: HistoryEntry[]): Promise<{ action: ActionStep; tokensUsed: number }>;
  async verifyState(ctx: TaskContext, parser: Parser, screenshot: Buffer, question: string): Promise<{ result: boolean; tokensUsed: number }>;
}
```

  Internally it uses the `openai` npm package; baseURL comes from modelChannel.baseUrl, the model field uses modelChannel.modelName, and the system prompt comes from parser.buildSystemPrompt.

- [ ] **Step 3:** Run the tests
- [ ] **Step 4:** Commit

```
git commit -m "feat(desktop-op): VlmClient (via model_channel + history truncation)"
```

---

## Phase D · Runtime + Service + Persistence (★ renamed in v3)

### Task 11: DesktopOpRuntime.runTask (TDD, adapter-driven)

**Files:**

- Create: `modules/desktop_op/runtime/runtime.ts`
- Create: tests

- [ ] **Step 1:** Write tests covering (spec §3.3):
  - happy path: safetyGuard passes → adapter.findWindow → activate → preFlightCheck → buildSteps returns 5 steps → actionExecutor.execute for each step → verifyResult=true → returns ok
  - safetyGuard.validateAppId throws 'app-not-allowed' → runTask propagates it
  - safetyGuard.validateAction throws 'dangerous-key-blocked' → interrupt + propagate
  - adapter.findWindow returns null → throws 'window-not-found'
  - adapter.preFlightCheck throws → propagated
  - AbortSignal fires during step 2 → throws AbortError immediately and runs no further steps
  - adapter.verifyResult returns false → throws 'verify-failed'
- [ ] **Step 2:** Implement (spec §3.3 skeleton):

```ts
@Provide()
@Scope(ScopeEnum.Singleton)
export class DesktopOpRuntime {
  @Inject() safetyGuard: SafetyGuard;
  @Inject() adapterRegistry: AdapterRegistry;
  @Inject() windowLocator: WindowLocator;
  @Inject() screenshot: NodeScreenshooter;
  @Inject() input: InputController;
  @Inject() vlm: VlmClient;
  @Inject() parserRegistry: ParserRegistry;
  @Inject() actionExecutor: ActionExecutor;

  async runTask(task: DesktopTask, abort: AbortSignal): Promise<TaskResult>;
}
```

- [ ] **Step 3:** Run the tests
- [ ] **Step 4:** Commit

```
git commit -m "feat(desktop-op): DesktopOpRuntime (adapter-driven, safety + abort)"
```

---

### Task 12: DesktopOpService background worker + runAndWait + abortByFilter (★ v4 adds runAndWait, TDD)

**Files:**

- Create: `modules/desktop_op/service/desktop_op.ts`
- Create: tests

- [ ] **Step 1:** Write tests (spec §3.7 + §3.8):
  - **runAndWait happy path:** call runAndWait → it enqueues internally → the worker finishes → resolves `{ok:true, taskId, modelCalls, steps, durationMs}`
  - **runAndWait failure:** runtime throws verify-failed → rejects with Error('verify-failed')
  - **runAndWait timeout:** runtime never returns (simulated hang) → after 60s, rejects with Error('task-timeout') + abortByFilter cancels the task
  - **runAndWait, concurrent calls on the same target:** strictly serialized (a later call starts only after the earlier ones finish)
  - enqueue (fire-and-forget) stays compatible with the old interface
  - enqueue several tasks for the same (app, target) → the worker runs them serially
  - enqueue across different (app, target) → per-key workers accept tasks in parallel, but execution is physically serialized through DesktopMutex (verified with a mock)
  - queue > 20 → drop the oldest + log queue-overflow + if that task came from runAndWait, reject with Error('queue-overflow')
  - abortByFilter(t => t.appId==='weixin' && t.target.channelId === 5, 'channel-deleted') → clears that channel's pending tasks, interrupts the running task, and rejects the matching runAndWait Promise (reason='channel-deleted')
  - DesktopMutex.acquire is called, and the next task starts after release
- [ ] **Step 2:** Implement (spec §3.7 + §3.8 skeleton):

```ts
@Provide()
@Scope(ScopeEnum.Singleton)
export class DesktopOpService {
  @Inject() runtime: DesktopOpRuntime;
  @Inject() desktopMutex: DesktopMutex;
  @Inject() adapterRegistry: AdapterRegistry;
  @InjectEntityModel(DesktopOpActionLogEntity) logRepo: ...;

  // Map type parameters below are inferred from usage (keyed by queueKey / task id)
  private queues = new Map<string, DesktopTask[]>();
  private workers = new Map<string, Promise<void>>();
  private aborters = new Map<string, AbortController>();
  /** ★ new in v4: promise resolvers for runAndWait */
  private waiters = new Map<string, { resolve: (r: any) => void; reject: (e: Error) => void; timer: NodeJS.Timeout }>();

  enqueue(task: DesktopTask): { taskId: string; queuePosition: number };

  /** ★ new in v4: synchronous-wait variant, the entry point for tool execute */
  async runAndWait(task: DesktopTask, timeoutMs = 60000): Promise<{
    ok: true; taskId: string; modelCalls: number; steps: number; durationMs: number;
  }> {
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        // Remove the waiter first so abortByFilter does not reject it a second time.
        this.waiters.delete(task.id);
        this.abortByFilter(t => t.id === task.id, 'task-timeout');
        reject(new Error('task-timeout'));
      }, timeoutMs);
      this.waiters.set(task.id, { resolve, reject, timer });
      this.enqueue(task);
    });
  }

  abortByFilter(filter, reason): void {
    // existing logic + clear the matching waiters and reject(reason)
  }

  private async workerLoop(key: string): Promise<void> {
    // existing logic + on a terminal state call waiters.get(task.id)?.resolve / reject + clearTimeout + delete
  }

  private queueKey(task: DesktopTask): string;
}
```

- [ ] **Step 3:** Commit

```
git commit -m "feat(desktop-op): DesktopOpService (per-key worker + DesktopMutex + runAndWait + abortByFilter)"
```

---

### Task 13: desktop_op_action_log entity + persistence

**Files:**
- Create: `modules/desktop_op/entity/desktop_op_action_log.ts`
- Modify: `packages/backend/src/entities.ts`
- Modify: `service/desktop_op.ts` so that every task writes a log row when it reaches a terminal state

- [ ] **Step 1:** Write the entity (spec §4.3 schema): task_id / app_id / target_json / action_type / final_text / channel_id (WeChat scenario) / model_calls / status / aborted_reason / etc
- [ ] **Step 2:** Inject logRepo into DesktopOpService; workerLoop writes a log row every time a task ends (success or any error token)
- [ ] **Step 3:** Add tests asserting that failure paths also write a log row
- [ ] **Step 4:** Commit

```
git commit -m "feat(desktop-op): DesktopOpActionLog entity + service persistence"
```

---

### Task 13.5: desktop_op_config entity + default-row loading (★ new in v3)

**Files:**
- Create: `modules/desktop_op/entity/desktop_op_config.ts`
- Modify: `packages/backend/src/entities.ts`
- Modify: `packages/backend/src/configuration.ts` (in onReady, insert the default row if the table is empty, then call safetyGuard.loadConfig)

- [ ] **Step 1:** Write the entity (spec §4.1): allowed_apps (JSON) / extra_dangerous_keys / global_per_min / global_per_day / default_watermark (★ v4: `default_model_channel_id` is removed)
- [ ] **Step 2:** configuration.ts onReady:

```ts
const repo = dataSource.getRepository(DesktopOpConfigEntity);
// Default row, also used as the fallback when the table was empty.
const defaults = { id: 1, allowedApps: ['weixin'], defaultWatermark: 'suffix', globalPerMin: 30, globalPerDay: 1000 };
const cfg = await repo.findOne({ where: { id: 1 } });
if (!cfg) {
  await repo.save(defaults);
}
await
safetyGuard.loadConfig(cfg ?? defaults);
```

- [ ] **Step 3:** Commit

```
git commit -m "feat(desktop-op): DesktopOpConfig entity + default row + safetyGuard loading"
```

---

## Phase E · Business integration (★ rewritten in v4: tool-based, helper / replyToGroup / auto-send block removed)

### Task 14: Implement the weixin_send_text tool (★ v4 replaces the v3 helper, TDD)

**Files:**
- Create: `packages/backend/src/modules/netaclaw/tools/builtin/weixin_send_text.ts`
- Modify: `packages/backend/src/modules/netaclaw/tools/catalog.ts` (registration + TOOLSET constant)
- Create: tests

- [ ] **Step 1:** Write tests (mock DesktopOpService + channelRepo):
  - Missing input: empty text → throws 'text-empty'
  - text.length > 2000 → throws 'text-too-long'
  - channelId not passed and absent from bizContext → throws 'invalid-params: channelId 缺失'
  - currentAgent.modelChannelId not injected → throws 'current-agent-model-channel-missing'
  - channelId is read from `_netaRuntime.bizContext` and takes priority over the explicit param (test both paths)
  - modelChannelId is read from `_netaRuntime.currentAgent.modelChannelId`
  - watermark='suffix' → finalText gets ' —AI' appended
  - watermark='zero-width' → finalText is prefixed with U+200B
  - channel.config.weixinReply.enabled=false → throws 'weixin-reply-not-enabled'
  - `desktopOpService.runAndWait` is called with the correct arguments (appId='weixin' / target.channelId/roomName / actionType='send-text' / modelChannelId taken from the desktop agent's currentAgent.modelChannelId)
  - runAndWait succeeds → the tool returns a textResult containing the send confirmation (`已在群 "<roomName>" 发送`) and the taskId
  - runAndWait fails (verify-failed / window-not-found / precondition-failed / task-timeout) → the tool throws so the agent sees the error

- [ ] **Step 2:** Implement (★ v4 change: modelChannelId comes from the runtime context's `currentAgent`, no longer from a factory-function closure):

```ts
import { Type } from '@sinclair/typebox';
import { AgentToolWithMeta, textResult } from '../common.js';
import { registerSchema, TOOLSET_WEIXIN_DESKTOP } from '../catalog.js';
import { readToolRuntimeContext } from '../runtime_context.js';
import { randomUUID } from 'node:crypto';

const Params = Type.Object({
  roomName: Type.String({ description: '目标群名(必填)' }),
  text: Type.String({ description: '要发送的文本(必填,长度 1-2000)' }),
  channelId: Type.Optional(Type.Number({ description: 'NetaClaw channel id,优先级低于 runtime bizContext'
  })),
  _netaRuntime: Type.Optional(Type.Any()), // internal field, injected by beforeToolCall
});

/**
 * WeChat send-text tool.
 * Reads channelId/roomName (bizContext) and the current agent's modelChannelId (currentAgent)
 * from NetaToolRuntimeContext.
 */
export function createWeixinSendTextTool(deps: {
  desktopOpService: any; // DesktopOpService
  channelRepo: any;
}): AgentToolWithMeta {
  return {
    name: 'weixin_send_text',
    label: '微信发送文字',
    description: '在指定微信群里发送一段文字',
    parameters: Params,
    async execute(_id, params) {
      const runtime = readToolRuntimeContext(params as any);
      // bizContext wins over the explicit parameter.
      const channelId = runtime?.bizContext?.channelId ?? params.channelId;
      const modelChannelId = runtime?.currentAgent?.modelChannelId;

      if (!channelId) throw new Error('invalid-params: channelId 缺失(既无 bizContext 也无显式参数)');
      if (!modelChannelId) throw new Error('current-agent-model-channel-missing: desktop agent 未配置 modelChannel');
      if (!params.text || params.text.length === 0) throw new Error('text-empty');
      if (params.text.length > 2000) throw new Error('text-too-long');

      const channel = await deps.channelRepo.findOne({ where: { id: channelId } });
      const cfg = (channel?.config as any)?.weixinReply;
      if (!cfg?.enabled) throw new Error('weixin-reply-not-enabled');

      const watermark = cfg.watermark ?? 'suffix';
      let finalText = params.text;
      if (watermark === 'suffix') finalText = params.text + ' —AI';
      // Written as an escape so the zero-width space (U+200B) is visible in source.
      else if (watermark === 'zero-width') finalText = '\u200B' + params.text;

      const result = await deps.desktopOpService.runAndWait({
        id: `cid-${channelId}-${randomUUID()}`,
        appId: 'weixin',
        target: { conversation: params.roomName, channelId, roomName: params.roomName },
        actionType: 'send-text',
        params: { text: finalText, originalText: params.text },
        modelChannelId,
        maxSteps: 8,
        enqueuedAt: Date.now(),
      }, 60000);

      return textResult(`已在群 "${params.roomName}" 发送: ${params.text.slice(0, 60)}${params.text.length > 60 ? '...'
        : ''} (taskId=${result.taskId}, ${result.durationMs}ms)`);
    },
  };
}

registerSchema({
  name: 'weixin_send_text',
  toolset: TOOLSET_WEIXIN_DESKTOP,
  description: '在指定微信群里发送一段文字',
  visibility: 'tool',
  capability: 'text',
  isCore: false,
  canDisable: true,
});
```

- [ ] **Step 3:** Update `catalog.ts`:

```ts
export const TOOLSET_WEIXIN_DESKTOP = 'weixin_desktop' as const;
// ...
import './builtin/weixin_send_text.js';
```

- [ ] **Step 4:** Register the tool in tool_resolver so it can be constructed when the desktop agent calls it (modelChannelId comes from the current agent). Follow the registration pattern in `clarify.ts` or `delegate_task.ts`, and change `service/tool_resolver.ts`
- [ ] **Step 5:** Run the tests + type check
- [ ] **Step 6:** Commit

```
git commit -m "feat(netaclaw): weixin_send_text tool (toolset=weixin_desktop, calls DesktopOpService.runAndWait)"
```

---

### Task 15: Remove weixin_db.replyToGroup + the agent_channel auto-send block (★ v4 cleanup)

**Files:**
- Modify: `packages/backend/src/modules/netaclaw/service/weixin_db.ts`
- Modify: `packages/backend/src/modules/netaclaw/service/agent_channel.ts`
- Modify: the corresponding tests (drop the replyToGroup mock, add the new assertions)

- [ ] **Step 1:** `weixin_db.ts`:
  - **Delete** the `replyToGroup` method (lines 166-171)
  - Update the class comment (delete the line-32 entry ` - replyToGroup: 占位 throw NotImplementedError(等待 spec 5.7 实施)`)
  - Keep every read-path method (bindChannel / unbindChannel / getRuntime / healthCheck / probeAlive / refreshWhitelist / currentWhitelistSync / ensureWhitelistLoaded)
- [ ] **Step 2:** `agent_channel.ts`:
  - **Delete** the whole weixin-db auto-send branch (currently lines 585-608, the `if (channel.type === 'weixin-db') { ...
return; }` block)
  - When deleting, **do not touch** the `iLink (weixin ClawBot)` branch (from line 610 onward; that is a separate channel.type path)
  - Add the comment `// v4: weixin-db 不再自动发送,reply agent 必须主动 delegate_task 给 desktop agent`
- [ ] **Step 3:** In `agent_channel.ts:handleInboundMessage`, inject bizContext + currentAgent before the reply agent runs (ties in with Task 8.5):

```ts
const replyAgent = await this.agentService.info(effectiveAgentId);
const runtimeBizContext = {
  channelId: channel.id,
  roomName: (group as any).roomName || this.extractGroupName(rawMessage) || scope.chatId,
};
const runtimeCurrentAgent = {
  id: replyAgent.id,
  name: replyAgent.name,
  modelChannelId: null, // placeholder: resolve it with agent_executor's existing logic
  toolsets: replyAgent.toolsets ?? [],
};
// Pass to agent_executor.run({..., runtime: { bizContext: runtimeBizContext, currentAgent: runtimeCurrentAgent }})
```

- [ ] **Step 4:** ★ R3 safety net: **after the reply agent run completes, check whether it called delegate_task** (guards against a misconfigured admin prompt turning into a message black hole):

```ts
const runResult = await this.agentExecutorService.run({..., runtime});

// For weixin-db with enabled=true, log a warning if 'delegate_task' never appears in toolExecutions
const isWeixinReply = channel.type === 'weixin-db' && channel.config?.weixinReply?.enabled === true;
if (isWeixinReply) {
  const calledDelegate = runResult.toolExecutions?.some(t => t.name === 'delegate_task');
  if (!calledDelegate) {
    this.logger.warn(
      '[AgentChannel] WARN reply agent did not call delegate_task (message dropped silently). channelId=%s roomName=%s finalContent="%s..."',
      channel.id,
      runtimeBizContext.roomName,
      String(runResult.finalContent || '').slice(0, 80),
    );
  }
}
```

This does not block the flow (a reply agent deciding not to reply is legitimate behavior); it only logs a hint for the administrator.

- [ ] **Step 5:** Fix the tests:
  - Delete the tests tied to the old `weixinDbService.replyToGroup` mock
  - Add a test covering "a weixin-db channel receives a group message and replyToGroup is not called automatically"
  - Add tests for bizContext / currentAgent injection
  - Add a test for "reply agent did not call delegate_task → log warning" (mock logger)
- [ ] **Step 6:** Run the full `pnpm --filter @neta/backend test` suite and confirm nothing breaks
- [ ] **Step 7:** Commit

```
git commit -m "refactor(netaclaw): remove weixin_db.replyToGroup placeholder + agent_channel auto-send block + detect missing delegate from reply agent (v4 dual agent)"
```

---

### Task 15.5: agent_channel.update validates both agents' toolsets + auto-configures workerRoutingStrategy (★ new in v4)

**Files:**
- Modify: `packages/backend/src/modules/netaclaw/service/agent_channel.ts` (update method)
- Modify: `packages/backend/src/modules/netaclaw/service/agent.ts` (may need a helper that updates agent.tools.perTool)

- [ ] **Step 1:** In `update(data)`, when `data.type==='weixin-db'` and `data.config?.weixinReply?.enabled===true`, **run every check in sequence** (any failure throws and blocks the save):
  - Load the reply agent (`data.agentId`); its `agent.toolsets: string[]` **must include** `'crew'`, otherwise throw 'reply-agent-missing-crew-toolset'
  - `data.config.weixinReply.desktopAgentId` is required and the agent must exist (throw 'desktop-agent-not-found')
  - Load the desktop agent; **`agent.toolsets` must include `'weixin_desktop'`** (throw 'desktop-agent-missing-weixin-desktop-toolset')
  - Load the desktop agent; **`agent.toolsets` must not include `'crew'`** (throw 'desktop-agent-must-not-have-crew-toolset')
  - Check that reply agent.id !== desktop agent.id (throw 'reply-and-desktop-cannot-be-same')
- [ ] **Step 2:** **Once validation passes, auto-patch the desktop agent** (fill in whatever is not configured yet):

```ts
const tools = desktopAgent.tools ?? {};
const perTool = tools.perTool ?? {};
const wxTool = perTool['weixin_send_text'] ??
  {};
let dirty = false;
if (wxTool.allowInSubagent !== true) {
  wxTool.allowInSubagent = true;
  dirty = true;
}
if (wxTool.workerRoutingStrategy !== 'force-main-process-proxy') {
  wxTool.workerRoutingStrategy = 'force-main-process-proxy';
  dirty = true;
}
if (dirty) {
  perTool['weixin_send_text'] = wxTool;
  await this.agentService.update({ id: desktopAgent.id, tools: { ...tools, perTool } });
  this.logger.info('[agent-channel] auto-patched desktop agent %s tools.perTool.weixin_send_text', desktopAgent.id);
}
```

This guarantees that:
  - even if the administrator forgets to tick `allowInSubagent`, the subagent can still reach weixin_send_text
  - even if subagents later switch to subprocess mode, the tool is proxied back to the main process, so the single DesktopMutex instance stays effective

- [ ] **Step 3:** **Recommended config (not enforced in the MVP)**: if `replyAgent.subagentConfig.allowedPresetAgentIds` is empty, log an info hint "建议给 reply agent 配 allowedPresetAgentIds=[desktopAgentId] 限定 delegate 目标". Do not auto-patch, to avoid overriding the user's intent.
- [ ] **Step 4:** Write tests covering every failure path + the auto-patch path (mock agentService.update and assert its arguments)
- [ ] **Step 5:** Commit

```
git commit -m "feat(netaclaw): channel.update validates dual-agent toolsets + auto-configures weixin_send_text routing strategy"
```

---

### Task 16: agent_channel.delete cascade abortByFilter

**Files:**
- Modify: `packages/backend/src/modules/netaclaw/service/agent_channel.ts`

- [ ] **Step 1:** Add `@Inject() desktopOpService: DesktopOpService;`
- [ ] **Step 2:** Add the abortByFilter call inside the `delete(ids)` loop:

```ts
for (const id of ids) {
  this.stopRunner(id);
  await this.groupService.cascadeDeleteByChannel(id);
  this.weixinDbService.unbindChannel(id);
  await this.archiveSyncService.deleteChannelArchive(id);
  this.desktopOpService.abortByFilter(
    t => t.appId === 'weixin' && t.target?.channelId === id,
    'channel-deleted',
  );
}
```

- [ ] **Step 3:** Fix the corresponding mock tests and add a `desktopOpService` mock
- [ ] **Step 4:** Commit

```
git commit -m "feat(netaclaw): channel.delete cascade abort desktop_op"
```

---

### Task 17: Cascade abort when weixinReply.enabled is switched off

**Files:**
- Modify: the `update` method in `modules/netaclaw/service/agent_channel.ts`

- [ ] **Step 1:** In `update(data)`, detect
channel.config.weixinReply.enabled **changing from true to false** → call:

```ts
this.desktopOpService.abortByFilter(
  t => t.appId === 'weixin' && t.target?.channelId === existing.id,
  'weixin-reply-disabled',
);
```

- [ ] **Step 2:** Commit

```
git commit -m "feat(netaclaw): cascade abort when channel.config.weixinReply.enabled becomes false"
```

---

## Phase F · Audit + config controllers

### Task 18: desktop_op_action_log controller

**Files:**
- Create: `modules/desktop_op/controller/admin/desktop_op_action_log.ts`

- [ ] **Step 1:** `@CoolController` + `POST /list` + `GET /info`; filter fields: appId / channelId / status / time range
- [ ] **Step 2:** Commit

```
git commit -m "feat(desktop-op): desktop_op_action_log admin API"
```

---

### Task 19: desktop_op_config controller (MVP: get/update only)

**Files:**
- Create: `modules/desktop_op/controller/admin/desktop_op_config.ts`

- [ ] **Step 1:** `@Get('/info')` returns the single row with id=1 + `@Post('/update')` updates it
- [ ] **Step 2:** Reload the SafetyGuard config after update
- [ ] **Step 3:** Commit

```
git commit -m "feat(desktop-op): desktop_op_config admin API + SafetyGuard reload"
```

---

## Phase G · Frontend

### Task 20: Add the WeChat auto-reply block to channel-edit.vue (★ v4 dual-agent dropdowns)

**Files:**
- Modify: `packages/frontend/src/modules/agent/views/channel-edit.vue` (or the corresponding weixin-db edit component)

- [ ] **Step 1:** When type=weixin-db, add:
  - Auto reply: radio (`disabled` / `enabled`), default disabled
  - **Conversation Agent** (★ v4): dropdown bound to `channel.agentId` (reuses the existing field at the top of the form, so no new control is needed, but add the hint "必须启用 crew toolset")
  - **Desktop Agent** (★ new in v4): dropdown bound to `channel.config.weixinReply.desktopAgentId`; data source is `service.netaclaw.agent.list({})`, filtered on the frontend to agents whose toolset includes `weixin_desktop`
  - Validation: when enabled=true, desktopAgentId is required and must differ from channel.agentId (frontend message)
  - Secondary-account safe mode: switch (default on)
  - Daily limit: number (default 100)
  - Per group per minute: number (default 3)
  - Message watermark: radio (`none` / `suffix` / `zero-width`), default suffix
  - Risk notice copy + ★ "桌面操作 Agent 的模型 / prompt / toolset 请在 Agent 管理页配置"
  - ❌ **No more "model" dropdown** (v4 removes the modelChannelId field)
- [ ] **Step 2:** On submit, write `channel.config.weixinReply = { enabled, desktopAgentId, dailyLimit, perGroupPerMinute, safeMode,
watermark }`
- [ ] **Step 3:** Run `pnpm --filter @neta/frontend type-check` and confirm the modified files introduce no new TS errors
- [ ] **Step 4:** Commit

```
git commit -m "feat(agent-fe): channel-edit adds WeChat auto-reply block (dual-agent dropdowns, channel.config.weixinReply.desktopAgentId)"
```

---

### Task 20.5: desktop_op_config settings page (optional; defaults are enough for the first release)

Deferred to a later spec (`Layer 2`).

---

## Phase H · E2E + wrapping up the old spec

### Task 21: E2E checklist + verification report

**Files:**
- Create: `docs/superpowers/followups/2026-05-14-desktop-op-e2e.md`

**Prerequisites:**
- Windows + WeChat 4.x logged in + test group + secondary test account (warmed up for ≥ 7 days)
- backend + frontend running
- ★ v4 configuration:
  - In the admin console, create reply agent A (toolset=`base`+`interaction`+`crew`, modelChannel set to a regular LLM)
  - In the admin console, create desktop agent B (toolset=`weixin_desktop`+`interaction`, modelChannel set to the multimodal Volcano Engine Seed-2.0-pro, prompt from the default template)
  - Edit the weixin-db channel: enabled=true, Conversation Agent=A, Desktop Agent=B, watermark=suffix
- The `desktop_op_config` table already has its default row (`allowed_apps:['weixin']`)
- The Phase 0.5 IPC PoC is complete and its conclusion has been applied to the Task 5 implementation

**Checklist:**
- [ ] **E2E-1:** Phase 0 PoC ✅ (passed, 100% on 1 run)
- [ ] **E2E-1.5:** Phase 0.5 Subagent IPC PoC is complete and its conclusion is recorded in the followup
- [ ] **E2E-2:** Run Task 0.2 with N=20 to collect fixtures and verify a success rate ≥ 80% (if not done yet)
- [ ] **E2E-3:** Configuration: reply agent A (crew toolset) + desktop agent B (weixin_desktop toolset, without crew) + channel bound to both agents + enabled=true + watermark=suffix
- [ ] **E2E-3.5:** ★ Verify the backend channel.update validation:
  - Remove the crew toolset from the reply agent and save → error "reply-agent-missing-crew-toolset"
  - Add the crew toolset to the desktop agent and save → error "desktop-agent-must-not-have-crew-toolset"
  - Pick the same agent as both reply agent and desktop agent → frontend error
- [ ] **E2E-4:** ★ **Core dual-agent chain**: post a question in the test group (for example "今天天气如何", i.e. "what is the weather today") →
  - the reply agent receives the db-triggered onInbound and runs ReAct
  - the reply agent calls `delegate_task({mode:'preset', agentId:B.id, goal:'在群 X 发送: 阴 12-18 度'})`
  - the desktop agent starts a subagent process, runs ReAct, then calls `weixin_send_text({roomName:'X', text:'阴 12-18 度'})`
  - the tool calls DesktopOpService.runAndWait, and desktop op completes the keyboard/mouse actions + VLM verification
  - within 5-40s the group receives "阴 12-18 度 —AI"
  - the subagent_session table gains one desktop agent session and desktop_op_action_log gains one row
- [ ] **E2E-4.1:** ★ reply agent
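decides not to reply: send "[请忽略]xxx", with the reply agent prompt teaching it to skip such messages → the group gets **no reply at all**, and desktop_op_action_log has **no new row** (because the tool was never called)
- [ ] **E2E-4.2:** ★ bizContext pass-through check: the `weixin_send_text` tool logs internally that `channelId` came from `bizContext` (rather than being passed explicitly in params by the LLM)
- [ ] **E2E-5:** Repeat 5 times for idempotency (all succeed)
- [ ] **E2E-6:** Deliberately minimize WeChat → desktop_op activate restores the window automatically and sends
- [ ] **E2E-7:** Send 5 messages within one minute → the 4th and 5th are rate-limited, and desktop_op_action_log shows status=rate-limited
- [ ] **E2E-8:** In the admin console switch channel.config.weixinReply.enabled to false → pending tasks in the queue show aborted_reason=weixin-reply-disabled (via Task 17); when a new message arrives the reply agent can still run, but the `weixin_send_text` tool throws 'weixin-reply-not-enabled'
- [ ] **E2E-9:** Delete the channel → all of its pending tasks show aborted_reason=channel-deleted
- [ ] **E2E-10:** Close the WeChat process → the desktop agent receives 'window-not-found' from the tool and, following its prompt, does not retry; the reply agent gets the failure result
- [ ] **E2E-11:** Inspect the desktop_op_action_log table: 1 row per call, final_text stored in full, target_json contains channelId/roomName
- [ ] **E2E-12:** Simulate a model failure (temporarily set the desktop agent's modelChannel.baseUrl to a bad URL) → the tool throws model-failed, the desktop agent retries once, and the reply agent receives the failure
- [ ] **E2E-13:** archive sync and desktop_op fire at the same time → archive sync uses its own channelLocks (not blocked) and desktop_op is serialized through DesktopMutex, so neither competes for the foreground
- [ ] **E2E-14:** SafetyGuard: the reply agent prompt deliberately induces a "删除文件" (delete file) operation; after the desktop agent calls weixin_send_text, the safety guard blocks hotkey 'delete' → status=dangerous-action-blocked
- [ ] **E2E-15:** SafetyGuard: write a fake tool that submits a task with appId='excel' → status=app-not-allowed
- [ ] **E2E-16:** Loop protection: **wrongly give the desktop agent the `crew` toolset and try to save** → backend validation blocks it; if the validation is bypassed by editing the DB directly and the desktop agent calls delegate_task → it should be blocked at the tool level or the subagent level (depth limit)
- [ ] **E2E-17:** The user is actively using the computer (mouse moved to a non-WeChat window) → the yield mechanism kicks in (may be added later); the MVP must at least not crash
- [ ] **E2E-18:** ★ **MVP single-conversation assumption check** (risk R1):
  - Configure the channel to listen to 2 groups, A and B
  - Leave WeChat on group A's conversation page
  - Send messages to A and B at the same time
  - Expected: A gets a reply and B gets a 'precondition-failed' error; desktop_op_action_log shows B's task with status=precondition-failed
  - If B wrongly receives a reply (meaning the run switched to group B and sent there), that is a bug and needs a fix
- [ ] **E2E-19:** ★ **Detecting a reply agent that skips delegate_task** (risk R3):
  - Configure the reply agent's prompt so that it **deliberately never teaches the agent to call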
delegate_task**
  - Post a question in the group
  - Expected: the group gets **no reply**, but the backend log contains a `WARN reply agent did not call delegate_task (message dropped silently)` line
- [ ] **E2E-20:** ★ **bizContext JSON-safe check**:
  - Write a temporary test that injects a bizContext containing a `function` field
  - Expected: throws 'biz-context-not-serializable' without affecting the normal flow
- [ ] **E2E-21:** ★ **Auto-patching the desktop agent's workerRoutingStrategy**:
  - Create a desktop agent **without configuring** `tools.perTool['weixin_send_text']`
  - Save on the channel edit page (enabled=true)
  - List the desktop agent again and assert `tools.perTool['weixin_send_text'].allowInSubagent === true` and `workerRoutingStrategy === 'force-main-process-proxy'`

- [ ] **Step 1:** Run every item by hand; keep screenshots / logs for the key scenarios
- [ ] **Step 2:** Write the report, covering:
  - Environment (WeChat version, model version, Node version, backend version)
  - Checklist results
  - Known issues / followups
  - Average latency per message + tokens / cost (checked against the spec §7.3 estimate)
  - Cost comparison: Seed 2.0 Pro measured vs. estimated
- [ ] **Step 3:** Commit

```
git commit -m "docs(desktop-op): E2E verification report"
```

---

### Task 22: Mark the old weixin-uia spec OBSOLETE

**Files:**
- Modify: `docs/superpowers/specs/2026-05-09-wechat-uia-channel-design.md`

- [ ] **Step 1:** Add at the top of the file (after the frontmatter):

```markdown
> **⚠️ OBSOLETE 2026-05-14**: The UIA route was proven completely unworkable on WeChat 4.1.9.54 by a PoC (`tools/uia_probe/probe.ps1`): Qt custom drawing + the `MMUIRenderSubWindowHW` hardware-accelerated render layer leave the UIA tree with only 3 nodes and 0 interactive controls; Narrator / the AccessibilityTemp registry key / the `QT_ACCESSIBILITY=1` environment variable / a fake `StructureChangedEventHandler` client all have no effect.
> For the new approach see `2026-05-14-neta-desktop-op-design.md` v4 (a general desktop GUI agent; WeixinAdapter is the first application adapter).
> This file is kept for historical reference.
```

- [ ] **Step 2:** Commit

```
git commit -m "docs(spec): mark weixin-uia spec OBSOLETE (superseded by desktop_op)"
```

---

## Self-Review

### 0. v4 dual-agent architecture coverage (★ new)

| Spec v4 section | Covering Task |
|---|---|
| §0 v4 major changes H1-H8 | The entire v4 plan |
| §1.4 Dual-agent model (division of responsibilities / loop prevention) | Task 15.5 (toolset validation) + Task 20 (two frontend dropdowns) |
| §2.1 WeChat integration (new flow diagram) | Task 8.5 + Task 14 + Task 15 |
| §3.2 Module layering (weixin_send_text tool / helper removed) | Task 14 + no helper is created |
| §3.3 ReAct topology (adapter-driven) | Task 11 (carried over) |
| §3.7 DesktopOpService.runAndWait | Task 12 (★ revised in v4) |
| §4.1 desktop_op_config (★ default_model_channel_id removed) | Task 13.5 (revised entity fields) |
| §4.2 channel.config.weixinReply (adds desktopAgentId, drops modelChannelId) | Task 20 (frontend) + Task 15.5 (validation) |
| §4.2.1 reply / desktop agent toolset validation | Task 15.5 |
| §4.2.2 bizContext pass-through | Task 8.5 (extend runtime_context) + Task 15 (agent_channel injection) + Task 14 (tool reads it) |
| §5.3 runAndWait interface | Task 12 |
| §5.4 weixin_desktop toolset + weixin_send_text | Task 14 |
| §5.5 Removed items (WeixinReplyHelper / replyToGroup / auto-send block) | Task 15 |
| §6.1 Frontend dual-agent dropdowns | Task 20 |
| §7.1 model comes from desktop agent.modelChannelId | Task 14 (tool execute injects modelChannelId) |
| §7.2.2 Default prompt template for the desktop agent | Task 14 (README / doc suggestion) + E2E-3 configuration |
| §7.2.3 Incremental prompt hint for the reply agent | E2E-3 configuration |
| §8.0 Phase 0.5 Subagent IPC PoC | Phase 0.5 Task 0.5.1 |
| §8.4 E2E (dual-agent chain verification) | Phase H Task 21, full checklist |

### 1. Spec v3 coverage

| Spec section | Covering Task |
|---|---|
| §0 v3 changes (G1-G9) | The entire plan |
| §1 Background + UIA failure | Task 22 |
| §2.1 senderQueue decoupling + fire-and-forget | Task 12 (DesktopOpService enqueue) + Task 15 (replyToGroup, removed in v4) |
| §2.2 User perception + watermark + model dropdown | Task 14 / Task 20 |
| §2.3 Navigation problems exposed by the PoC | Task 9 (WeixinAdapter preFlightCheck requires manual positioning) |
| §3.1 Process model | Task 1 (DPI) + fully embedded |
| §3.2 Module layering desktop_op/ | Phases A-F all live in desktop_op; Phase E lives in netaclaw |
| §3.3 ReAct, adapter-driven | Task 11 |
| §3.4 Chinese input via clip.exe | Task 6c |
| §3.5 Global DesktopMutex | Task 5 |
| §3.6 SafetyGuard | Task 5.5 |
| §3.7 Background worker + per-app queue | Task 12 |
| §4.1 desktop_op_config | Task 13.5 |
| §4.2 channel.config.weixinReply | Task 20 (frontend) + Task 14 (service read) |
| §4.3 desktop_op_action_log | Task 13 |
| §5.1 File list | Task 1-13 |
| §5.2 Interfaces DesktopTask / ActionStep / AdapterContext | Task 2 / Task 8 / Task 9 / Task 11 |
| §5.3 AppAdapter interface | Task 8 |
| §5.4 DesktopOpService | Task 12 |
| §5.5 WeixinReplyHelper | Task 14 (v4: replaced by the weixin_send_text tool; no helper is created) |
| §6 Frontend | Task 20 |
| §7.1 Model via model_channel | Task 10 (vlm_client) + Task 14 / Task 20 |
| §7.2 Parser + Adapter prompt | Task 3 / Task 9 |
| §7.3 Cost estimate | Phase 0 PoC check + Task 21 E2E check |
| §7.4 deps (without clipboardy) | Task 1 |
| §7.5 DPI Aware | Task 1 |
| §8.1 Phase 0 PoC ✅ | Task 0.1 / 0.2 |
| §8.2 Unit tests + fixtures | Task 3 + the other tasks |
| §8.3 CI policy | Declared at the top of the plan |
| §8.4 E2E | Task 21 |

### 2. Coverage of the 9 v3 review items

| # | Issue | Covering Task |
|---|---|---|
| G1 | Move the module out of netaclaw | All of Phases A-D live in modules/desktop_op/ |
| G2 | Generalized TaskContext schema | Task 2 |
| G3 | Registry-style AppAdapter | Task 8 + Task 9 |
| G4 | agent_executor tool registration | (Layer 2, hook left open) |
| G5 | Global desktop_op_config | Task 13.5 + Task 19 |
| G6 | Rename to desktop_op | The entire plan |
| G7 | SafetyGuard | Task 5.5 + Task 11 (runtime validation) + Task 21 E2E-14/15 |
| G8 | Global DesktopMutex | Task 5 + Task 12 |
| G9 | admin HTTP `/run-task` | (Layer 2, hook left open) |

### 3. Coverage of the 14 v2 review items (all carried over)

See the self-review at the end of the v2 plan; not repeated here (Phase 0 PoC / fire-and-forget / AbortSignal / fixtures / CI / account warm-up / final_text / watermark / etc are all carried over and strengthened in v3).

### 4. Placeholder scan

- No TBD
- Every Step has concrete code / commands / file paths

### 5. Type consistency

- `DesktopTask` / `ActionStep` / `TaskResult` / `AdapterContext` are consistent across Task 2 / 8 / 9 / 11 / 12 / 13 / 14
- Error tokens are consistent: window-not-found / precondition-failed / app-not-allowed / dangerous-key-blocked / dangerous-action-blocked / model-failed / model-hallucinated / verify-failed / queue-overflow / aborted / weixin-reply-not-enabled / model-channel-not-configured / unsupported-platform / channel-not-found / channel-not-bound (v4 adds task-timeout / current-agent-model-channel-missing)
- `modelChannelId` is consistent across the task and the log entity; in v4 it comes from the desktop agent's `agent.modelChannelId` and is no longer stored in channel.config.weixinReply or desktop_op_config

### 6. Cross-phase handoffs

- Phase 0 PoC ✅ → outputs raw responses for the Phase A Task 3 fixtures
- Phase A utilities → Phase B Adapter, Phase C VLM, and Phase D Runtime/Service all depend on them
- Phase D runtime/service → Phase E integration (weixin_send_text tool + replyToGroup removal + cascade)
- Phase F audit → checked by Phase H E2E-11
- Phase G frontend → entry point for Phase H E2E-3/4

### 7. DEV feasibility

- Phase 0 ✅ already runs on Windows
- Phases A-G can all run unit tests on Linux/Mac (native modules take the real path only when platform=win32; the native-tool tasks 6a/6b/6c have no unit tests but are backstopped by e2e)
- Phase H E2E requires Windows + WeChat logged in + test group + secondary test account + Volcano Engine API key

### 8. Timeline estimate (for reference, assuming 1 full-time engineer)

| Phase | Estimate | Notes |
|---|---|---|
| Phase 0 PoC | ✅ done | |
| Phase A (Task 1-7) | 4-6 days | Task 5/6a/6b/6c involve native modules; risk of newcomers getting stuck |
| Phase B (Task 8-9) | 1-2 days | AppAdapter interface + WeixinAdapter |
| Phase C (Task 10) | 1-2 days | vlm_client |
| Phase D (Task 11-13.5) | 2-3 days | runtime + service + log entity + config entity |
| Phase E (Task 14-17) | 1.5 days | weixin_send_text tool + replyToGroup removal + toolset validation + 2 cascades |
| Phase F (Task 18-19) | 0.5 days | standard Cool CRUD |
| Phase G (Task 20) | 1 day | frontend |
| Phase H (Task 21-22) | 1-2 days | manual E2E + obsolete marker |
| **Total** | **12-18 days** | |

Adding the secondary-account warm-up (7 days, in parallel) + buffer gives ≈ **3 weeks to deliver the MVP**.

---

## Execution Handoff

The complete plan is saved at `docs/superpowers/plans/2026-05-14-neta-desktop-op.md` v4.

**Start in parallel on kickoff day:**
1. ✅ Phase 0 PoC has passed (real run on 2026-05-14, 100% on 1 run)
2. Operations starts warming up the secondary test account (7 days): immediately
3. Engineers can start Phase A Task 1 right away (install deps + DPI)

**Final sanity check before entering Phase A:**
- The Phase 0 PoC raw report has at least 20 VLM outputs committed (if not, do Task 0.2 first), because the Phase A Task 3 fixtures need them

---

## ★ Boundary with weixin-archive sync (restated)

- **Untouched**: `weixin_archive_sync.ts` and everything under `runtime/weixin_db/*` (the listening / decryption / WAL watcher chain)
- **Locks are not merged**: archive sync keeps its own internal `channelLocks` Map (pure SQLite reads, which do not compete with desktop keyboard/mouse input)
- **Only intersection**: `agent_channel.delete(ids)` now calls both `archiveSyncService.deleteChannelArchive(id)` (existing, unchanged) and `desktopOpService.abortByFilter(...)` (added in Task 16)
- archive sync ↔ desktop_op are **physically isolated**: archive sync reads files in the background, desktop_op drives the keyboard and mouse in the foreground; neither is aware of the other and neither affects the other.
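
The lock isolation described in this section can be illustrated with a small sketch. It assumes a simple promise-chain mutex; `ChainMutex`, `KeyedLocks`, and `demo` are hypothetical names used only for illustration and are not the project's actual `DesktopMutex` or `channelLocks` implementation. The point it shows is that two desktop tasks sharing one global mutex run strictly one after another, while an archive job guarded by its own per-channel lock is not delayed by them.

```typescript
// Hypothetical promise-chain mutex (illustration only, not the real DesktopMutex).
class ChainMutex {
  private tail: Promise<void> = Promise.resolve();

  /** Runs fn only after every previously queued fn has settled. */
  run<T>(fn: () => Promise<T>): Promise<T> {
    const result = this.tail.then(fn);
    // Keep the chain usable even when fn rejects.
    this.tail = result.then(() => undefined, () => undefined);
    return result;
  }
}

// Hypothetical per-key locks, standing in for archive sync's channelLocks Map.
class KeyedLocks {
  private locks = new Map<number, ChainMutex>();

  run<T>(key: number, fn: () => Promise<T>): Promise<T> {
    let lock = this.locks.get(key);
    if (!lock) {
      lock = new ChainMutex();
      this.locks.set(key, lock);
    }
    return lock.run(fn);
  }
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

/** Starts two desktop tasks and one archive job together and records the event order. */
async function demo(): Promise<string[]> {
  const desktopMutex = new ChainMutex(); // one global lock for keyboard/mouse work
  const channelLocks = new KeyedLocks(); // independent locks for archive reads
  const events: string[] = [];

  const job = (name: string, ms: number) => async () => {
    events.push(`${name}:start`);
    await sleep(ms);
    events.push(`${name}:end`);
  };

  await Promise.all([
    desktopMutex.run(job('desktop:A', 30)),
    desktopMutex.run(job('desktop:B', 5)),
    channelLocks.run(5, job('archive:5', 1)),
  ]);
  return events;
}
```

In this sketch `desktop:B` cannot start until `desktop:A` has ended, whereas `archive:5` finishes while `desktop:A` is still running, which mirrors the "locks are not merged" rule above.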