# Text-to-Image and Image-to-Image Tools Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add two tools, `text_to_image` and `image_to_image`, to Neta Agent; build an extensible image-generation Provider layer; and render image results in the frontend.

**Architecture:** The backend adds an `image_providers/` directory that implements the Provider strategy pattern (Volcano Engine + MiniMax). Both tools receive their credentials and provider instance through `tool_resolver`. Image URLs returned by the tools are copied to local storage by `ImageStorageService` before being written to the session tree. On the frontend, `message-item.vue` upgrades its image rendering and gains a multi-image grid branch.

**Tech Stack:** TypeScript, OpenAI SDK (Volcano Engine), fetch (MiniMax REST), TypeBox (parameter schemas), Vue 3 + Element Plus (frontend)

**Spec:** `docs/superpowers/specs/2026-05-02-image-generation-tools-design.md`

---

## File Map

| File | Responsibility | Action |
|------|------|------|
| `packages/backend/src/modules/netaclaw/image_providers/types.ts` | Unified provider interface, credentials, error types, factory function | Create |
| `packages/backend/src/modules/netaclaw/image_providers/ark.ts` | Volcano Engine provider | Create |
| `packages/backend/src/modules/netaclaw/image_providers/minimax.ts` | MiniMax provider | Create |
| `packages/backend/src/modules/netaclaw/tools/common.ts` | Add `ImageItem`, the `images` result type, `imagesResult()`, and the `toolResultToText` extension | Modify |
| `packages/backend/src/modules/netaclaw/tools/manifest.ts` | Add `imageDefaults` / `imageConstraints` to `ToolGovernanceExtra` | Modify |
| `packages/backend/src/modules/netaclaw/tools/builtin/image_common.ts` | Helpers shared by the image tools (clampDimension, persistImages, formatImageToolResult) | Create |
| `packages/backend/src/modules/netaclaw/tools/builtin/text_to_image.ts` | Text-to-image tool | Create |
| `packages/backend/src/modules/netaclaw/tools/builtin/image_to_image.ts` | Image-to-image tool | Create |
| `packages/backend/src/modules/netaclaw/tools/catalog.ts` | Import the two new tools to trigger registration | Modify |
| `packages/backend/src/modules/netaclaw/service/image_storage.ts` | Service that persists generated images to local storage | Create |
| `packages/backend/src/modules/netaclaw/service/tool_registry.ts` | Add the `getToolByName` method | Modify |
| `packages/backend/src/modules/netaclaw/service/tool_resolver.ts` | Inject credentials and the provider instance | Modify |
| `packages/backend/src/modules/netaclaw/runtime/prompt_builder.ts` | Extend the attachment hints | Modify |
| `packages/backend/src/modules/netaclaw/runtime/agent.ts` | Pass `toolNames` when calling `buildLLMMessages` | Modify |
| `packages/frontend/src/modules/agent/tools/renderer-registry.ts` | Add the `images` type to rawResult | Modify |
| `packages/frontend/src/modules/agent/components/message-item.vue` | Upgrade single-image rendering to el-image; add the multi-image grid | Modify |
| `packages/frontend/src/modules/agent/views/tools.vue` | Image-generation configuration section | Modify |

---

### Task 1: Backend type foundation (ToolResultContent multi-image extension)

**Files:**
- Modify: `packages/backend/src/modules/netaclaw/tools/common.ts`

- [ ] **Step 1: Add the ImageItem interface before the ToolResultContent type**

In `common.ts`, before the `ToolResultContent` type definition (around line 26), add:

```typescript
export interface ImageItem {
  url: string;
  mimeType?: string;
  width?: number;
  height?: number;
  seed?: number;
}
```

- [ ] **Step 2: Extend the ToolResultContent union type**

After the existing `type: 'image'` branch, add an `images` branch:

```typescript
  | { type: 'images'; images: ImageItem[]; text?: string };
```

The complete type becomes:

```typescript
export type ToolResultContent =
  | { type: 'text'; text: string }
  | { type: 'json'; data: unknown }
  | {
      type: 'image';
      url: string;
      mimeType?: string;
      text?: string;
      width?: number;
      height?: number;
      bytes?: number;
      originalWidth?: number;
      originalHeight?: number;
      originalBytes?: number;
      resized?: boolean;
    }
  | { type: 'images'; images: ImageItem[]; text?: string };
```

- [ ] **Step 3: Add the imagesResult helper**

After the `imageResult()` function, add:

```typescript
export function imagesResult(
  images: ImageItem[],
  text?: string,
): ToolResultContent {
  return { type: 'images', images, text };
}
```

- [ ] **Step 4: Extend the toolResultToText function**

In `toolResultToText`, after the `if (value.type === 'image')` branch, add:

```typescript
if (value.type === 'images') {
  // Cast once so both the image list and the optional header text are typed.
  const multi = value as { type: 'images'; images: ImageItem[]; text?: string };
  const lines = multi.images.map((img, i) =>
    `[图${i + 1}] ${img.url}${img.width && img.height ? ` (${img.width}x${img.height})` : ''}`
  );
  const header = multi.text || `已生成 ${lines.length} 张图片`;
  return `${header}\n${lines.join('\n')}`;
}
```

- [ ] **Step 5: Verify compilation**

Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no new errors

- [ ] **Step 6: Commit**

```bash
git add packages/backend/src/modules/netaclaw/tools/common.ts
git commit -m "feat(netaclaw): extend ToolResultContent with images type for multi-image tool results"
```

---

### Task 2: ToolGovernanceExtra extension (imageDefaults / imageConstraints)

**Files:**
- Modify: `packages/backend/src/modules/netaclaw/tools/manifest.ts`

- [ ] **Step 1: Extend the ToolGovernanceExtra type**

In the `ToolGovernanceExtra` type at lines 9-12 of `manifest.ts`, add the image-tool fields:

```typescript
export type ToolGovernanceExtra = {
  allowInSubagent?: boolean;
  workerRoutingStrategy?: ToolWorkerRoutingStrategy;
  imageDefaults?: {
    n?: number;
    aspectRatio?: string;
    width?: number;
    height?: number;
    watermark?: boolean;
    responseFormat?: 'url' | 'base64';
  };
  imageConstraints?: {
    maxN?: number;
    maxWidth?: number;
    maxHeight?: number;
  };
};
```

- [ ] **Step 2: Verify compilation**

Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no new errors (existing code only reads `allowInSubagent` and `workerRoutingStrategy`, so the new fields do not affect it)

- [ ] **Step 3: Commit**

```bash
git add packages/backend/src/modules/netaclaw/tools/manifest.ts
git commit -m "feat(netaclaw): add imageDefaults and imageConstraints to ToolGovernanceExtra"
```

---

### Task 3: Provider layer (unified interface and factory)

**Files:**
- Create: `packages/backend/src/modules/netaclaw/image_providers/types.ts`

- [ ] **Step 1: Create the image_providers directory and types.ts**

```typescript
import type { ToolGovernanceExtra } from '../tools/manifest.js';

export interface ImageProviderCredentials {
  baseUrl: string;
  apiKey: string;
  supplier: string;
  modelId: string;
  promptHint: string | null;
  extra?: ToolGovernanceExtra | null;
}

export interface TextToImageParams {
  prompt: string;
  width?: number;
  height?: number;
  aspectRatio?: string;
  n?: number;
  responseFormat?: 'url' | 'base64';
  watermark?: boolean;
  seed?: number;
  extra?: Record<string, unknown>;
}

export interface ImageToImageParams extends TextToImageParams {
  referenceImage: string;
  strength?: number;
}

export interface ImageGenerationResult {
  images: { url?: string; base64?: string; width?: number; height?: number }[];
  model: string;
  provider: string;
}

export interface ImageGenerationProvider {
  readonly id: string;
  textToImage(params: TextToImageParams, creds: ImageProviderCredentials): Promise<ImageGenerationResult>;
  imageToImage(params: ImageToImageParams, creds: ImageProviderCredentials): Promise<ImageGenerationResult>;
}

export type ImageGenerationErrorCode =
  | 'content_safety'
  | 'rate_limit'
  | 'insufficient_balance'
  | 'invalid_params'
  | 'timeout'
  | 'network'
  | 'unknown';

export class ImageGenerationError extends Error {
  constructor(
    message: string,
    public readonly code: ImageGenerationErrorCode,
    public readonly retryable: boolean,
  ) {
    super(message);
    this.name = 'ImageGenerationError';
  }
}

// Registry of provider instances, keyed by provider id.
const providers = new Map<string, ImageGenerationProvider>();

export function registerImageProvider(provider: ImageGenerationProvider): void {
  providers.set(provider.id, provider);
}

export function getImageProvider(supplier: string, baseUrl: string): ImageGenerationProvider | null {
  const s = supplier.toLowerCase();
  if (s === 'minimax') return providers.get('minimax') ?? null;
  if (s === 'ark' || s === 'volcengine') return providers.get('ark') ?? null;
  // OpenAI-compatible channels are matched by their base URL.
  if (s === 'openai') {
    if (baseUrl.includes('volces.com') || baseUrl.includes('volcengine')) return providers.get('ark') ?? null;
    if (baseUrl.includes('minimax')) return providers.get('minimax') ?? null;
  }
  return null;
}
```

- [ ] **Step 2: Verify compilation**

Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors

- [ ] **Step 3: Commit**

```bash
git add packages/backend/src/modules/netaclaw/image_providers/
git commit -m "feat(netaclaw): add image provider types, error class, and factory"
```

---

### Task 4: Provider layer (Volcano Engine / Ark)

**Files:**
- Create: `packages/backend/src/modules/netaclaw/image_providers/ark.ts`

- [ ] **Step 1: Implement ArkImageProvider**

```typescript
import OpenAI from 'openai';
import {
  type ImageGenerationProvider,
  type ImageProviderCredentials,
  type TextToImageParams,
  type ImageToImageParams,
  type ImageGenerationResult,
  ImageGenerationError,
  registerImageProvider,
} from './types.js';

// Map an aspect ratio (preferred) or explicit width/height to a "WxH" size string.
function resolveSize(params: TextToImageParams): string | undefined {
  if (params.aspectRatio) {
    const map: Record<string, string> = {
      '1:1': '1024x1024',
      '16:9': '1280x720',
      '4:3': '1152x864',
      '3:2': '1248x832',
      '2:3': '832x1248',
      '3:4': '864x1152',
      '9:16': '720x1280',
    };
    return map[params.aspectRatio] ?? '1024x1024';
  }
  if (params.width && params.height) {
    return `${params.width}x${params.height}`;
  }
  return undefined;
}

function normalizeResult(response: OpenAI.Images.ImagesResponse, creds: ImageProviderCredentials): ImageGenerationResult {
  return {
    images: (response.data ?? []).map(item => ({
      url: item.url,
      base64: item.b64_json,
    })),
    model: creds.modelId,
    provider: 'ark',
  };
}

class ArkImageProvider implements ImageGenerationProvider {
  readonly id = 'ark';

  async textToImage(params: TextToImageParams, creds: ImageProviderCredentials): Promise<ImageGenerationResult> {
    const client = new OpenAI({ apiKey: creds.apiKey, baseURL: creds.baseUrl, timeout: 60_000 });
    try {
      // The Node SDK sends undocumented body fields as-is; `extra_body` is a
      // Python SDK option, so Ark-specific fields are passed at the top level.
      const response = await client.images.generate({
        model: creds.modelId,
        prompt: params.prompt,
        size: resolveSize(params) as any,
        n: params.n ?? 1,
        response_format: params.responseFormat ?? 'url',
        ...(params.watermark !== undefined ? { watermark: params.watermark } : {}),
        ...params.extra,
      } as any);
      return normalizeResult(response, creds);
    } catch (err: any) {
      throw this.wrapError(err);
    }
  }

  async imageToImage(params: ImageToImageParams, creds: ImageProviderCredentials): Promise<ImageGenerationResult> {
    const client = new OpenAI({ apiKey: creds.apiKey, baseURL: creds.baseUrl, timeout: 60_000 });
    try {
      // Same top-level pass-through as textToImage, plus the reference image.
      const response = await client.images.generate({
        model: creds.modelId,
        prompt: params.prompt,
        size: resolveSize(params) as any,
        n: params.n ?? 1,
        response_format: params.responseFormat ?? 'url',
        image: params.referenceImage,
        ...(params.strength !== undefined ? { strength: params.strength } : {}),
        ...(params.watermark !== undefined ? { watermark: params.watermark } : {}),
        ...params.extra,
      } as any);
      return normalizeResult(response, creds);
    } catch (err: any) {
      throw this.wrapError(err);
    }
  }

  // Translate SDK / HTTP failures into the unified ImageGenerationError.
  private wrapError(err: any): ImageGenerationError {
    const status = err?.status ?? err?.response?.status;
    const msg = err?.message ?? String(err);
    if (status === 429) return new ImageGenerationError(msg, 'rate_limit', true);
    if (status === 402) return new ImageGenerationError(msg, 'insufficient_balance', false);
    if (status === 400) {
      if (msg.includes('safety') || msg.includes('sensitive') || msg.includes('安全'))
        return new ImageGenerationError('提示词触发内容安全策略,请调整描述', 'content_safety', false);
      return new ImageGenerationError(msg, 'invalid_params', false);
    }
    if (err?.code === 'ETIMEDOUT' || err?.code === 'ECONNABORTED')
      return new ImageGenerationError('生成超时,可尝试降低图片尺寸', 'timeout', true);
    return new ImageGenerationError(msg, 'unknown', false);
  }
}

registerImageProvider(new ArkImageProvider());
```

- [ ] **Step 2: Verify compilation**

Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors

- [ ] **Step 3: Commit**

```bash
git add packages/backend/src/modules/netaclaw/image_providers/ark.ts
git commit -m "feat(netaclaw): add Ark (Volcano Engine) image provider via OpenAI SDK"
```

---

### Task 5: Provider layer (MiniMax)

**Files:**
- Create: `packages/backend/src/modules/netaclaw/image_providers/minimax.ts`

- [ ] **Step 1: Implement MiniMaxImageProvider**

```typescript
import {
  type ImageGenerationProvider,
  type ImageProviderCredentials,
  type TextToImageParams,
  type ImageToImageParams,
  type ImageGenerationResult,
  ImageGenerationError,
  registerImageProvider,
} from './types.js';

interface MiniMaxResponse {
  id?: string;
  data?: { image_urls?: string[]; image_base64?: string[] };
  metadata?: { success_count?: number; failed_count?: number };
  base_resp?: { status_code?: number; status_msg?: string };
}

// Build the request body shared by text-to-image and image-to-image.
function buildBaseBody(params: TextToImageParams, creds: ImageProviderCredentials): Record<string, unknown> {
  const body: Record<string, unknown> = {
    model: creds.modelId,
    prompt: params.prompt,
    n: params.n ?? 1,
    response_format: params.responseFormat ?? 'url',
  };
  if (params.watermark !== undefined) body.aigc_watermark = params.watermark;
  // aspectRatio takes precedence over explicit width/height.
  if (params.aspectRatio) {
    body.aspect_ratio = params.aspectRatio;
  } else if (params.width && params.height) {
    body.width = params.width;
    body.height = params.height;
  }
  if (params.seed !== undefined) body.seed = params.seed;
  if (params.extra?.style) body.style = params.extra.style;
  if (params.extra?.prompt_optimizer !== undefined) body.prompt_optimizer = params.extra.prompt_optimizer;
  return body;
}

function normalizeResponse(json: MiniMaxResponse, creds: ImageProviderCredentials, format: string): ImageGenerationResult {
  const resp = json.base_resp;
  if (resp && resp.status_code !== 0) {
    const code = resp.status_code;
    if (code === 1026) throw new ImageGenerationError('提示词触发内容安全策略,请调整描述', 'content_safety', false);
    if (code === 1002) throw new ImageGenerationError('当前请求过多,请稍后重试', 'rate_limit', true);
    if (code === 1008) throw new ImageGenerationError('模型渠道余额不足', 'insufficient_balance', false);
    if (code === 1004) throw new ImageGenerationError('API Key 鉴权失败', 'invalid_params', false);
    throw new ImageGenerationError(
      resp.status_msg ??
        `MiniMax error ${code}`,
      'unknown',
      false,
    );
  }
  const urls = json.data?.image_urls ?? [];
  const b64s = json.data?.image_base64 ?? [];
  const images = format === 'base64'
    ? b64s.map(b => ({ base64: b }))
    : urls.map(u => ({ url: u }));
  return { images, model: creds.modelId, provider: 'minimax' };
}

class MiniMaxImageProvider implements ImageGenerationProvider {
  readonly id = 'minimax';

  async textToImage(params: TextToImageParams, creds: ImageProviderCredentials): Promise<ImageGenerationResult> {
    const body = buildBaseBody(params, creds);
    return this.request(body, creds, params.responseFormat ?? 'url');
  }

  async imageToImage(params: ImageToImageParams, creds: ImageProviderCredentials): Promise<ImageGenerationResult> {
    const body = buildBaseBody(params, creds);
    body.subject_reference = [{ image: params.referenceImage }];
    if (params.strength !== undefined) body.strength = params.strength;
    return this.request(body, creds, params.responseFormat ?? 'url');
  }

  private async request(body: Record<string, unknown>, creds: ImageProviderCredentials, format: string): Promise<ImageGenerationResult> {
    // Abort the request after 60 seconds.
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), 60_000);
    try {
      const baseUrl = creds.baseUrl.replace(/\/+$/, '');
      const res = await fetch(`${baseUrl}/v1/image_generation`, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Authorization': `Bearer ${creds.apiKey}`,
        },
        body: JSON.stringify(body),
        signal: controller.signal,
      });
      if (!res.ok) {
        const text = await res.text().catch(() => '');
        throw new ImageGenerationError(`MiniMax HTTP ${res.status}: ${text}`, res.status === 429 ? 'rate_limit' : 'unknown', res.status === 429);
      }
      const json = (await res.json()) as MiniMaxResponse;
      return normalizeResponse(json, creds, format);
    } catch (err: any) {
      if (err instanceof ImageGenerationError) throw err;
      if (err?.name === 'AbortError') throw new ImageGenerationError('生成超时,可尝试降低图片尺寸', 'timeout', true);
      throw new ImageGenerationError(err?.message ?? String(err), 'network', true);
    } finally {
      clearTimeout(timer);
    }
  }
}

registerImageProvider(new MiniMaxImageProvider());
```

- [ ] **Step 2: Verify compilation**

Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors

- [ ] **Step 3: Commit**

```bash
git add packages/backend/src/modules/netaclaw/image_providers/minimax.ts
git commit -m "feat(netaclaw): add MiniMax image provider via REST API"
```

---

### Task 6: Image persistence service

**Files:**
- Create: `packages/backend/src/modules/netaclaw/service/image_storage.ts`

- [ ] **Step 1: Implement ImageStorageService**

Reuse the `downAndUpload` method of the existing `pluginService.getInstance('upload')`:

```typescript
import { Inject, Provide, Scope, ScopeEnum } from '@midwayjs/core';
import { PluginService } from '../../plugin/service/info.js';
import { randomUUID } from 'crypto';

@Provide()
@Scope(ScopeEnum.Singleton)
export class ImageStorageService {
  @Inject()
  pluginService: PluginService;

  // Download a temporary provider URL and re-upload it; returns the permanent URL.
  async persist(tempUrl: string): Promise<string> {
    const upload = await this.pluginService.getInstance('upload');
    const ext = this.detectExtension(tempUrl);
    const filename = `img-${Date.now()}-${randomUUID().slice(0, 8)}${ext}`;
    return upload.downAndUpload(tempUrl, filename);
  }

  async persistAll(urls: string[]): Promise<string[]> {
    return Promise.all(urls.map(url => this.persist(url)));
  }

  // Take the extension from the URL path (query string ignored); default to .png.
  private detectExtension(url: string): string {
    const pathname = url.split('?')[0];
    const match = pathname.match(/\.(png|jpg|jpeg|webp|gif)$/i);
    return match ?
      `.${match[1].toLowerCase()}` :
      '.png';
  }
}
```

- [ ] **Step 2: Verify compilation**

Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors

- [ ] **Step 3: Commit**

```bash
git add packages/backend/src/modules/netaclaw/service/image_storage.ts
git commit -m "feat(netaclaw): add ImageStorageService for persisting generated images"
```

---

### Task 7: Shared helpers for the image tools

**Files:**
- Create: `packages/backend/src/modules/netaclaw/tools/builtin/image_common.ts`

- [ ] **Step 1: Create image_common.ts**

Extract the helper functions shared by text_to_image and image_to_image:

```typescript
import { imageResult, imagesResult } from '../common.js';
import type { ImageGenerationResult } from '../../image_providers/types.js';
import type { ImageStorageService } from '../../service/image_storage.js';

// Cap a requested dimension at the configured maximum; undefined passes through.
export function clampDimension(value: number | undefined, max: number): number | undefined {
  if (value === undefined) return undefined;
  return Math.min(value, max);
}

// Replace each temporary provider URL with a permanent one; items without a URL are kept as-is.
export async function persistImages(result: ImageGenerationResult, storage: ImageStorageService): Promise<ImageGenerationResult> {
  const persisted = await Promise.all(
    result.images.map(async img => {
      if (!img.url) return img;
      const permanentUrl = await storage.persist(img.url);
      return { ...img, url: permanentUrl };
    })
  );
  return { ...result, images: persisted };
}

// A single image uses the existing `image` result; several images use `images`.
export function formatImageToolResult(result: ImageGenerationResult) {
  if (result.images.length === 1) {
    const img = result.images[0];
    return imageResult(img.url!, undefined, {
      width: img.width,
      height: img.height,
      text: `图片已生成 (${result.provider}/${result.model})`,
    });
  }
  return imagesResult(
    result.images.map(img => ({ url: img.url!, width: img.width, height: img.height })),
    `已生成 ${result.images.length} 张图片 (${result.provider}/${result.model})`,
  );
}
```

- [ ] **Step 2: Verify compilation**

Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors

- [ ] **Step 3: Commit**

```bash
git add packages/backend/src/modules/netaclaw/tools/builtin/image_common.ts
git commit -m "feat(netaclaw): extract shared image tool helpers"
```

---

### Task 8: text_to_image tool

**Files:**
- Create: `packages/backend/src/modules/netaclaw/tools/builtin/text_to_image.ts`
- Modify: `packages/backend/src/modules/netaclaw/tools/catalog.ts`

- [ ] **Step 1: Create text_to_image.ts**

```typescript
import { Type, type Static } from '@sinclair/typebox';
import { type AnyAgentTool } from '../common.js';
import { registerSchema } from '../catalog.js';
import type {
  ImageGenerationProvider,
  ImageProviderCredentials,
} from '../../image_providers/types.js';
import { ImageGenerationError } from '../../image_providers/types.js';
import type { ImageStorageService } from '../../service/image_storage.js';
import { clampDimension, persistImages, formatImageToolResult } from './image_common.js';

const Params = Type.Object({
  prompt: Type.String({ description: '图片描述,尽量详细具体' }),
  aspectRatio: Type.Optional(Type.String({
    description: '宽高比。可选: 1:1, 16:9, 4:3, 3:2, 2:3, 3:4, 9:16',
  })),
  width: Type.Optional(Type.Integer({ description: '精确宽度(像素),优先级低于 aspectRatio' })),
  height: Type.Optional(Type.Integer({ description: '精确高度(像素)' })),
  n: Type.Optional(Type.Integer({ description: '生成数量,默认 1,最大 9', minimum: 1, maximum: 9 })),
  watermark: Type.Optional(Type.Boolean({ description: '是否添加水印' })),
  seed: Type.Optional(Type.Integer({ description: '随机种子,相同 seed 可复现相近结果' })),
  extra: Type.Optional(Type.Record(Type.String(), Type.Unknown(), {
    description: 'Provider 特有参数,如 MiniMax 的 style、prompt_optimizer',
  })),
});

export function createTextToImageTool(
  creds: ImageProviderCredentials,
  provider: ImageGenerationProvider,
  storage: ImageStorageService,
): AnyAgentTool {
  const defaults = creds.extra?.imageDefaults ?? {};
  const constraints = creds.extra?.imageConstraints ?? {};

  return {
    name: 'text_to_image',
    label: '文生图',
    description: creds.promptHint ?
      `根据文字描述生成图片。\n${creds.promptHint}` :
      '根据文字描述生成图片,支持指定尺寸、数量、风格等参数。',
    parameters: Params,
    async execute(_id: string, params: Static<typeof Params>) {
      // Precedence: call parameters, then configured defaults, then built-in
      // fallbacks; n, width, and height are capped by the configured constraints.
      const merged = {
        prompt: params.prompt,
        n: Math.min(params.n ?? defaults.n ?? 1, constraints.maxN ?? 9),
        aspectRatio: params.aspectRatio ?? defaults.aspectRatio,
        width: clampDimension(params.width ?? defaults.width, constraints.maxWidth ?? 2048),
        height: clampDimension(params.height ?? defaults.height, constraints.maxHeight ?? 2048),
        watermark: params.watermark ?? defaults.watermark ?? false,
        seed: params.seed,
        responseFormat: (defaults.responseFormat ?? 'url') as 'url' | 'base64',
        extra: params.extra,
      };

      try {
        let result = await provider.textToImage(merged, creds);
        result = await persistImages(result, storage);
        return formatImageToolResult(result);
      } catch (err) {
        if (err instanceof ImageGenerationError) {
          const prefix = err.retryable ? '[可重试] ' : '';
          return { type: 'text' as const, text: `${prefix}图片生成失败: ${err.message}` };
        }
        return {
          type: 'text' as const,
          text: `图片生成失败: ${
            err instanceof Error ?
              err.message : String(err)
          }`,
        };
      }
    },
  };
}

registerSchema({
  name: 'text_to_image',
  toolset: 'vision',
  description: '根据文字描述生成图片,支持指定尺寸、数量、风格等参数。',
  capability: 'multimodal',
  visibility: 'tool',
  isCore: false,
  canDisable: true,
  supportsPromptHint: true,
  requiresModel: true,
});
```

- [ ] **Step 2: Register in catalog.ts**

In the import list at the end of `catalog.ts` (around line 66, after `import './builtin/execute_skill.js';`), add:

```typescript
import './builtin/text_to_image.js';
```

- [ ] **Step 3: Verify compilation**

Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors

- [ ] **Step 4: Commit**

```bash
git add packages/backend/src/modules/netaclaw/tools/builtin/text_to_image.ts packages/backend/src/modules/netaclaw/tools/catalog.ts
git commit -m "feat(netaclaw): add text_to_image tool with defaults/constraints merge and image persistence"
```

---

### Task 9: image_to_image tool

**Files:**
- Create: `packages/backend/src/modules/netaclaw/tools/builtin/image_to_image.ts`
- Modify: `packages/backend/src/modules/netaclaw/tools/catalog.ts`

- [ ] **Step 1: Create image_to_image.ts**

```typescript
import { Type, type Static } from '@sinclair/typebox';
import { type AnyAgentTool } from '../common.js';
import { registerSchema } from '../catalog.js';
import type {
  ImageGenerationProvider,
  ImageProviderCredentials,
} from '../../image_providers/types.js';
import { ImageGenerationError } from '../../image_providers/types.js';
import type { ImageStorageService } from '../../service/image_storage.js';
import { clampDimension, persistImages, formatImageToolResult } from './image_common.js';

const Params = Type.Object({
  prompt: Type.String({ description: '对参考图的修改描述' }),
  referenceImage: Type.String({ description: '参考图片 URL(从用户上传附件获取)' }),
  strength: Type.Optional(Type.Number({
    description: '参考图影响强度 0-1,越大越接近原图',
    minimum: 0,
    maximum: 1,
  })),
  aspectRatio: Type.Optional(Type.String({
    description: '宽高比。可选: 1:1, 16:9, 4:3, 3:2, 2:3, 3:4, 9:16',
  })),
  width: Type.Optional(Type.Integer({ description: '精确宽度(像素),优先级低于 aspectRatio' })),
  height: Type.Optional(Type.Integer({ description: '精确高度(像素)' })),
  n: Type.Optional(Type.Integer({ description: '生成数量,默认 1,最大 9', minimum: 1, maximum: 9 })),
  watermark: Type.Optional(Type.Boolean({ description: '是否添加水印' })),
  seed: Type.Optional(Type.Integer({ description: '随机种子' })),
  extra: Type.Optional(Type.Record(Type.String(), Type.Unknown(), {
    description: 'Provider 特有参数',
  })),
});

export function createImageToImageTool(
  creds: ImageProviderCredentials,
  provider: ImageGenerationProvider,
  storage: ImageStorageService,
): AnyAgentTool {
  const defaults = creds.extra?.imageDefaults ?? {};
  const constraints = creds.extra?.imageConstraints ?? {};

  return {
    name: 'image_to_image',
    label: '图生图',
    description: creds.promptHint
      ? `基于参考图片生成新图片。\n${creds.promptHint}`
      : '基于参考图片生成新图片,支持风格迁移、内容编辑等。传入参考图URL和修改描述。',
    parameters: Params,
    async execute(_id: string, params: Static<typeof Params>) {
      // Same merge rules as text_to_image, plus the reference image and strength.
      const merged = {
        prompt: params.prompt,
        referenceImage: params.referenceImage,
        strength: params.strength,
        n: Math.min(params.n ?? defaults.n ?? 1, constraints.maxN ?? 9),
        aspectRatio: params.aspectRatio ?? defaults.aspectRatio,
        width: clampDimension(params.width ?? defaults.width, constraints.maxWidth ?? 2048),
        height: clampDimension(params.height ?? defaults.height, constraints.maxHeight ?? 2048),
        watermark: params.watermark ?? defaults.watermark ?? false,
        seed: params.seed,
        responseFormat: (defaults.responseFormat ?? 'url') as 'url' | 'base64',
        extra: params.extra,
      };

      try {
        let result = await provider.imageToImage(merged, creds);
        result = await persistImages(result, storage);
        return formatImageToolResult(result);
      } catch (err) {
        if (err instanceof ImageGenerationError) {
          const prefix = err.retryable ? '[可重试] ' : '';
          return { type: 'text' as const, text: `${prefix}图片生成失败: ${err.message}` };
        }
        return {
          type: 'text' as const,
          text: `图片生成失败: ${
            err instanceof Error ?
              err.message : String(err)
          }`,
        };
      }
    },
  };
}

registerSchema({
  name: 'image_to_image',
  toolset: 'vision',
  description: '基于参考图片生成新图片,支持风格迁移、内容编辑等。传入参考图URL和修改描述。',
  capability: 'multimodal',
  visibility: 'tool',
  isCore: false,
  canDisable: true,
  supportsPromptHint: true,
  requiresModel: true,
});
```

- [ ] **Step 2: Register in catalog.ts**

At the end of `catalog.ts`, after `import './builtin/text_to_image.js';`, add:

```typescript
import './builtin/image_to_image.js';
```

- [ ] **Step 3: Verify compilation**

Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors

- [ ] **Step 4: Commit**

```bash
git add packages/backend/src/modules/netaclaw/tools/builtin/image_to_image.ts packages/backend/src/modules/netaclaw/tools/catalog.ts
git commit -m "feat(netaclaw): add image_to_image tool with reference image support"
```

---

### Task 10: tool_resolver integration

**Files:**
- Modify: `packages/backend/src/modules/netaclaw/service/tool_resolver.ts`
- Modify: `packages/backend/src/modules/netaclaw/service/tool_registry.ts`

- [ ] **Step 1: Add the getToolByName method to tool_registry.ts**

In the `NetaClawToolRegistryService` class (near the `getToolModelConfig` method), add:

```typescript
// Return type is inferred from the repository call.
async getToolByName(name: string) {
  return this.toolRepo.findOneBy({ name });
}
```

- [ ] **Step 2: Add imports to tool_resolver.ts**

In the import section at the top of `tool_resolver.ts` (around line 32, near `import { createImageRecognizeTool }`), add:

```typescript
import { createTextToImageTool } from '../tools/builtin/text_to_image.js';
import { createImageToImageTool } from '../tools/builtin/image_to_image.js';
import { getImageProvider, type ImageProviderCredentials } from '../image_providers/types.js';
// Side-effect imports: each module registers its provider on load.
import '../image_providers/ark.js';
import '../image_providers/minimax.js';
import { ImageStorageService } from './image_storage.js';
```

- [ ] **Step 3: Inject ImageStorageService**

In the `NetaClawToolResolverService` class, alongside the existing `@Inject()` members, add:

```typescript
@Inject()
imageStorageService: ImageStorageService;
```

- [ ] **Step 4: Inject the two tools in the resolve method**

In the `resolve` method of `tool_resolver.ts`, find the injection block for `image_recognize` (around lines 647-664) and add the following after its closing `}`:

```typescript
for (const imgToolName of ['text_to_image', 'image_to_image'] as const) {
  if (!filteredNames.includes(imgToolName)) continue;

  const toolModelConfig = await this.toolRegistry.getToolModelConfig(imgToolName);
  if (!toolModelConfig) {
    disabledReasons.push({ name: imgToolName, reason: 'model_not_configured' });
    continue;
  }

  const channelCreds = await this.modelChannelService.resolveForAgent(toolModelConfig.modelChannelId, toolModelConfig.modelId);
  if (!channelCreds) {
    disabledReasons.push({ name: imgToolName, reason: 'model_channel_unavailable' });
    continue;
  }

  const provider = getImageProvider(channelCreds.channelSupplier, channelCreds.baseUrl ?? '');
  if (!provider) {
    disabledReasons.push({ name: imgToolName, reason: 'image_provider_not_found' });
    continue;
  }

  // Per-tool defaults and constraints live in the tool entity's `extra` field.
  const toolEntity = await this.toolRegistry.getToolByName(imgToolName);
  const extra = toolEntity?.extra as import('../tools/manifest.js').ToolGovernanceExtra | null;
  const creds: ImageProviderCredentials = {
    baseUrl: channelCreds.baseUrl ?? '',
    apiKey: channelCreds.apiKey,
    supplier: channelCreds.channelSupplier,
    modelId: toolModelConfig.modelId,
    promptHint: toolModelConfig.promptHint,
    extra,
  };
  if (imgToolName === 'text_to_image') {
    runtimeTools.push(createTextToImageTool(creds, provider, this.imageStorageService));
  } else {
    runtimeTools.push(createImageToImageTool(creds, provider, this.imageStorageService));
  }
}
```

- [ ] **Step 5: Verify compilation**

Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors

- [ ] **Step 6: Commit**

```bash
git add packages/backend/src/modules/netaclaw/service/tool_resolver.ts packages/backend/src/modules/netaclaw/service/tool_registry.ts
git commit -m "feat(netaclaw): integrate text_to_image and image_to_image into tool_resolver"
```

---

### Task 11: Prompt Builder attachment-hint extension

**Files:**
- Modify: `packages/backend/src/modules/netaclaw/runtime/prompt_builder.ts`
- Modify: `packages/backend/src/modules/netaclaw/runtime/agent.ts`

- [ ] **Step 1: Change the buildLLMMessages signature**

At line 141 of `prompt_builder.ts`, add a `toolNames` parameter to `buildLLMMessages`:

```typescript
export function buildLLMMessages(
  systemPrompt: string,
  history: LLMMessage[],
  userMessage: UserMessageInput,
  toolNames?: string[],
): LLMMessage[] {
```

- [ ] **Step 2: Change how the attachment hint is generated**

Replace the attachment-handling block at lines 152-162:

```typescript
if (userMessage.metadata?.attachments && (userMessage.metadata.attachments as unknown[]).length) {
  const attachments = userMessage.metadata.attachments as ChatAttachment[];
  const desc = attachments.map(a => {
    const typeLabel = ({ image: '图片', video: '视频', pdf: 'PDF', document: '文件', other: '文件' } as Record<string, string>)[a.type];
    return `- ${typeLabel}: ${a.name} (URL: ${a.url})`;
  }).join('\n');

  // Only mention tools that are actually available in this run.
  const hints: string[] = [];
  const names = new Set(toolNames ?? []);
  if (names.has('image_recognize')) {
    hints.push('如需分析图片内容,请使用 image_recognize 工具,传入图片 URL');
  }
  if (names.has('image_to_image')) {
    hints.push('如需基于图片生成新图片,请使用 image_to_image 工具,将图片 URL 作为 referenceImage 参数');
  }
  if (hints.length === 0) {
    hints.push('附件已上传,可在需要时引用其 URL');
  }

  messages.push({
    role: 'user',
    content: `[系统提示] 用户上传了以下附件:\n${desc}\n${hints.join('。')}。`,
  });
}
```

- [ ] **Step 3: Update the call site in agent.ts**

At lines 96-100 of `agent.ts`, pass `toolNames` to `buildLLMMessages`:

```typescript
const messages: LLMMessage[] = buildLLMMessages(
  agentConfig.systemPrompt,
  history,
  { content: userMessage, metadata: params.userMessageMetadata },
  params.toolNames || tools.map(tool => tool.name),
);
```

- [ ] **Step 4: Verify compilation**

Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors

- [ ] **Step 5: Commit**

```bash
git add packages/backend/src/modules/netaclaw/runtime/prompt_builder.ts packages/backend/src/modules/netaclaw/runtime/agent.ts
git commit -m "feat(netaclaw): dynamic attachment hints based on available tools"
```

---

### Task 12: Frontend (renderer-registry and message-item image rendering)

**Files:**
- Modify: `packages/frontend/src/modules/agent/tools/renderer-registry.ts`
- Modify: `packages/frontend/src/modules/agent/components/message-item.vue`

- [ ] **Step 1: Extend the rawResult type in renderer-registry**

In the `rawResult` field of the `ToolRenderSource` interface at line 9 of `renderer-registry.ts`, extend the `type` union and add an `images` field:

```typescript
rawResult?: {
  type: 'text' | 'json' | 'image' | 'images';
  text?: string;
  data?: unknown;
  url?: string;
  mimeType?: string;
  width?: number;
  height?: number;
  bytes?: number;
  originalWidth?: number;
  originalHeight?: number;
  originalBytes?: number;
  resized?: boolean;
  images?: {
    url: string;
    mimeType?: string;
    width?: number;
    height?: number;
    seed?: number;
  }[];
};
```

- [ ] **Step 2: Update the message-item.vue template: upgrade single-image rendering and add multi-image rendering**

Find the existing single-image rendering block at lines 75-83 of `message-item.vue`:

```vue
{{ formatToolImageCaption(tool.rawResult) }}
```

Replace it with:

```vue
{{ formatToolImageCaption(tool.rawResult) }}
{{ tool.rawResult.text }}
```

- [ ] **Step 3: Add the multi-image grid styles**

In `message-item.vue`, inside the `