GPU_GUARD_MONOREPO/docs/superpowers/plans/2026-05-02-image-generation-tools.md
2026-05-20 21:39:12 +08:00

# Text-to-Image and Image-to-Image Tools: Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
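
**Goal:** Add two tools, `text_to_image` and `image_to_image`, to Neta Agent, build an extensible image-generation Provider layer, and let the frontend render image results.

**Architecture:** The backend adds an `image_providers/` directory that implements the Provider strategy pattern (Volcano Engine Ark and MiniMax). Both tools receive credentials and a provider instance through `tool_resolver`. Image URLs returned by the tools are persisted locally by `ImageStorageService` before being written to the session tree. On the frontend, `message-item.vue` upgrades single-image rendering and adds a multi-image grid branch.

**Tech Stack:** TypeScript, OpenAI SDK (Volcano Engine Ark), fetch (MiniMax REST), TypeBox (parameter schemas), Vue 3 + Element Plus (frontend)

**Spec:** `docs/superpowers/specs/2026-05-02-image-generation-tools-design.md`
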
---
## File Map
| File | Responsibility | Action |
|------|------|------|
| `packages/backend/src/modules/netaclaw/image_providers/types.ts` | Unified Provider interface, credentials, error types, factory function | Create |
| `packages/backend/src/modules/netaclaw/image_providers/ark.ts` | Volcano Engine (Ark) Provider | Create |
| `packages/backend/src/modules/netaclaw/image_providers/minimax.ts` | MiniMax Provider | Create |
| `packages/backend/src/modules/netaclaw/tools/common.ts` | Add `ImageItem` and the `images` type, `imagesResult()`, and extend `toolResultToText` | Modify |
| `packages/backend/src/modules/netaclaw/tools/manifest.ts` | Add `imageDefaults` / `imageConstraints` to `ToolGovernanceExtra` | Modify |
| `packages/backend/src/modules/netaclaw/tools/builtin/image_common.ts` | Shared helpers for the image tools (clampDimension, persistImages, formatImageToolResult) | Create |
| `packages/backend/src/modules/netaclaw/tools/builtin/text_to_image.ts` | Text-to-image tool | Create |
| `packages/backend/src/modules/netaclaw/tools/builtin/image_to_image.ts` | Image-to-image tool | Create |
| `packages/backend/src/modules/netaclaw/tools/catalog.ts` | Import the two new tools so they register themselves | Modify |
| `packages/backend/src/modules/netaclaw/service/image_storage.ts` | Service that persists generated images locally | Create |
| `packages/backend/src/modules/netaclaw/service/tool_registry.ts` | Add a `getToolByName` method | Modify |
| `packages/backend/src/modules/netaclaw/service/tool_resolver.ts` | Inject credentials and provider instances | Modify |
| `packages/backend/src/modules/netaclaw/runtime/prompt_builder.ts` | Extend the attachment hint text | Modify |
| `packages/backend/src/modules/netaclaw/runtime/agent.ts` | Pass `toolNames` when calling `buildLLMMessages` | Modify |
| `packages/frontend/src/modules/agent/tools/renderer-registry.ts` | Add the `images` type to `rawResult` | Modify |
| `packages/frontend/src/modules/agent/components/message-item.vue` | Upgrade single images to `el-image` and add a multi-image grid | Modify |
| `packages/frontend/src/modules/agent/views/tools.vue` | Image-generation configuration section | Modify |
---
### Task 1: Backend Type Foundation (Multi-Image ToolResultContent)
**Files:**
- Modify: `packages/backend/src/modules/netaclaw/tools/common.ts`
- [ ] **Step 1: Add an `ImageItem` interface before the `ToolResultContent` type**
In `common.ts`, add the following just before the `ToolResultContent` type definition (around line 26):
```typescript
export interface ImageItem {
  url: string;
  mimeType?: string;
  width?: number;
  height?: number;
  seed?: number;
}
```
- [ ] **Step 2: Extend the `ToolResultContent` union**
After the existing `type: 'image'` branch, add an `images` branch:
```typescript
  | { type: 'images'; images: ImageItem[]; text?: string };
```
The complete type becomes:
```typescript
export type ToolResultContent =
  | { type: 'text'; text: string }
  | { type: 'json'; data: unknown }
  | {
      type: 'image';
      url: string;
      mimeType?: string;
      text?: string;
      width?: number;
      height?: number;
      bytes?: number;
      originalWidth?: number;
      originalHeight?: number;
      originalBytes?: number;
      resized?: boolean;
    }
  | { type: 'images'; images: ImageItem[]; text?: string };
```
- [ ] **Step 3: Add an `imagesResult` helper**
After the `imageResult()` function, add:
```typescript
export function imagesResult(
  images: ImageItem[],
  text?: string,
): ToolResultContent {
  return { type: 'images', images, text };
}
```
- [ ] **Step 4: Extend `toolResultToText`**
In `toolResultToText`, after the `if (value.type === 'image')` branch, add:
```typescript
if (value.type === 'images') {
  // Narrow once with Extract instead of repeated ad-hoc casts (works even if `value` is not typed as the union here).
  const v = value as Extract<ToolResultContent, { type: 'images' }>;
  const lines = v.images.map(
    (img, i) => `[图${i + 1}] ${img.url}${img.width && img.height ? ` (${img.width}x${img.height})` : ''}`,
  );
  const header = v.text || `已生成 ${lines.length} 张图片`;
  return `${header}\n${lines.join('\n')}`;
}
```
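The `images` branch is the text the LLM sees for a multi-image result. A standalone sketch of the same formatting, with local copies of the types (not imported from `common.ts`) and placeholder URLs on `cdn.example.com`, makes the output shape concrete:

```typescript
// Local copies of the plan's types so this sketch compiles on its own.
interface ImageItem {
  url: string;
  width?: number;
  height?: number;
}

type ImagesContent = { type: 'images'; images: ImageItem[]; text?: string };

// Mirrors the `images` branch of toolResultToText: one line per image, size appended when known.
function imagesToText(value: ImagesContent): string {
  const lines = value.images.map(
    (img, i) => `[图${i + 1}] ${img.url}${img.width && img.height ? ` (${img.width}x${img.height})` : ''}`,
  );
  const header = value.text || `已生成 ${lines.length} 张图片`;
  return `${header}\n${lines.join('\n')}`;
}

const sample: ImagesContent = {
  type: 'images',
  images: [
    { url: 'https://cdn.example.com/a.png', width: 1024, height: 1024 },
    { url: 'https://cdn.example.com/b.png' },
  ],
};
console.log(imagesToText(sample));
```

Because the union is discriminated on `type`, a checked `value.type === 'images'` lets TypeScript narrow without casts whenever `value` is already typed as `ToolResultContent`.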
- [ ] **Step 5: Verify compilation**
Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no new errors
- [ ] **Step 6: Commit**
```bash
git add packages/backend/src/modules/netaclaw/tools/common.ts
git commit -m "feat(netaclaw): extend ToolResultContent with images type for multi-image tool results"
```
---
### Task 2: ToolGovernanceExtra Extension (imageDefaults / imageConstraints)
**Files:**
- Modify: `packages/backend/src/modules/netaclaw/tools/manifest.ts`
- [ ] **Step 1: Extend the `ToolGovernanceExtra` type**
In the `ToolGovernanceExtra` type at lines 9-12 of `manifest.ts`, add the image-tool fields:
```typescript
export type ToolGovernanceExtra = {
  allowInSubagent?: boolean;
  workerRoutingStrategy?: ToolWorkerRoutingStrategy;
  imageDefaults?: {
    n?: number;
    aspectRatio?: string;
    width?: number;
    height?: number;
    watermark?: boolean;
    responseFormat?: 'url' | 'base64';
  };
  imageConstraints?: {
    maxN?: number;
    maxWidth?: number;
    maxHeight?: number;
  };
};
```
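Task 10 casts the tool entity's `extra` column straight to `ToolGovernanceExtra`, so a malformed value saved by an admin (for example `"maxWidth": "large"`) would flow into `Math.min` and produce `NaN`. A minimal defensive reader, a hypothetical helper that is not part of the plan, could keep only well-typed constraint fields:

```typescript
// Local copy of the imageConstraints shape added to ToolGovernanceExtra in Step 1.
type ImageConstraints = { maxN?: number; maxWidth?: number; maxHeight?: number };

// Hypothetical helper: keeps only positive, finite numeric fields from an untyped `extra` value.
function readImageConstraints(extra: unknown): ImageConstraints {
  if (typeof extra !== 'object' || extra === null) return {};
  const raw = (extra as { imageConstraints?: unknown }).imageConstraints;
  if (typeof raw !== 'object' || raw === null) return {};
  const out: ImageConstraints = {};
  for (const key of ['maxN', 'maxWidth', 'maxHeight'] as const) {
    const v = (raw as Record<string, unknown>)[key];
    if (typeof v === 'number' && Number.isFinite(v) && v > 0) out[key] = v;
  }
  return out;
}

// The non-numeric maxWidth is dropped, so the tool falls back to its hard-coded 2048.
console.log(readImageConstraints({ imageConstraints: { maxN: 4, maxWidth: 'large' } }));
```

The same pattern would apply to `imageDefaults`; dropping bad fields (rather than throwing) keeps the tool usable with its built-in fallbacks.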
- [ ] **Step 2: Verify compilation**
Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no new errors (existing code only reads `allowInSubagent` and `workerRoutingStrategy`, so the new fields do not affect it)
- [ ] **Step 3: Commit**
```bash
git add packages/backend/src/modules/netaclaw/tools/manifest.ts
git commit -m "feat(netaclaw): add imageDefaults and imageConstraints to ToolGovernanceExtra"
```
---
### Task 3: Provider Layer (Unified Interface and Factory)
**Files:**
- Create: `packages/backend/src/modules/netaclaw/image_providers/types.ts`
- [ ] **Step 1: Create the `image_providers` directory and `types.ts`**
```typescript
import type { ToolGovernanceExtra } from '../tools/manifest.js';

export interface ImageProviderCredentials {
  baseUrl: string;
  apiKey: string;
  supplier: string;
  modelId: string;
  promptHint: string | null;
  extra?: ToolGovernanceExtra | null;
}

export interface TextToImageParams {
  prompt: string;
  width?: number;
  height?: number;
  aspectRatio?: string;
  n?: number;
  responseFormat?: 'url' | 'base64';
  watermark?: boolean;
  seed?: number;
  extra?: Record<string, unknown>;
}

export interface ImageToImageParams extends TextToImageParams {
  referenceImage: string;
  strength?: number;
}

export interface ImageGenerationResult {
  images: { url?: string; base64?: string; width?: number; height?: number }[];
  model: string;
  provider: string;
}

export interface ImageGenerationProvider {
  readonly id: string;
  textToImage(params: TextToImageParams, creds: ImageProviderCredentials): Promise<ImageGenerationResult>;
  imageToImage(params: ImageToImageParams, creds: ImageProviderCredentials): Promise<ImageGenerationResult>;
}

export type ImageGenerationErrorCode =
  | 'content_safety'
  | 'rate_limit'
  | 'insufficient_balance'
  | 'invalid_params'
  | 'timeout'
  | 'network'
  | 'unknown';

export class ImageGenerationError extends Error {
  constructor(
    message: string,
    public readonly code: ImageGenerationErrorCode,
    public readonly retryable: boolean,
  ) {
    super(message);
    this.name = 'ImageGenerationError';
  }
}

const providers = new Map<string, ImageGenerationProvider>();

export function registerImageProvider(provider: ImageGenerationProvider): void {
  providers.set(provider.id, provider);
}

export function getImageProvider(supplier: string, baseUrl: string): ImageGenerationProvider | null {
  const s = supplier.toLowerCase();
  if (s === 'minimax') return providers.get('minimax') ?? null;
  if (s === 'ark' || s === 'volcengine') return providers.get('ark') ?? null;
  if (s === 'openai') {
    if (baseUrl.includes('volces.com') || baseUrl.includes('volcengine')) return providers.get('ark') ?? null;
    if (baseUrl.includes('minimax')) return providers.get('minimax') ?? null;
  }
  return null;
}
```
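Channels configured with the generic `openai` supplier are routed by base URL. The sketch below copies the routing rules of `getImageProvider` with stub providers (the stub type and `providerRegistry` name are local to this example) so the rules can be checked against example base URLs:

```typescript
interface ProviderStub {
  readonly id: string;
}

const providerRegistry = new Map<string, ProviderStub>();

function registerStub(p: ProviderStub): void {
  providerRegistry.set(p.id, p);
}

// Same routing rules as getImageProvider in types.ts.
function pickProvider(supplier: string, baseUrl: string): ProviderStub | null {
  const s = supplier.toLowerCase();
  if (s === 'minimax') return providerRegistry.get('minimax') ?? null;
  if (s === 'ark' || s === 'volcengine') return providerRegistry.get('ark') ?? null;
  if (s === 'openai') {
    if (baseUrl.includes('volces.com') || baseUrl.includes('volcengine')) return providerRegistry.get('ark') ?? null;
    if (baseUrl.includes('minimax')) return providerRegistry.get('minimax') ?? null;
  }
  return null;
}

registerStub({ id: 'ark' });
registerStub({ id: 'minimax' });
console.log(pickProvider('openai', 'https://ark.cn-beijing.volces.com/api/v3')?.id);
```

Because the real providers register themselves as a side effect of importing `ark.js` and `minimax.js`, a lookup made before those modules load returns `null`; this is why Task 10 imports both modules explicitly.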
- [ ] **Step 2: Verify compilation**
Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors
- [ ] **Step 3: Commit**
```bash
git add packages/backend/src/modules/netaclaw/image_providers/
git commit -m "feat(netaclaw): add image provider types, error class, and factory"
```
---
### Task 4: Provider Layer (Volcano Engine Ark)
**Files:**
- Create: `packages/backend/src/modules/netaclaw/image_providers/ark.ts`
- [ ] **Step 1: Implement `ArkImageProvider`**
```typescript
import OpenAI from 'openai';
import {
  type ImageGenerationProvider,
  type ImageProviderCredentials,
  type TextToImageParams,
  type ImageToImageParams,
  type ImageGenerationResult,
  ImageGenerationError,
  registerImageProvider,
} from './types.js';

function resolveSize(params: TextToImageParams): string | undefined {
  if (params.aspectRatio) {
    const map: Record<string, string> = {
      '1:1': '1024x1024',
      '16:9': '1280x720',
      '4:3': '1152x864',
      '3:2': '1248x832',
      '2:3': '832x1248',
      '3:4': '864x1152',
      '9:16': '720x1280',
    };
    return map[params.aspectRatio] ?? '1024x1024';
  }
  if (params.width && params.height) {
    return `${params.width}x${params.height}`;
  }
  return undefined;
}

function normalizeResult(response: OpenAI.Images.ImagesResponse, creds: ImageProviderCredentials): ImageGenerationResult {
  return {
    images: (response.data ?? []).map(item => ({
      url: item.url,
      base64: item.b64_json,
    })),
    model: creds.modelId,
    provider: 'ark',
  };
}

class ArkImageProvider implements ImageGenerationProvider {
  readonly id = 'ark';

  async textToImage(params: TextToImageParams, creds: ImageProviderCredentials): Promise<ImageGenerationResult> {
    return this.generate(params, creds, {});
  }

  async imageToImage(params: ImageToImageParams, creds: ImageProviderCredentials): Promise<ImageGenerationResult> {
    return this.generate(params, creds, {
      image: params.referenceImage,
      ...(params.strength !== undefined ? { strength: params.strength } : {}),
    });
  }

  private async generate(
    params: TextToImageParams,
    creds: ImageProviderCredentials,
    referenceFields: Record<string, unknown>,
  ): Promise<ImageGenerationResult> {
    const client = new OpenAI({ apiKey: creds.apiKey, baseURL: creds.baseUrl, timeout: 60_000 });
    const body = {
      model: creds.modelId,
      prompt: params.prompt,
      size: resolveSize(params),
      n: params.n ?? 1,
      // OpenAI-style image APIs call the base64 format `b64_json`, not `base64`.
      response_format: params.responseFormat === 'base64' ? 'b64_json' : 'url',
      // The Node SDK has no `extra_body` option (that is an OpenAI Python SDK feature); extra keys
      // placed at the top level of the body are sent as-is, which is where Ark reads watermark/seed/image.
      ...(params.watermark !== undefined ? { watermark: params.watermark } : {}),
      ...(params.seed !== undefined ? { seed: params.seed } : {}),
      ...referenceFields,
      ...params.extra,
    };
    try {
      // Cast: the SDK's parameter type does not declare the Ark-specific fields above.
      const response = await client.images.generate(body as any);
      return normalizeResult(response, creds);
    } catch (err: unknown) {
      throw this.wrapError(err);
    }
  }

  private wrapError(err: unknown): ImageGenerationError {
    // Check the timeout class first: APIConnectionTimeoutError is a subclass of APIConnectionError.
    if (err instanceof OpenAI.APIConnectionTimeoutError) {
      return new ImageGenerationError('生成超时,可尝试降低图片尺寸', 'timeout', true);
    }
    if (err instanceof OpenAI.APIConnectionError) {
      return new ImageGenerationError(err.message, 'network', true);
    }
    const status = (err as { status?: number } | null)?.status;
    const msg = err instanceof Error ? err.message : String(err);
    if (status === 429) return new ImageGenerationError(msg, 'rate_limit', true);
    if (status === 402) return new ImageGenerationError(msg, 'insufficient_balance', false);
    if (status === 400) {
      if (msg.includes('safety') || msg.includes('sensitive') || msg.includes('安全')) {
        return new ImageGenerationError('提示词触发内容安全策略,请调整描述', 'content_safety', false);
      }
      return new ImageGenerationError(msg, 'invalid_params', false);
    }
    return new ImageGenerationError(msg, 'unknown', false);
  }
}

registerImageProvider(new ArkImageProvider());
```
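To make the outgoing Ark request concrete, here is a pure function that assembles a JSON body from the plan's parameters without the SDK. It assumes, as the provider code does, that Ark reads `watermark`, `seed`, and other vendor fields from the top level of the body; the `extra_body` wrapper is an OpenAI Python SDK convention, and the Node SDK sends body keys as given. The model name `my-ark-model` and `custom_flag` are placeholders.

```typescript
// Local subset of TextToImageParams from types.ts.
interface ArkBodyParams {
  prompt: string;
  aspectRatio?: string;
  width?: number;
  height?: number;
  n?: number;
  responseFormat?: 'url' | 'base64';
  watermark?: boolean;
  seed?: number;
  extra?: Record<string, unknown>;
}

// Subset of the aspect-ratio table used by resolveSize.
const ARK_SIZES: Record<string, string> = { '1:1': '1024x1024', '16:9': '1280x720', '9:16': '720x1280' };

function buildArkBody(model: string, p: ArkBodyParams): Record<string, unknown> {
  const size = p.aspectRatio
    ? (ARK_SIZES[p.aspectRatio] ?? '1024x1024')
    : p.width && p.height
      ? `${p.width}x${p.height}`
      : undefined;
  return {
    model,
    prompt: p.prompt,
    ...(size !== undefined ? { size } : {}),
    n: p.n ?? 1,
    // OpenAI-style image APIs name the base64 format `b64_json`.
    response_format: p.responseFormat === 'base64' ? 'b64_json' : 'url',
    // Vendor fields live at the top level of the JSON body.
    ...(p.watermark !== undefined ? { watermark: p.watermark } : {}),
    ...(p.seed !== undefined ? { seed: p.seed } : {}),
    ...p.extra,
  };
}

const arkBody = buildArkBody('my-ark-model', {
  prompt: 'a cat on a windowsill',
  aspectRatio: '16:9',
  responseFormat: 'base64',
  watermark: false,
  extra: { custom_flag: true },
});
console.log(JSON.stringify(arkBody));
```

Keeping body assembly pure like this also makes the `response_format` mapping easy to unit-test, since it is the field most likely to differ between OpenAI-compatible vendors.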
- [ ] **Step 2: Verify compilation**
Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors
- [ ] **Step 3: Commit**
```bash
git add packages/backend/src/modules/netaclaw/image_providers/ark.ts
git commit -m "feat(netaclaw): add Ark (Volcano Engine) image provider via OpenAI SDK"
```
---
### Task 5: Provider Layer (MiniMax)
**Files:**
- Create: `packages/backend/src/modules/netaclaw/image_providers/minimax.ts`
- [ ] **Step 1: Implement `MiniMaxImageProvider`**
```typescript
import {
  type ImageGenerationProvider,
  type ImageProviderCredentials,
  type TextToImageParams,
  type ImageToImageParams,
  type ImageGenerationResult,
  ImageGenerationError,
  registerImageProvider,
} from './types.js';

interface MiniMaxResponse {
  id?: string;
  data?: { image_urls?: string[]; image_base64?: string[] };
  metadata?: { success_count?: number; failed_count?: number };
  base_resp?: { status_code?: number; status_msg?: string };
}

function buildBaseBody(params: TextToImageParams, creds: ImageProviderCredentials): Record<string, unknown> {
  const body: Record<string, unknown> = {
    model: creds.modelId,
    prompt: params.prompt,
    n: params.n ?? 1,
    response_format: params.responseFormat ?? 'url',
  };
  if (params.watermark !== undefined) body.aigc_watermark = params.watermark;
  // Send either aspect_ratio or width/height; aspect_ratio takes priority here.
  if (params.aspectRatio) {
    body.aspect_ratio = params.aspectRatio;
  } else if (params.width && params.height) {
    body.width = params.width;
    body.height = params.height;
  }
  if (params.seed !== undefined) body.seed = params.seed;
  if (params.extra?.style) body.style = params.extra.style;
  if (params.extra?.prompt_optimizer !== undefined) body.prompt_optimizer = params.extra.prompt_optimizer;
  return body;
}

function normalizeResponse(json: MiniMaxResponse, creds: ImageProviderCredentials, format: string): ImageGenerationResult {
  // Business errors arrive in base_resp; a missing status_code is treated as success.
  const code = json.base_resp?.status_code ?? 0;
  if (code !== 0) {
    if (code === 1026) throw new ImageGenerationError('提示词触发内容安全策略,请调整描述', 'content_safety', false);
    if (code === 1002) throw new ImageGenerationError('当前请求过多,请稍后重试', 'rate_limit', true);
    if (code === 1008) throw new ImageGenerationError('模型渠道余额不足', 'insufficient_balance', false);
    if (code === 1004) throw new ImageGenerationError('API Key 鉴权失败', 'invalid_params', false);
    throw new ImageGenerationError(json.base_resp?.status_msg ?? `MiniMax error ${code}`, 'unknown', false);
  }
  const urls = json.data?.image_urls ?? [];
  const b64s = json.data?.image_base64 ?? [];
  const images = format === 'base64'
    ? b64s.map(b => ({ base64: b }))
    : urls.map(u => ({ url: u }));
  return { images, model: creds.modelId, provider: 'minimax' };
}

class MiniMaxImageProvider implements ImageGenerationProvider {
  readonly id = 'minimax';

  async textToImage(params: TextToImageParams, creds: ImageProviderCredentials): Promise<ImageGenerationResult> {
    const body = buildBaseBody(params, creds);
    return this.request(body, creds, params.responseFormat ?? 'url');
  }

  async imageToImage(params: ImageToImageParams, creds: ImageProviderCredentials): Promise<ImageGenerationResult> {
    const body = buildBaseBody(params, creds);
    // MiniMax documents subject_reference items as { type, image_file } ('character' for image-01);
    // confirm against the current API docs before relying on it.
    body.subject_reference = [{ type: 'character', image_file: params.referenceImage }];
    if (params.strength !== undefined) body.strength = params.strength;
    return this.request(body, creds, params.responseFormat ?? 'url');
  }

  private async request(body: Record<string, unknown>, creds: ImageProviderCredentials, format: string): Promise<ImageGenerationResult> {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), 60_000);
    try {
      const baseUrl = creds.baseUrl.replace(/\/+$/, '');
      const res = await fetch(`${baseUrl}/v1/image_generation`, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Authorization': `Bearer ${creds.apiKey}`,
        },
        body: JSON.stringify(body),
        signal: controller.signal,
      });
      if (!res.ok) {
        const text = await res.text().catch(() => '');
        throw new ImageGenerationError(`MiniMax HTTP ${res.status}: ${text}`, res.status === 429 ? 'rate_limit' : 'unknown', res.status === 429);
      }
      const json: MiniMaxResponse = await res.json();
      return normalizeResponse(json, creds, format);
    } catch (err: any) {
      if (err instanceof ImageGenerationError) throw err;
      if (err?.name === 'AbortError') throw new ImageGenerationError('生成超时,可尝试降低图片尺寸', 'timeout', true);
      throw new ImageGenerationError(err?.message ?? String(err), 'network', true);
    } finally {
      clearTimeout(timer);
    }
  }
}

registerImageProvider(new MiniMaxImageProvider());
```
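MiniMax reports failures in `base_resp.status_code`, which the provider checks after the HTTP status. Pulling the mapping into a pure function makes it unit-testable without network calls. A sketch with a hypothetical `classifyBaseResp` name, using the same codes the plan lists (1026, 1002, 1008, 1004) and treating a missing `status_code` as success:

```typescript
type MiniMaxErrCode = 'content_safety' | 'rate_limit' | 'insufficient_balance' | 'invalid_params' | 'unknown';

interface BaseResp {
  status_code?: number;
  status_msg?: string;
}

// Returns null for success, otherwise the plan's error code and whether a retry makes sense.
function classifyBaseResp(resp: BaseResp | undefined): { code: MiniMaxErrCode; retryable: boolean } | null {
  const status = resp?.status_code ?? 0;
  if (status === 0) return null;
  switch (status) {
    case 1026:
      return { code: 'content_safety', retryable: false };
    case 1002:
      return { code: 'rate_limit', retryable: true };
    case 1008:
      return { code: 'insufficient_balance', retryable: false };
    case 1004:
      return { code: 'invalid_params', retryable: false };
    default:
      return { code: 'unknown', retryable: false };
  }
}

console.log(classifyBaseResp({ status_code: 1002, status_msg: 'rate limited' }));
```

Only rate limiting is marked retryable, matching the plan: retrying a safety rejection or an empty balance would just fail again.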
- [ ] **Step 2: Verify compilation**
Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors
- [ ] **Step 3: Commit**
```bash
git add packages/backend/src/modules/netaclaw/image_providers/minimax.ts
git commit -m "feat(netaclaw): add MiniMax image provider via REST API"
```
---
### Task 6: Image Persistence Service
**Files:**
- Create: `packages/backend/src/modules/netaclaw/service/image_storage.ts`
- [ ] **Step 1: Implement `ImageStorageService`**
Reuse the `downAndUpload` method of the existing `pluginService.getInstance('upload')`:
```typescript
import { Inject, Provide, Scope, ScopeEnum } from '@midwayjs/core';
import { PluginService } from '../../plugin/service/info.js';
import { randomUUID } from 'crypto';

@Provide()
@Scope(ScopeEnum.Singleton)
export class ImageStorageService {
  @Inject()
  pluginService: PluginService;

  // Downloads a (temporary) provider URL and re-uploads it via the upload plugin under a generated name.
  async persist(tempUrl: string): Promise<string> {
    const upload = await this.pluginService.getInstance('upload');
    const ext = this.detectExtension(tempUrl);
    const filename = `img-${Date.now()}-${randomUUID().slice(0, 8)}${ext}`;
    return upload.downAndUpload(tempUrl, filename);
  }

  async persistAll(urls: string[]): Promise<string[]> {
    return Promise.all(urls.map(url => this.persist(url)));
  }

  // Ignores the query string of signed URLs; falls back to .png when the path has no known extension.
  private detectExtension(url: string): string {
    const pathname = url.split('?')[0];
    const match = pathname.match(/\.(png|jpg|jpeg|webp|gif)$/i);
    return match ? `.${match[1].toLowerCase()}` : '.png';
  }
}
```
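Provider download links are often signed URLs with long query strings, which is why the extension is taken from the path only. The naming logic can be checked in isolation with this sketch; `makeStoredFilename` and its injectable `now` parameter are illustrative additions, not part of the service:

```typescript
import { randomUUID } from 'node:crypto';

// Same rule as ImageStorageService.detectExtension: drop the query string, default to .png.
function detectExtension(url: string): string {
  const pathname = url.split('?')[0];
  const match = pathname.match(/\.(png|jpg|jpeg|webp|gif)$/i);
  return match ? `.${match[1].toLowerCase()}` : '.png';
}

// Same naming scheme as persist(); `now` is a parameter so the result is testable.
function makeStoredFilename(url: string, now: number = Date.now()): string {
  return `img-${now}-${randomUUID().slice(0, 8)}${detectExtension(url)}`;
}

console.log(detectExtension('https://files.example.com/out/abc.JPEG?X-Signature=xyz')); // .jpeg
console.log(detectExtension('https://files.example.com/out/abc')); // .png
console.log(makeStoredFilename('https://files.example.com/p.webp', 1700000000000));
```

The `.png` fallback is only a guess for extension-less URLs; a JPEG served without an extension would be stored under a `.png` name.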
- [ ] **Step 2: Verify compilation**
Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors
- [ ] **Step 3: Commit**
```bash
git add packages/backend/src/modules/netaclaw/service/image_storage.ts
git commit -m "feat(netaclaw): add ImageStorageService for persisting generated images"
```
---
### Task 7: Shared Image Tool Helpers
**Files:**
- Create: `packages/backend/src/modules/netaclaw/tools/builtin/image_common.ts`
- [ ] **Step 1: Create `image_common.ts`**
Extract the helper functions shared by `text_to_image` and `image_to_image`:
```typescript
import { imageResult, imagesResult } from '../common.js';
import type { ImageGenerationResult } from '../../image_providers/types.js';
import type { ImageStorageService } from '../../service/image_storage.js';

type GeneratedImage = ImageGenerationResult['images'][number];

export function clampDimension(value: number | undefined, max: number): number | undefined {
  if (value === undefined) return undefined;
  return Math.min(value, max);
}

export async function persistImages(result: ImageGenerationResult, storage: ImageStorageService): Promise<ImageGenerationResult> {
  const persisted = await Promise.all(
    result.images.map(async img => {
      // Base64-only items have no URL to download; pass them through unchanged.
      if (!img.url) return img;
      const permanentUrl = await storage.persist(img.url);
      return { ...img, url: permanentUrl };
    }),
  );
  return { ...result, images: persisted };
}

export function formatImageToolResult(result: ImageGenerationResult) {
  const label = `${result.provider}/${result.model}`;
  // Only items with a URL can be rendered; without this filter base64-only items would yield `url: undefined`.
  const usable = result.images.filter((img): img is GeneratedImage & { url: string } => !!img.url);
  if (usable.length === 0) {
    return { type: 'text' as const, text: `图片生成失败: 未返回可用的图片 URL (${label})` };
  }
  if (usable.length === 1) {
    const img = usable[0];
    return imageResult(img.url, undefined, {
      width: img.width,
      height: img.height,
      text: `图片已生成 (${label})`,
    });
  }
  return imagesResult(
    usable.map(img => ({ url: img.url, width: img.width, height: img.height })),
    `已生成 ${usable.length} 张图片 (${label})`,
  );
}
```
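When `responseFormat` is `'base64'`, `persistImages` passes the items through without a `url`, so whatever formats the result must decide what to do with URL-less images. A sketch of one option, with local stand-in types and a hypothetical `formatSketch` name, shows how empty, single, and multiple results fall out:

```typescript
type GenImage = { url?: string; base64?: string; width?: number; height?: number };

type FormattedResult =
  | { type: 'text'; text: string }
  | { type: 'image'; url: string; text: string }
  | { type: 'images'; images: { url: string }[]; text: string };

function formatSketch(images: GenImage[], label: string): FormattedResult {
  // Keep only items that still carry a URL after persistence.
  const usable = images.filter(
    (img): img is GenImage & { url: string } => typeof img.url === 'string' && img.url !== '',
  );
  if (usable.length === 0) return { type: 'text', text: `No usable image URL returned (${label})` };
  if (usable.length === 1) return { type: 'image', url: usable[0].url, text: `Image generated (${label})` };
  return {
    type: 'images',
    images: usable.map(img => ({ url: img.url })),
    text: `${usable.length} images generated (${label})`,
  };
}

console.log(formatSketch([{ base64: 'iVBORw0' }], 'ark/my-model').type); // text
console.log(formatSketch([{ base64: 'iVBORw0' }, { url: 'https://files.example.com/a.png' }], 'ark/my-model').type); // image
```

Returning a text result for the empty case keeps the tool's output honest for the LLM instead of handing the frontend an image with no source.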
- [ ] **Step 2: Verify compilation**
Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors
- [ ] **Step 3: Commit**
```bash
git add packages/backend/src/modules/netaclaw/tools/builtin/image_common.ts
git commit -m "feat(netaclaw): extract shared image tool helpers"
```
---
### Task 8: text_to_image Tool
**Files:**
- Create: `packages/backend/src/modules/netaclaw/tools/builtin/text_to_image.ts`
- Modify: `packages/backend/src/modules/netaclaw/tools/catalog.ts`
- [ ] **Step 1: Create `text_to_image.ts`**
```typescript
import { Type, type Static } from '@sinclair/typebox';
import { type AnyAgentTool } from '../common.js';
import { registerSchema } from '../catalog.js';
import type {
  ImageGenerationProvider,
  ImageProviderCredentials,
} from '../../image_providers/types.js';
import { ImageGenerationError } from '../../image_providers/types.js';
import type { ImageStorageService } from '../../service/image_storage.js';
import { clampDimension, persistImages, formatImageToolResult } from './image_common.js';

const Params = Type.Object({
  prompt: Type.String({ description: '图片描述,尽量详细具体' }),
  aspectRatio: Type.Optional(Type.String({
    description: '宽高比。可选: 1:1, 16:9, 4:3, 3:2, 2:3, 3:4, 9:16',
  })),
  width: Type.Optional(Type.Integer({ description: '精确宽度(像素),优先级低于 aspectRatio' })),
  height: Type.Optional(Type.Integer({ description: '精确高度(像素)' })),
  n: Type.Optional(Type.Integer({ description: '生成数量,默认 1,最大 9', minimum: 1, maximum: 9 })),
  watermark: Type.Optional(Type.Boolean({ description: '是否添加水印' })),
  seed: Type.Optional(Type.Integer({ description: '随机种子,相同 seed 可复现相近结果' })),
  extra: Type.Optional(Type.Record(Type.String(), Type.Unknown(), {
    description: 'Provider 特有参数,如 MiniMax 的 style、prompt_optimizer',
  })),
});

export function createTextToImageTool(
  creds: ImageProviderCredentials,
  provider: ImageGenerationProvider,
  storage: ImageStorageService,
): AnyAgentTool {
  const defaults = creds.extra?.imageDefaults ?? {};
  const constraints = creds.extra?.imageConstraints ?? {};
  return {
    name: 'text_to_image',
    label: '文生图',
    description: creds.promptHint
      ? `根据文字描述生成图片。\n${creds.promptHint}`
      : '根据文字描述生成图片,支持指定尺寸、数量、风格等参数。',
    parameters: Params,
    async execute(_id: string, params: Static<typeof Params>) {
      // Precedence: tool-call argument > imageDefaults > hard-coded fallback; imageConstraints cap the result.
      const merged = {
        prompt: params.prompt,
        n: Math.min(params.n ?? defaults.n ?? 1, constraints.maxN ?? 9),
        aspectRatio: params.aspectRatio ?? defaults.aspectRatio,
        width: clampDimension(params.width ?? defaults.width, constraints.maxWidth ?? 2048),
        height: clampDimension(params.height ?? defaults.height, constraints.maxHeight ?? 2048),
        watermark: params.watermark ?? defaults.watermark ?? false,
        seed: params.seed,
        responseFormat: (defaults.responseFormat ?? 'url') as 'url' | 'base64',
        extra: params.extra,
      };
      try {
        let result = await provider.textToImage(merged, creds);
        result = await persistImages(result, storage);
        return formatImageToolResult(result);
      } catch (err) {
        if (err instanceof ImageGenerationError) {
          const prefix = err.retryable ? '[可重试] ' : '';
          return { type: 'text' as const, text: `${prefix}图片生成失败: ${err.message}` };
        }
        return { type: 'text' as const, text: `图片生成失败: ${err instanceof Error ? err.message : String(err)}` };
      }
    },
  };
}

registerSchema({
  name: 'text_to_image',
  toolset: 'vision',
  description: '根据文字描述生成图片,支持指定尺寸、数量、风格等参数。',
  capability: 'multimodal',
  visibility: 'tool',
  isCore: false,
  canDisable: true,
  supportsPromptHint: true,
  requiresModel: true,
});
```
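The merge in `execute` gives explicit tool-call arguments priority over the admin's `imageDefaults`, falls back to hard-coded values, and then caps the result with `imageConstraints`. A standalone sketch of that precedence (local types; `mergeSketch` and `clampDim` are illustrative names):

```typescript
type CallArgs = { n?: number; aspectRatio?: string; width?: number; height?: number; watermark?: boolean };
type AdminConstraints = { maxN?: number; maxWidth?: number; maxHeight?: number };

const clampDim = (v: number | undefined, max: number): number | undefined =>
  v === undefined ? undefined : Math.min(v, max);

// Call argument, then admin default, then fallback; constraints cap n, width and height.
function mergeSketch(p: CallArgs, d: CallArgs, c: AdminConstraints) {
  return {
    n: Math.min(p.n ?? d.n ?? 1, c.maxN ?? 9),
    aspectRatio: p.aspectRatio ?? d.aspectRatio,
    width: clampDim(p.width ?? d.width, c.maxWidth ?? 2048),
    height: clampDim(p.height ?? d.height, c.maxHeight ?? 2048),
    watermark: p.watermark ?? d.watermark ?? false,
  };
}

const merged = mergeSketch({ n: 8, width: 4096, height: 1024 }, { n: 2, aspectRatio: '1:1' }, { maxN: 4 });
console.log(merged);
```

Note what this example surfaces: the model asked for an explicit `width`/`height`, yet the admin default `aspectRatio` still comes through, and both providers check `aspectRatio` before `width`/`height`, so the explicit size is ignored. If that is not intended, one option is to drop the default `aspectRatio` whenever the call supplies both dimensions.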
- [ ] **Step 2: Register in catalog.ts**
In the import list at the end of `catalog.ts` (after `import './builtin/execute_skill.js';`, around line 66), add:
```typescript
import './builtin/text_to_image.js';
```
- [ ] **Step 3: Verify compilation**
Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors
- [ ] **Step 4: Commit**
```bash
git add packages/backend/src/modules/netaclaw/tools/builtin/text_to_image.ts packages/backend/src/modules/netaclaw/tools/catalog.ts
git commit -m "feat(netaclaw): add text_to_image tool with defaults/constraints merge and image persistence"
```
---
### Task 9: image_to_image Tool
**Files:**
- Create: `packages/backend/src/modules/netaclaw/tools/builtin/image_to_image.ts`
- Modify: `packages/backend/src/modules/netaclaw/tools/catalog.ts`
- [ ] **Step 1: Create `image_to_image.ts`**
```typescript
import { Type, type Static } from '@sinclair/typebox';
import { type AnyAgentTool } from '../common.js';
import { registerSchema } from '../catalog.js';
import type {
  ImageGenerationProvider,
  ImageProviderCredentials,
} from '../../image_providers/types.js';
import { ImageGenerationError } from '../../image_providers/types.js';
import type { ImageStorageService } from '../../service/image_storage.js';
import { clampDimension, persistImages, formatImageToolResult } from './image_common.js';

const Params = Type.Object({
  prompt: Type.String({ description: '对参考图的修改描述' }),
  referenceImage: Type.String({ description: '参考图片 URL,从用户上传附件获取' }),
  strength: Type.Optional(Type.Number({
    description: '参考图影响强度 0-1,越大越接近原图', minimum: 0, maximum: 1,
  })),
  aspectRatio: Type.Optional(Type.String({
    description: '宽高比。可选: 1:1, 16:9, 4:3, 3:2, 2:3, 3:4, 9:16',
  })),
  width: Type.Optional(Type.Integer({ description: '精确宽度(像素),优先级低于 aspectRatio' })),
  height: Type.Optional(Type.Integer({ description: '精确高度(像素)' })),
  n: Type.Optional(Type.Integer({ description: '生成数量,默认 1,最大 9', minimum: 1, maximum: 9 })),
  watermark: Type.Optional(Type.Boolean({ description: '是否添加水印' })),
  seed: Type.Optional(Type.Integer({ description: '随机种子' })),
  extra: Type.Optional(Type.Record(Type.String(), Type.Unknown(), {
    description: 'Provider 特有参数',
  })),
});

export function createImageToImageTool(
  creds: ImageProviderCredentials,
  provider: ImageGenerationProvider,
  storage: ImageStorageService,
): AnyAgentTool {
  const defaults = creds.extra?.imageDefaults ?? {};
  const constraints = creds.extra?.imageConstraints ?? {};
  return {
    name: 'image_to_image',
    label: '图生图',
    description: creds.promptHint
      ? `基于参考图片生成新图片。\n${creds.promptHint}`
      : '基于参考图片生成新图片,支持风格迁移、内容编辑等。传入参考图 URL 和修改描述。',
    parameters: Params,
    async execute(_id: string, params: Static<typeof Params>) {
      // Same precedence as text_to_image, plus the reference image and its strength.
      const merged = {
        prompt: params.prompt,
        referenceImage: params.referenceImage,
        strength: params.strength,
        n: Math.min(params.n ?? defaults.n ?? 1, constraints.maxN ?? 9),
        aspectRatio: params.aspectRatio ?? defaults.aspectRatio,
        width: clampDimension(params.width ?? defaults.width, constraints.maxWidth ?? 2048),
        height: clampDimension(params.height ?? defaults.height, constraints.maxHeight ?? 2048),
        watermark: params.watermark ?? defaults.watermark ?? false,
        seed: params.seed,
        responseFormat: (defaults.responseFormat ?? 'url') as 'url' | 'base64',
        extra: params.extra,
      };
      try {
        let result = await provider.imageToImage(merged, creds);
        result = await persistImages(result, storage);
        return formatImageToolResult(result);
      } catch (err) {
        if (err instanceof ImageGenerationError) {
          const prefix = err.retryable ? '[可重试] ' : '';
          return { type: 'text' as const, text: `${prefix}图片生成失败: ${err.message}` };
        }
        return { type: 'text' as const, text: `图片生成失败: ${err instanceof Error ? err.message : String(err)}` };
      }
    },
  };
}

registerSchema({
  name: 'image_to_image',
  toolset: 'vision',
  description: '基于参考图片生成新图片,支持风格迁移、内容编辑等。传入参考图 URL 和修改描述。',
  capability: 'multimodal',
  visibility: 'tool',
  isCore: false,
  canDisable: true,
  supportsPromptHint: true,
  requiresModel: true,
});
```
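The TypeBox `minimum`/`maximum` bounds on `strength` only help if the agent runtime validates tool arguments against the schema, which this plan does not state. If it does not, a small defensive pass could clamp `strength` and reject reference images that are not plain http(s) URLs before they reach a provider. Both helpers below are hypothetical, not part of the plan:

```typescript
// Hypothetical guard: keep strength inside [0, 1]; NaN or missing means "let the provider decide".
function normalizeStrength(v: number | undefined): number | undefined {
  if (v === undefined || Number.isNaN(v)) return undefined;
  return Math.min(1, Math.max(0, v));
}

// Hypothetical guard: only forward absolute http(s) URLs as reference images.
function isHttpUrl(s: string): boolean {
  return /^https?:\/\/\S+$/i.test(s);
}

console.log(normalizeStrength(1.5)); // 1
console.log(isHttpUrl('file:///etc/passwd')); // false
```

Rejecting other schemes early lets the tool return a clear text error instead of an opaque provider failure.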
- [ ] **Step 2: Register in catalog.ts**
At the end of `catalog.ts`, after `import './builtin/text_to_image.js';`, add:
```typescript
import './builtin/image_to_image.js';
```
- [ ] **Step 3: Verify compilation**
Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors
- [ ] **Step 4: Commit**
```bash
git add packages/backend/src/modules/netaclaw/tools/builtin/image_to_image.ts packages/backend/src/modules/netaclaw/tools/catalog.ts
git commit -m "feat(netaclaw): add image_to_image tool with reference image support"
```
---
### Task 10: tool_resolver Integration
**Files:**
- Modify: `packages/backend/src/modules/netaclaw/service/tool_resolver.ts`
- Modify: `packages/backend/src/modules/netaclaw/service/tool_registry.ts`
- [ ] **Step 1: Add a `getToolByName` method to tool_registry.ts**
In the `NetaClawToolRegistryService` class (near the `getToolModelConfig` method), add:
```typescript
  async getToolByName(name: string): Promise<NetaClawToolEntity | null> {
    return this.toolRepo.findOneBy({ name });
  }
```
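- [ ] **Step 2: Add imports to tool_resolver.ts**
In the import section at the top of `tool_resolver.ts` (near `import { createImageRecognizeTool }`, around line 32), add the following. The two bare imports load the provider modules so that their `registerImageProvider(...)` calls run: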
```typescript
import { createTextToImageTool } from '../tools/builtin/text_to_image.js';
import { createImageToImageTool } from '../tools/builtin/image_to_image.js';
import { getImageProvider, type ImageProviderCredentials } from '../image_providers/types.js';
import '../image_providers/ark.js';
import '../image_providers/minimax.js';
import { ImageStorageService } from './image_storage.js';
```
- [ ] **Step 3: Inject `ImageStorageService`**
In the `NetaClawToolResolverService` class, add to the existing `@Inject()` fields:
```typescript
@Inject()
imageStorageService: ImageStorageService;
```
- [ ] **Step 4: Inject both tools in the `resolve` method**
In the `resolve` method of `tool_resolver.ts`, find the `image_recognize` injection block (around lines 647-664) and add the following after its closing `}`:
```typescript
for (const imgToolName of ['text_to_image', 'image_to_image'] as const) {
  if (filteredNames.includes(imgToolName)) {
    const toolModelConfig = await this.toolRegistry.getToolModelConfig(imgToolName);
    if (toolModelConfig) {
      const channelCreds = await this.modelChannelService.resolveForAgent(toolModelConfig.modelChannelId, toolModelConfig.modelId);
      if (channelCreds) {
        const provider = getImageProvider(channelCreds.channelSupplier, channelCreds.baseUrl ?? '');
        if (provider) {
          const toolEntity = await this.toolRegistry.getToolByName(imgToolName);
          const extra = toolEntity?.extra as import('../tools/manifest.js').ToolGovernanceExtra | null;
          const creds: ImageProviderCredentials = {
            baseUrl: channelCreds.baseUrl ?? '',
            apiKey: channelCreds.apiKey,
            supplier: channelCreds.channelSupplier,
            modelId: toolModelConfig.modelId,
            promptHint: toolModelConfig.promptHint,
            extra,
          };
          if (imgToolName === 'text_to_image') {
            runtimeTools.push(createTextToImageTool(creds, provider, this.imageStorageService));
          } else {
            runtimeTools.push(createImageToImageTool(creds, provider, this.imageStorageService));
          }
        } else {
          disabledReasons.push({ name: imgToolName, reason: 'image_provider_not_found' });
        }
      } else {
        disabledReasons.push({ name: imgToolName, reason: 'model_channel_unavailable' });
      }
    } else {
      disabledReasons.push({ name: imgToolName, reason: 'model_not_configured' });
    }
  }
}
```
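The nested `if`/`else` chain above applies three checks in a fixed order, and the first failing check decides the `disabledReasons` entry. Written as a pure function (a hypothetical refactor with illustrative names, not part of the plan), the precedence becomes explicit and testable without a database:

```typescript
type DisabledReason = 'model_not_configured' | 'model_channel_unavailable' | 'image_provider_not_found';

interface ImageToolInputs {
  hasModelConfig: boolean; // getToolModelConfig returned a config
  hasChannelCreds: boolean; // resolveForAgent returned credentials
  hasProvider: boolean; // getImageProvider matched a registered provider
}

type Decision = { ok: true } | { ok: false; reason: DisabledReason };

function decideImageTool(x: ImageToolInputs): Decision {
  if (!x.hasModelConfig) return { ok: false, reason: 'model_not_configured' };
  if (!x.hasChannelCreds) return { ok: false, reason: 'model_channel_unavailable' };
  if (!x.hasProvider) return { ok: false, reason: 'image_provider_not_found' };
  return { ok: true };
}

console.log(decideImageTool({ hasModelConfig: true, hasChannelCreds: true, hasProvider: false }));
```

In the resolver, the booleans would come from the three awaited lookups; returning early on each failure also flattens the nesting if the loop body grows.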
- [ ] **Step 5: Verify compilation**
Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors
- [ ] **Step 6: Commit**
```bash
git add packages/backend/src/modules/netaclaw/service/tool_resolver.ts packages/backend/src/modules/netaclaw/service/tool_registry.ts
git commit -m "feat(netaclaw): integrate text_to_image and image_to_image into tool_resolver"
```
---
### Task 11: Prompt Builder Attachment Hint Extension
**Files:**
- Modify: `packages/backend/src/modules/netaclaw/runtime/prompt_builder.ts`
- Modify: `packages/backend/src/modules/netaclaw/runtime/agent.ts`
- [ ] **Step 1: Change the `buildLLMMessages` signature**
At line 141 of `prompt_builder.ts`, add a `toolNames` parameter to `buildLLMMessages`:
```typescript
export function buildLLMMessages(
  systemPrompt: string,
  history: LLMMessage[],
  userMessage: UserMessageInput,
  toolNames?: string[],
): LLMMessage[] {
```
- [ ] **Step 2: Update the attachment hint logic**
Replace the attachment-handling block at lines 152-162 with:
```typescript
if (userMessage.metadata?.attachments && (userMessage.metadata.attachments as unknown[]).length) {
  const attachments = userMessage.metadata.attachments as ChatAttachment[];
  const typeLabels: Record<string, string> = { image: '图片', video: '视频', pdf: 'PDF', document: '文件', other: '文件' };
  const desc = attachments.map(a => {
    // Fall back to a generic label so an unlisted attachment type never renders as "undefined".
    const typeLabel = typeLabels[a.type] ?? '文件';
    return `- ${typeLabel}: ${a.name} (URL: ${a.url})`;
  }).join('\n');
  // Only mention tools that are actually available in this run.
  const hints: string[] = [];
  const names = new Set(toolNames ?? []);
  if (names.has('image_recognize')) {
    hints.push('如需分析图片内容,请使用 image_recognize 工具,传入图片 URL');
  }
  if (names.has('image_to_image')) {
    hints.push('如需基于图片生成新图片,请使用 image_to_image 工具,将图片 URL 作为 referenceImage 参数');
  }
  if (hints.length === 0) {
    hints.push('附件已上传,可在需要时引用其 URL');
  }
  messages.push({
    role: 'user',
    content: `[系统提示] 用户上传了以下附件:\n${desc}\n${hints.join('。')}。`,
  });
}
```
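Because the hint text now depends on `toolNames`, the combinations are worth checking outside the LLM pipeline. A standalone sketch of the same logic, with a local `Attachment` type standing in for `ChatAttachment` (assumed here to have the `type`, `name`, and `url` fields used above) and a hypothetical `buildAttachmentNote` name:

```typescript
type Attachment = { type: string; name: string; url: string };

const TYPE_LABELS: Record<string, string> = { image: '图片', video: '视频', pdf: 'PDF', document: '文件', other: '文件' };

// Mirrors the attachment block: list attachments, then add hints only for tools that are available.
function buildAttachmentNote(attachments: Attachment[], toolNames: string[] = []): string {
  const desc = attachments.map(a => `- ${TYPE_LABELS[a.type] ?? '文件'}: ${a.name} (URL: ${a.url})`).join('\n');
  const names = new Set(toolNames);
  const hints: string[] = [];
  if (names.has('image_recognize')) hints.push('如需分析图片内容,请使用 image_recognize 工具,传入图片 URL');
  if (names.has('image_to_image')) hints.push('如需基于图片生成新图片,请使用 image_to_image 工具,将图片 URL 作为 referenceImage 参数');
  if (hints.length === 0) hints.push('附件已上传,可在需要时引用其 URL');
  return `[系统提示] 用户上传了以下附件:\n${desc}\n${hints.join('。')}。`;
}

console.log(buildAttachmentNote([{ type: 'image', name: 'cat.png', url: 'https://files.example.com/cat.png' }], ['image_to_image']));
```

Gating each hint on the tool list avoids steering the model toward a tool that the resolver disabled for this run.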
- [ ] **Step 3: Update the call site in agent.ts**
At lines 96-100 of `agent.ts`, pass `toolNames` to `buildLLMMessages`:
```typescript
const messages: LLMMessage[] = buildLLMMessages(
  agentConfig.systemPrompt,
  history,
  { content: userMessage, metadata: params.userMessageMetadata },
  params.toolNames || tools.map(tool => tool.name),
);
```
- [ ] **Step 4: Verify compilation**
Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors
- [ ] **Step 5: Commit**
```bash
git add packages/backend/src/modules/netaclaw/runtime/prompt_builder.ts packages/backend/src/modules/netaclaw/runtime/agent.ts
git commit -m "feat(netaclaw): dynamic attachment hints based on available tools"
```
---
### Task 12: Frontend image rendering in renderer-registry and message-item
**Files:**
- Modify: `packages/frontend/src/modules/agent/tools/renderer-registry.ts`
- Modify: `packages/frontend/src/modules/agent/components/message-item.vue`
- [ ] **Step 1: Extend the rawResult type in renderer-registry**
In the `rawResult` field of the `ToolRenderSource` interface (line 9 of `renderer-registry.ts`), extend the `type` union and add an `images` field:
```typescript
rawResult?: {
  type: 'text' | 'json' | 'image' | 'images';
  text?: string;
  data?: unknown;
  url?: string;
  mimeType?: string;
  width?: number;
  height?: number;
  bytes?: number;
  originalWidth?: number;
  originalHeight?: number;
  originalBytes?: number;
  resized?: boolean;
  images?: {
    url: string;
    mimeType?: string;
    width?: number;
    height?: number;
    seed?: number;
  }[];
};
```
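- [ ] **Step 2: Update the message-item.vue template to upgrade single-image rendering and add a multi-image branch**
Locate the existing single-image rendering block at lines 75-83 of `message-item.vue`: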
```vue
<div
  v-if="tool.rawResult?.type === 'image' && tool.rawResult.url"
  class="tool-execution__image"
>
  <img :src="tool.rawResult.url" @click="openToolImage(tool.rawResult.url)" />
  <div class="tool-execution__image-caption">
    {{ formatToolImageCaption(tool.rawResult) }}
  </div>
</div>
```
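Replace it with the following, which switches the single image to `el-image` with a built-in preview and adds a grid branch for `images` results: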
```vue
<div
  v-if="tool.rawResult?.type === 'image' && tool.rawResult.url"
  class="tool-execution__image"
>
  <el-image
    :src="tool.rawResult.url"
    fit="contain"
    :preview-src-list="[tool.rawResult.url]"
    preview-teleported
    class="tool-execution__image-single"
  />
  <div class="tool-execution__image-caption">
    {{ formatToolImageCaption(tool.rawResult) }}
  </div>
</div>
<div
  v-else-if="tool.rawResult?.type === 'images' && tool.rawResult.images?.length"
  class="tool-execution__image"
>
  <div v-if="tool.rawResult.text" class="tool-execution__image-caption">
    {{ tool.rawResult.text }}
  </div>
  <div class="tool-execution__image-grid">
    <el-image
      v-for="(img, idx) in tool.rawResult.images"
      :key="idx"
      :src="img.url"
      fit="cover"
      :preview-src-list="tool.rawResult.images.map(i => i.url)"
      :initial-index="idx"
      preview-teleported
      class="tool-execution__image-grid-item"
    />
  </div>
</div>
```
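The template builds the preview list inline. As a sketch, that branching could be pulled into a small pure function so the `image` versus `images` cases are easy to unit-test; `previewUrls`, `GeneratedImage` and `ImageRawResult` are hypothetical names, with fields copied from the `rawResult` type extended in Step 1.

```typescript
// Shape of one generated image, copied from the rawResult.images element type.
interface GeneratedImage {
  url: string;
  mimeType?: string;
  width?: number;
  height?: number;
  seed?: number;
}

// Subset of ToolRenderSource['rawResult'] that matters for image previews.
interface ImageRawResult {
  type: 'text' | 'json' | 'image' | 'images';
  url?: string;
  images?: GeneratedImage[];
}

// Hypothetical helper: the URL list to pass to el-image's preview-src-list.
function previewUrls(raw: ImageRawResult | undefined): string[] {
  if (!raw) return [];
  if (raw.type === 'image') return raw.url ? [raw.url] : [];
  if (raw.type === 'images') return (raw.images ?? []).map(img => img.url).filter(Boolean);
  // Text and JSON results have nothing to preview.
  return [];
}
```

With such a helper, `:initial-index="idx"` still selects the clicked image, because the list order matches the `v-for` order.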
- [ ] **Step 3: Add multi-image grid styles**
In the `<style>` block of `message-item.vue`, add the following next to the existing `.tool-execution__image` styles:
```css
.tool-execution__image-single {
  max-width: 360px;
  border-radius: 8px;
}
.tool-execution__image-grid {
  display: grid;
  grid-template-columns: repeat(auto-fill, minmax(140px, 1fr));
  gap: 8px;
  margin-top: 8px;
}
.tool-execution__image-grid-item {
  width: 100%;
  aspect-ratio: 1;
  border-radius: 8px;
  cursor: pointer;
  /* Cropping comes from el-image's fit="cover" prop, which sets object-fit on the inner <img> */
}
```
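To see how `repeat(auto-fill, minmax(140px, 1fr))` with an 8px gap lays out the generated images, the arithmetic can be sketched in TypeScript. It follows the CSS Grid auto-repeat rule for a container of definite width: because the max sizing function `1fr` is not definite, the 140px minimum (plus gaps) is used to count columns, and the `1fr` tracks then share the remaining width. `autoFillColumns` is a hypothetical helper for illustration only.

```typescript
// Models repeat(auto-fill, minmax(minTrack, 1fr)) with a fixed column gap.
function autoFillColumns(
  containerWidth: number,
  minTrack = 140,
  gap = 8,
): { columns: number; trackWidth: number } {
  // Largest column count whose minimum widths plus gaps still fit; never fewer than one.
  const columns = Math.max(1, Math.floor((containerWidth + gap) / (minTrack + gap)));
  // 1fr tracks split the width left after gaps, but never shrink below the minimum.
  const trackWidth = Math.max(minTrack, (containerWidth - (columns - 1) * gap) / columns);
  return { columns, trackWidth };
}
```

For example, a 600px-wide card gives 4 columns of 144px, while a 360px-wide card gives 2 columns of 176px, so a 3-image result there wraps onto two rows.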
- [ ] **Step 4: Verify the frontend build**
Run: `cd packages/frontend && npx vue-tsc --noEmit 2>&1 | head -20`
Expected: no new errors
- [ ] **Step 5: Commit**
```bash
git add packages/frontend/src/modules/agent/tools/renderer-registry.ts packages/frontend/src/modules/agent/components/message-item.vue
git commit -m "feat(frontend): upgrade image rendering and add multi-image grid in message-item"
```
---
### Task 13: Frontend image generation config section on the tool edit page
**Files:**
- Modify: `packages/frontend/src/modules/agent/views/tools.vue`
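- [ ] **Step 1: Add an image generation config section to the edit drawer**
In the edit drawer of `tools.vue`, locate the model configuration area (around lines 475-500) and add the following right after the closing `</template>` of the `<template v-if="editor.requiresModel === 1">` block: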
```vue
<template v-if="isImageTool">
  <el-divider>{{ t('图片生成配置') }}</el-divider>
  <el-alert type="info" :closable="false" style="margin-bottom: 16px">
    默认值在 Agent 未指定时生效,Agent 可根据用户指令覆盖。硬上限不可突破。
  </el-alert>
  <el-row :gutter="16">
    <el-col :span="8">
      <el-form-item label="默认数量">
        <el-input-number v-model="imageDefaults.n" :min="1" :max="9" :step="1" style="width: 100%" />
      </el-form-item>
    </el-col>
    <el-col :span="8">
      <el-form-item label="默认比例">
        <el-select v-model="imageDefaults.aspectRatio" clearable placeholder="不限" style="width: 100%">
          <el-option label="1:1" value="1:1" />
          <el-option label="16:9" value="16:9" />
          <el-option label="4:3" value="4:3" />
          <el-option label="3:2" value="3:2" />
          <el-option label="2:3" value="2:3" />
          <el-option label="3:4" value="3:4" />
          <el-option label="9:16" value="9:16" />
        </el-select>
      </el-form-item>
    </el-col>
    <el-col :span="8">
      <el-form-item label="默认水印">
        <el-switch v-model="imageDefaults.watermark" />
      </el-form-item>
    </el-col>
  </el-row>
  <el-row :gutter="16">
    <el-col :span="8">
      <el-form-item label="最大数量">
        <el-input-number v-model="imageConstraints.maxN" :min="1" :max="9" :step="1" style="width: 100%" />
      </el-form-item>
    </el-col>
    <el-col :span="8">
      <el-form-item label="最大宽度">
        <el-input-number v-model="imageConstraints.maxWidth" :min="512" :max="4096" :step="64" style="width: 100%" />
      </el-form-item>
    </el-col>
    <el-col :span="8">
      <el-form-item label="最大高度">
        <el-input-number v-model="imageConstraints.maxHeight" :min="512" :max="4096" :step="64" style="width: 100%" />
      </el-form-item>
    </el-col>
  </el-row>
</template>
```
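The alert in this section states the intended semantics: defaults apply only when the Agent does not specify a value, the Agent may override them based on the user's instructions, and the hard limits can never be exceeded. One way a backend could combine the three is sketched below. `resolveImageParams` and `AgentImageArgs` are hypothetical (this is not the `clampDimension` helper planned for `image_common.ts`), and treating an empty `aspectRatio` as "no preference" is an assumption based on the "不限" placeholder.

```typescript
interface ImageDefaults { n: number; aspectRatio: string; watermark: boolean }
interface ImageConstraints { maxN: number; maxWidth: number; maxHeight: number }
// Assumed shape of the arguments an Agent may pass; every field is optional.
interface AgentImageArgs { n?: number; aspectRatio?: string; watermark?: boolean; width?: number; height?: number }

// Hypothetical resolver: Agent value > configured default, then hard limits applied last.
function resolveImageParams(args: AgentImageArgs, defaults: ImageDefaults, limits: ImageConstraints) {
  // 1. Values supplied by the Agent win; otherwise fall back to the configured defaults.
  const n = args.n ?? defaults.n;
  const aspectRatio = args.aspectRatio || defaults.aspectRatio || undefined; // '' = no preference
  const watermark = args.watermark ?? defaults.watermark;
  // 2. Clamp last, so neither a default nor an Agent override can exceed the limits.
  return {
    n: Math.min(Math.max(1, Math.floor(n)), limits.maxN),
    aspectRatio,
    watermark,
    width: args.width === undefined ? undefined : Math.min(args.width, limits.maxWidth),
    height: args.height === undefined ? undefined : Math.min(args.height, limits.maxHeight),
  };
}
```

Applying the limits after the defaults also covers a misconfigured tool whose default `n` is larger than its `maxN`.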
- [ ] **Step 2: Add reactive state and a computed property**
Near the `editor` reactive object in the `<script>` section, add:
```typescript
// Requires `computed` and `reactive` to be imported from 'vue'
const isImageTool = computed(() =>
  ['text_to_image', 'image_to_image'].includes(editor.name)
);
const imageDefaults = reactive({
  n: 1,
  aspectRatio: '' as string,
  watermark: false,
});
const imageConstraints = reactive({
  maxN: 9,
  maxWidth: 2048,
  maxHeight: 2048,
});
```
- [ ] **Step 3: Load the image config from extra when opening the edit drawer**
In the logic that opens the edit drawer (`watch(editorVisible, ...)` or the `openEditor` function), add:
```typescript
// Reset to the built-in defaults first, then overlay any values stored in extra,
// so settings from a previously edited tool never leak into this one
const extra = (row.extra ?? {}) as Record<string, any>;
Object.assign(imageDefaults, { n: 1, aspectRatio: '', watermark: false, ...(extra.imageDefaults ?? {}) });
Object.assign(imageConstraints, { maxN: 9, maxWidth: 2048, maxHeight: 2048, ...(extra.imageConstraints ?? {}) });
```
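The drawer reuses the same `imageDefaults` and `imageConstraints` objects for every tool, so each open has to reset them before applying stored values, and it has to mutate them in place so the `reactive()` proxies stay bound to the form. A framework-free sketch of that reset-then-overlay pattern follows; `resetAndOverlay` and the two `DEFAULT_*` constants are hypothetical names.

```typescript
const DEFAULT_IMAGE_DEFAULTS = { n: 1, aspectRatio: '', watermark: false };
const DEFAULT_IMAGE_CONSTRAINTS = { maxN: 9, maxWidth: 2048, maxHeight: 2048 };

// Resets `target` to `base`, then overlays any stored values. Object.assign mutates
// `target` in place and returns it, so the object identity never changes.
function resetAndOverlay<T extends object>(target: T, base: T, stored?: Partial<T>): T {
  return Object.assign(target, base, stored ?? {});
}
```

In the component, `target` would be the reactive object and `stored` the matching field of `extra`.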
- [ ] **Step 4: Write the image config into extra on save**
In the save logic (`handleSave` or `handleUpdate`), add the following when constructing `extra`:
```typescript
const extra: Record<string, unknown> = {
  ...(editor.governancePolicy?.allowInSubagent !== undefined ? { allowInSubagent: editor.governancePolicy.allowInSubagent } : {}),
  ...(editor.governancePolicy?.workerRoutingStrategy ? { workerRoutingStrategy: editor.governancePolicy.workerRoutingStrategy } : {}),
};
if (isImageTool.value) {
  // Store plain copies rather than the reactive proxies
  extra.imageDefaults = { ...imageDefaults };
  extra.imageConstraints = { ...imageConstraints };
}
```
- [ ] **Step 5: Verify the frontend build**
Run: `cd packages/frontend && npx vue-tsc --noEmit 2>&1 | head -20`
Expected: no new errors
- [ ] **Step 6: Commit**
```bash
git add packages/frontend/src/modules/agent/views/tools.vue
git commit -m "feat(frontend): add image generation config section in tool editor"
```
---
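### Task 14: End-to-end verification
- [ ] **Step 1: Start the backend**
Run: `cd packages/backend && npm run dev`
Expected: the server starts without errors
- [ ] **Step 2: Verify the tools are synced to the database**
Open the tool management page in the frontend, click "同步工具目录" (Sync tool catalog), and confirm that `text_to_image` and `image_to_image` appear in the list with toolset `vision`, capability `multimodal`, and requiresModel `1`.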
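- [ ] **Step 3: Configure the model channel**
On the tool edit page, configure a model channel for `text_to_image` (pick an existing Volcengine Ark or MiniMax channel) and select the corresponding image generation model. Do the same for `image_to_image`, which also has requiresModel `1`.
- [ ] **Step 4: Configure the image generation parameters**
In the "图片生成配置" (image generation config) section of the tool edit page, set the defaults and hard limits, then save.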
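- [ ] **Step 5: Create a test Agent**
On the Agent edit page, create a test Agent whose toolset enables `text_to_image`, `image_to_image`, and `image_recognize`.
- [ ] **Step 6: Test text-to-image**
On the chat page, send the test Agent "生成一张白底电商主图,蓝牙耳机" (generate a white-background e-commerce hero image of Bluetooth earbuds) and confirm that:
- The Agent calls the `text_to_image` tool
- The tool-card renders the generated image
- The image URL is a locally persisted URL, not a temporary provider URL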
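- [ ] **Step 7: Test image-to-image**
Upload a product photo as an attachment, send "基于这张图生成一张白底主图" (generate a white-background hero image based on this photo), and confirm that:
- The attachment hint includes the `image_to_image` tool hint
- The Agent calls `image_to_image` with `referenceImage` set to the uploaded image's URL
- The tool-card renders the generated image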
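- [ ] **Step 8: Test multiple images**
Send "生成 3 张不同角度的产品图" (generate 3 product images from different angles) and confirm that:
- The Agent calls `text_to_image` with `n=3`
- The tool-card renders the 3 images in a grid layout
- Clicking an image opens a full-size preview
- [ ] **Step 9: Final commit**
Once everything works, commit any remaining fixes together.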