GPU_GUARD_MONOREPO/docs/superpowers/plans/2026-05-02-image-generation-tools.md

2026-05-20 21:39:12 +08:00
# Text-to-Image and Image-to-Image Tools: Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
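**Goal:** Add two tools, `text_to_image` and `image_to_image`, to Neta Agent; establish an extensible image-generation Provider layer; and render image results in the frontend.
**Architecture:** The backend adds an `image_providers/` directory that implements the Provider strategy pattern (Volcano Engine + MiniMax). Both tools receive their credentials and provider instance through `tool_resolver`. Image URLs returned by the tools are persisted to local storage by `ImageStorageService` before being written to the session tree. On the frontend, `message-item.vue` upgrades its image rendering and gains a multi-image grid branch.
**Tech Stack:** TypeScript, OpenAI SDK (Volcano Engine), fetch (MiniMax REST), TypeBox (parameter schemas), Vue 3 + Element Plus (frontend)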
**Spec:** `docs/superpowers/specs/2026-05-02-image-generation-tools-design.md`
---
## File Map
| File | Responsibility | Action |
|------|------|------|
| `packages/backend/src/modules/netaclaw/image_providers/types.ts` | Unified Provider interface, credentials, error types, factory function | Create |
| `packages/backend/src/modules/netaclaw/image_providers/ark.ts` | Volcano Engine Provider | Create |
| `packages/backend/src/modules/netaclaw/image_providers/minimax.ts` | MiniMax Provider | Create |
| `packages/backend/src/modules/netaclaw/tools/common.ts` | Add the `ImageItem` and `images` types, `imagesResult()`, and the `toolResultToText` extension | Modify |
| `packages/backend/src/modules/netaclaw/tools/manifest.ts` | Add `imageDefaults` / `imageConstraints` to `ToolGovernanceExtra` | Modify |
| `packages/backend/src/modules/netaclaw/tools/builtin/image_common.ts` | Shared helpers for the image tools (clampDimension, persistImages, formatImageToolResult) | Create |
| `packages/backend/src/modules/netaclaw/tools/builtin/text_to_image.ts` | Text-to-image tool | Create |
| `packages/backend/src/modules/netaclaw/tools/builtin/image_to_image.ts` | Image-to-image tool | Create |
| `packages/backend/src/modules/netaclaw/tools/catalog.ts` | Import the two new tools to trigger registration | Modify |
| `packages/backend/src/modules/netaclaw/service/image_storage.ts` | Service that persists images to local storage | Create |
| `packages/backend/src/modules/netaclaw/service/tool_registry.ts` | Add the `getToolByName` method | Modify |
| `packages/backend/src/modules/netaclaw/service/tool_resolver.ts` | Inject credentials and the provider instance | Modify |
| `packages/backend/src/modules/netaclaw/runtime/prompt_builder.ts` | Extend the attachment hint text | Modify |
| `packages/backend/src/modules/netaclaw/runtime/agent.ts` | Pass `toolNames` when calling `buildLLMMessages` | Modify |
| `packages/frontend/src/modules/agent/tools/renderer-registry.ts` | Add the `images` type to rawResult | Modify |
| `packages/frontend/src/modules/agent/components/message-item.vue` | Upgrade single-image rendering to el-image and add a multi-image grid | Modify |
| `packages/frontend/src/modules/agent/views/tools.vue` | Image generation configuration section | Modify |
---
### Task 1: Backend Type Foundation - Multi-Image Extension of ToolResultContent
**Files:**
- Modify: `packages/backend/src/modules/netaclaw/tools/common.ts`
- [ ] **Step 1: Add the ImageItem interface before the ToolResultContent type**
In `common.ts`, before the `ToolResultContent` type definition (around line 26), add:
```typescript
export interface ImageItem {
url: string;
mimeType?: string;
width?: number;
height?: number;
seed?: number;
}
```
- [ ] **Step 2: Extend the ToolResultContent union type**
After the existing `type: 'image'` branch, add an `images` branch:
```typescript
| { type: 'images'; images: ImageItem[]; text?: string };
```
The complete type becomes:
```typescript
export type ToolResultContent =
| { type: 'text'; text: string }
| { type: 'json'; data: unknown }
| {
type: 'image';
url: string;
mimeType?: string;
text?: string;
width?: number;
height?: number;
bytes?: number;
originalWidth?: number;
originalHeight?: number;
originalBytes?: number;
resized?: boolean;
}
| { type: 'images'; images: ImageItem[]; text?: string };
```
- [ ] **Step 3: Add the imagesResult helper function**
After the `imageResult()` function, add:
```typescript
export function imagesResult(
images: ImageItem[],
text?: string,
): ToolResultContent {
return { type: 'images', images, text };
}
```
- [ ] **Step 4: Extend the toolResultToText function**
In the `toolResultToText` function, after the `if (value.type === 'image')` branch, add:
```typescript
if (value.type === 'images') {
const lines = (value as { type: 'images'; images: ImageItem[]; text?: string }).images.map((img, i) =>
`[图${i + 1}] ${img.url}${img.width && img.height ? ` (${img.width}x${img.height})` : ''}`
);
const header = (value as any).text || `已生成 ${lines.length} 张图片`;
return `${header}\n${lines.join('\n')}`;
}
```
- [ ] **Step 5: Verify compilation**
Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no new errors
- [ ] **Step 6: Commit**
```bash
git add packages/backend/src/modules/netaclaw/tools/common.ts
git commit -m "feat(netaclaw): extend ToolResultContent with images type for multi-image tool results"
```
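The sketch below illustrates how a discriminated union like the one in this task lets a renderer narrow on `type` and read `images` with typed access. It is a minimal, self-contained approximation: the types are trimmed local copies, and `renderResult` and the example URLs are hypothetical names used only for illustration, not part of `common.ts`.

```typescript
// Trimmed local copies of the plan's types, for illustration only.
interface ImageItem {
  url: string;
  width?: number;
  height?: number;
}

type ToolResultContent =
  | { type: 'text'; text: string }
  | { type: 'images'; images: ImageItem[]; text?: string };

// Hypothetical helper: renders a tool result as plain text.
function renderResult(value: ToolResultContent): string {
  if (value.type === 'images') {
    // Narrowing on `type` gives typed access to `images` without casts.
    const lines = value.images.map((img, i) => {
      const size = img.width && img.height ? ` (${img.width}x${img.height})` : '';
      return `[${i + 1}] ${img.url}${size}`;
    });
    const header = value.text || `${lines.length} image(s) generated`;
    return `${header}\n${lines.join('\n')}`;
  }
  return value.text;
}

const rendered = renderResult({
  type: 'images',
  images: [
    { url: 'https://example.com/a.png', width: 1024, height: 1024 },
    { url: 'https://example.com/b.png' },
  ],
});
console.log(rendered);
```

The size suffix is only appended when both dimensions are known, which matches the behavior of the `toolResultToText` branch in Step 4.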
---
### Task 2: ToolGovernanceExtra Extension - imageDefaults / imageConstraints
**Files:**
- Modify: `packages/backend/src/modules/netaclaw/tools/manifest.ts`
- [ ] **Step 1: Extend the ToolGovernanceExtra type**
In the `ToolGovernanceExtra` type at lines 9-12 of `manifest.ts`, add the image tool fields:
```typescript
export type ToolGovernanceExtra = {
allowInSubagent?: boolean;
workerRoutingStrategy?: ToolWorkerRoutingStrategy;
imageDefaults?: {
n?: number;
aspectRatio?: string;
width?: number;
height?: number;
watermark?: boolean;
responseFormat?: 'url' | 'base64';
};
imageConstraints?: {
maxN?: number;
maxWidth?: number;
maxHeight?: number;
};
};
```
- [ ] **Step 2: Verify compilation**
Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no new errors (existing code only reads `allowInSubagent` and `workerRoutingStrategy`, so the new fields do not affect it)
- [ ] **Step 3: Commit**
```bash
git add packages/backend/src/modules/netaclaw/tools/manifest.ts
git commit -m "feat(netaclaw): add imageDefaults and imageConstraints to ToolGovernanceExtra"
```
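To make the intent of `imageDefaults` and `imageConstraints` concrete, here is a small sketch of how the two could be applied to a request: the caller's value wins over the default, and the constraint caps the result. `applyGovernance` is a hypothetical helper, and the fallback limits (9 images, 2048 pixels) are taken from the tool code later in this plan.

```typescript
interface ImageDefaults { n?: number; width?: number; height?: number; }
interface ImageConstraints { maxN?: number; maxWidth?: number; maxHeight?: number; }
interface RequestedSize { n?: number; width?: number; height?: number; }

// Hypothetical helper: request value, then default, then cap by constraint.
function applyGovernance(
  req: RequestedSize,
  defaults: ImageDefaults = {},
  constraints: ImageConstraints = {},
) {
  const clamp = (v: number | undefined, max: number) =>
    v === undefined ? undefined : Math.min(v, max);
  return {
    n: Math.min(req.n ?? defaults.n ?? 1, constraints.maxN ?? 9),
    width: clamp(req.width ?? defaults.width, constraints.maxWidth ?? 2048),
    height: clamp(req.height ?? defaults.height, constraints.maxHeight ?? 2048),
  };
}

const governed = applyGovernance(
  { n: 6, width: 4096 },          // what the model asked for
  { n: 2, height: 768 },          // operator defaults
  { maxN: 4, maxWidth: 2048 },    // operator limits
);
console.log(governed);
```

In this example the requested count and width are capped by the constraints, while the missing height falls back to the default.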
---
### Task 3: Provider Layer - Unified Interface and Factory
**Files:**
- Create: `packages/backend/src/modules/netaclaw/image_providers/types.ts`
- [ ] **Step 1: Create the image_providers directory and types.ts**
```typescript
import type { ToolGovernanceExtra } from '../tools/manifest.js';
export interface ImageProviderCredentials {
baseUrl: string;
apiKey: string;
supplier: string;
modelId: string;
promptHint: string | null;
extra?: ToolGovernanceExtra | null;
}
export interface TextToImageParams {
prompt: string;
width?: number;
height?: number;
aspectRatio?: string;
n?: number;
responseFormat?: 'url' | 'base64';
watermark?: boolean;
seed?: number;
extra?: Record<string, unknown>;
}
export interface ImageToImageParams extends TextToImageParams {
referenceImage: string;
strength?: number;
}
export interface ImageGenerationResult {
images: { url?: string; base64?: string; width?: number; height?: number }[];
model: string;
provider: string;
}
export interface ImageGenerationProvider {
readonly id: string;
textToImage(params: TextToImageParams, creds: ImageProviderCredentials): Promise<ImageGenerationResult>;
imageToImage(params: ImageToImageParams, creds: ImageProviderCredentials): Promise<ImageGenerationResult>;
}
export type ImageGenerationErrorCode =
| 'content_safety'
| 'rate_limit'
| 'insufficient_balance'
| 'invalid_params'
| 'timeout'
| 'network'
| 'unknown';
export class ImageGenerationError extends Error {
constructor(
message: string,
public readonly code: ImageGenerationErrorCode,
public readonly retryable: boolean,
) {
super(message);
this.name = 'ImageGenerationError';
}
}
const providers = new Map<string, ImageGenerationProvider>();
export function registerImageProvider(provider: ImageGenerationProvider): void {
providers.set(provider.id, provider);
}
export function getImageProvider(supplier: string, baseUrl: string): ImageGenerationProvider | null {
const s = supplier.toLowerCase();
if (s === 'minimax') return providers.get('minimax') ?? null;
if (s === 'ark' || s === 'volcengine') return providers.get('ark') ?? null;
if (s === 'openai') {
if (baseUrl.includes('volces.com') || baseUrl.includes('volcengine')) return providers.get('ark') ?? null;
if (baseUrl.includes('minimax')) return providers.get('minimax') ?? null;
}
return null;
}
```
- [ ] **Step 2: Verify compilation**
Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors
- [ ] **Step 3: Commit**
```bash
git add packages/backend/src/modules/netaclaw/image_providers/
git commit -m "feat(netaclaw): add image provider types, error class, and factory"
```
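The lookup rules in `getImageProvider` can be exercised in isolation. The sketch below re-implements them against stub providers and returns only the provider id; `lookupProviderId` and the base URLs are illustrative assumptions, not values from the codebase.

```typescript
interface Provider { readonly id: string; }

// Stub registry standing in for the module-level `providers` map.
const registry = new Map<string, Provider>();
registry.set('ark', { id: 'ark' });
registry.set('minimax', { id: 'minimax' });

// Same matching rules as getImageProvider, reduced to return the provider id.
function lookupProviderId(supplier: string, baseUrl: string): string | null {
  const s = supplier.toLowerCase();
  if (s === 'minimax') return registry.get('minimax')?.id ?? null;
  if (s === 'ark' || s === 'volcengine') return registry.get('ark')?.id ?? null;
  if (s === 'openai') {
    // OpenAI-compatible channels are disambiguated by their base URL.
    if (baseUrl.includes('volces.com') || baseUrl.includes('volcengine')) return registry.get('ark')?.id ?? null;
    if (baseUrl.includes('minimax')) return registry.get('minimax')?.id ?? null;
  }
  return null;
}

const viaSupplier = lookupProviderId('MiniMax', '');
const viaBaseUrl = lookupProviderId('openai', 'https://ark.example.volces.com/api/v3');
const unmatched = lookupProviderId('openai', 'https://api.openai.com/v1');
console.log(viaSupplier, viaBaseUrl, unmatched);
```

An `openai` supplier whose base URL matches neither vendor resolves to `null`, which the resolver in Task 10 reports as `image_provider_not_found`.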
---
### Task 4: Provider Layer - Volcano Engine (Ark)
**Files:**
- Create: `packages/backend/src/modules/netaclaw/image_providers/ark.ts`
- [ ] **Step 1: Implement ArkImageProvider**
```typescript
import OpenAI from 'openai';
import {
type ImageGenerationProvider,
type ImageProviderCredentials,
type TextToImageParams,
type ImageToImageParams,
type ImageGenerationResult,
ImageGenerationError,
registerImageProvider,
} from './types.js';
function resolveSize(params: TextToImageParams): string | undefined {
if (params.aspectRatio) {
const map: Record<string, string> = {
'1:1': '1024x1024',
'16:9': '1280x720',
'4:3': '1152x864',
'3:2': '1248x832',
'2:3': '832x1248',
'3:4': '864x1152',
'9:16': '720x1280',
};
return map[params.aspectRatio] ?? '1024x1024';
}
if (params.width && params.height) {
return `${params.width}x${params.height}`;
}
return undefined;
}
function normalizeResult(response: OpenAI.Images.ImagesResponse, creds: ImageProviderCredentials): ImageGenerationResult {
return {
images: (response.data ?? []).map(item => ({
url: item.url,
base64: item.b64_json,
})),
model: creds.modelId,
provider: 'ark',
};
}
class ArkImageProvider implements ImageGenerationProvider {
readonly id = 'ark';
async textToImage(params: TextToImageParams, creds: ImageProviderCredentials): Promise<ImageGenerationResult> {
const client = new OpenAI({ apiKey: creds.apiKey, baseURL: creds.baseUrl, timeout: 60_000 });
try {
// The Node SDK has no `extra_body` option (that is a Python SDK feature);
// undocumented fields are sent as top-level body properties instead.
const response = await client.images.generate({
model: creds.modelId,
prompt: params.prompt,
size: resolveSize(params) as any,
n: params.n ?? 1,
// The OpenAI-compatible API names the base64 format `b64_json`.
response_format: params.responseFormat === 'base64' ? 'b64_json' : 'url',
...(params.watermark !== undefined ? { watermark: params.watermark } : {}),
...params.extra,
} as any);
return normalizeResult(response, creds);
} catch (err: any) {
throw this.wrapError(err);
}
}
async imageToImage(params: ImageToImageParams, creds: ImageProviderCredentials): Promise<ImageGenerationResult> {
const client = new OpenAI({ apiKey: creds.apiKey, baseURL: creds.baseUrl, timeout: 60_000 });
try {
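// The Node SDK has no `extra_body` option (that is a Python SDK feature);
// undocumented fields such as `image` are sent as top-level body properties.
const response = await client.images.generate({
model: creds.modelId,
prompt: params.prompt,
size: resolveSize(params) as any,
n: params.n ?? 1,
// The OpenAI-compatible API names the base64 format `b64_json`.
response_format: params.responseFormat === 'base64' ? 'b64_json' : 'url',
image: params.referenceImage,
...(params.strength !== undefined ? { strength: params.strength } : {}),
...(params.watermark !== undefined ? { watermark: params.watermark } : {}),
...params.extra,
} as any);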
return normalizeResult(response, creds);
} catch (err: any) {
throw this.wrapError(err);
}
}
private wrapError(err: any): ImageGenerationError {
const status = err?.status ?? err?.response?.status;
const msg = err?.message ?? String(err);
if (status === 429) return new ImageGenerationError(msg, 'rate_limit', true);
if (status === 402) return new ImageGenerationError(msg, 'insufficient_balance', false);
if (status === 400) {
if (msg.includes('safety') || msg.includes('sensitive') || msg.includes('安全'))
return new ImageGenerationError('提示词触发内容安全策略,请调整描述', 'content_safety', false);
return new ImageGenerationError(msg, 'invalid_params', false);
}
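// The SDK reports request timeouts as APIConnectionTimeoutError, so check the class as well as the socket codes.
if (err instanceof OpenAI.APIConnectionTimeoutError || err?.code === 'ETIMEDOUT' || err?.code === 'ECONNABORTED')
return new ImageGenerationError('生成超时,可尝试降低图片尺寸', 'timeout', true);
if (err instanceof OpenAI.APIConnectionError)
return new ImageGenerationError(msg, 'network', true);
return new ImageGenerationError(msg, 'unknown', false);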
}
}
registerImageProvider(new ArkImageProvider());
```
- [ ] **Step 2: Verify compilation**
Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors
- [ ] **Step 3: Commit**
```bash
git add packages/backend/src/modules/netaclaw/image_providers/ark.ts
git commit -m "feat(netaclaw): add Ark (Volcano Engine) image provider via OpenAI SDK"
```
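The status-to-error mapping used by `wrapError` is easier to test as a pure function. The following is a sketch under that assumption: `classify` is a hypothetical helper that mirrors the status codes and keywords from the provider code above, without depending on the OpenAI SDK.

```typescript
type ErrorCode =
  | 'content_safety'
  | 'rate_limit'
  | 'insufficient_balance'
  | 'invalid_params'
  | 'unknown';

interface Classified { code: ErrorCode; retryable: boolean; }

// Hypothetical pure function: maps an HTTP status and message to an error code.
function classify(status: number | undefined, message: string): Classified {
  if (status === 429) return { code: 'rate_limit', retryable: true };
  if (status === 402) return { code: 'insufficient_balance', retryable: false };
  if (status === 400) {
    // A 400 is treated as a content-safety rejection only if the message says so.
    const flagged = ['safety', 'sensitive', '安全'].some(k => message.includes(k));
    return { code: flagged ? 'content_safety' : 'invalid_params', retryable: false };
  }
  return { code: 'unknown', retryable: false };
}

const throttled = classify(429, 'Too Many Requests');
const blocked = classify(400, 'prompt rejected by safety filter');
const malformed = classify(400, 'size is not supported');
console.log(throttled, blocked, malformed);
```

Only the rate-limit case is marked retryable here; the tools in Tasks 8 and 9 surface that flag to the model as a `[可重试]` prefix.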
---
### Task 5: Provider Layer - MiniMax
**Files:**
- Create: `packages/backend/src/modules/netaclaw/image_providers/minimax.ts`
- [ ] **Step 1: Implement MiniMaxImageProvider**
```typescript
import {
type ImageGenerationProvider,
type ImageProviderCredentials,
type TextToImageParams,
type ImageToImageParams,
type ImageGenerationResult,
ImageGenerationError,
registerImageProvider,
} from './types.js';
interface MiniMaxResponse {
id?: string;
data?: { image_urls?: string[]; image_base64?: string[] };
metadata?: { success_count?: number; failed_count?: number };
base_resp?: { status_code?: number; status_msg?: string };
}
function buildBaseBody(params: TextToImageParams, creds: ImageProviderCredentials): Record<string, unknown> {
const body: Record<string, unknown> = {
model: creds.modelId,
prompt: params.prompt,
n: params.n ?? 1,
response_format: params.responseFormat ?? 'url',
};
if (params.watermark !== undefined) body.aigc_watermark = params.watermark;
if (params.aspectRatio) {
body.aspect_ratio = params.aspectRatio;
} else if (params.width && params.height) {
body.width = params.width;
body.height = params.height;
}
if (params.seed !== undefined) body.seed = params.seed;
if (params.extra?.style) body.style = params.extra.style;
if (params.extra?.prompt_optimizer !== undefined) body.prompt_optimizer = params.extra.prompt_optimizer;
return body;
}
function normalizeResponse(json: MiniMaxResponse, creds: ImageProviderCredentials, format: string): ImageGenerationResult {
const resp = json.base_resp;
if (resp && resp.status_code !== 0) {
const code = resp.status_code;
if (code === 1026) throw new ImageGenerationError('提示词触发内容安全策略,请调整描述', 'content_safety', false);
if (code === 1002) throw new ImageGenerationError('当前请求过多,请稍后重试', 'rate_limit', true);
if (code === 1008) throw new ImageGenerationError('模型渠道余额不足', 'insufficient_balance', false);
if (code === 1004) throw new ImageGenerationError('API Key 鉴权失败', 'invalid_params', false);
throw new ImageGenerationError(resp.status_msg ?? `MiniMax error ${code}`, 'unknown', false);
}
const urls = json.data?.image_urls ?? [];
const b64s = json.data?.image_base64 ?? [];
const images = format === 'base64'
? b64s.map(b => ({ base64: b }))
: urls.map(u => ({ url: u }));
return { images, model: creds.modelId, provider: 'minimax' };
}
class MiniMaxImageProvider implements ImageGenerationProvider {
readonly id = 'minimax';
async textToImage(params: TextToImageParams, creds: ImageProviderCredentials): Promise<ImageGenerationResult> {
const body = buildBaseBody(params, creds);
return this.request(body, creds, params.responseFormat ?? 'url');
}
async imageToImage(params: ImageToImageParams, creds: ImageProviderCredentials): Promise<ImageGenerationResult> {
const body = buildBaseBody(params, creds);
body.subject_reference = [{ image: params.referenceImage }];
if (params.strength !== undefined) body.strength = params.strength;
return this.request(body, creds, params.responseFormat ?? 'url');
}
private async request(body: Record<string, unknown>, creds: ImageProviderCredentials, format: string): Promise<ImageGenerationResult> {
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), 60_000);
try {
const baseUrl = creds.baseUrl.replace(/\/+$/, '');
const res = await fetch(`${baseUrl}/v1/image_generation`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': `Bearer ${creds.apiKey}`,
},
body: JSON.stringify(body),
signal: controller.signal,
});
if (!res.ok) {
const text = await res.text().catch(() => '');
throw new ImageGenerationError(`MiniMax HTTP ${res.status}: ${text}`, res.status === 429 ? 'rate_limit' : 'unknown', res.status === 429);
}
const json: MiniMaxResponse = await res.json();
return normalizeResponse(json, creds, format);
} catch (err: any) {
if (err instanceof ImageGenerationError) throw err;
if (err?.name === 'AbortError') throw new ImageGenerationError('生成超时,可尝试降低图片尺寸', 'timeout', true);
throw new ImageGenerationError(err?.message ?? String(err), 'network', true);
} finally {
clearTimeout(timer);
}
}
}
registerImageProvider(new MiniMaxImageProvider());
```
- [ ] **Step 2: Verify compilation**
Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors
- [ ] **Step 3: Commit**
```bash
git add packages/backend/src/modules/netaclaw/image_providers/minimax.ts
git commit -m "feat(netaclaw): add MiniMax image provider via REST API"
```
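One detail of `buildBaseBody` worth spelling out is the sizing precedence: `aspectRatio` wins over explicit dimensions, and dimensions are only sent when both are present. The sketch below isolates that rule; `sizeFields` is a hypothetical helper extracted for illustration.

```typescript
interface SizeParams { aspectRatio?: string; width?: number; height?: number; }

// Hypothetical helper isolating the sizing rule from buildBaseBody.
function sizeFields(params: SizeParams): Record<string, unknown> {
  // An aspect ratio takes priority over explicit pixel dimensions.
  if (params.aspectRatio) return { aspect_ratio: params.aspectRatio };
  // Dimensions are only sent as a pair; a lone width or height is dropped.
  if (params.width && params.height) return { width: params.width, height: params.height };
  return {};
}

const ratioWins = sizeFields({ aspectRatio: '16:9', width: 512, height: 512 });
const explicit = sizeFields({ width: 1024, height: 768 });
const partial = sizeFields({ width: 1024 });
console.log(ratioWins, explicit, partial);
```

When neither form is usable the body carries no size fields at all, leaving the choice to the provider's own default.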
---
### Task 6: Image Persistence Service
**Files:**
- Create: `packages/backend/src/modules/netaclaw/service/image_storage.ts`
- [ ] **Step 1: Implement ImageStorageService**
Reuse the `downAndUpload` method of the existing `pluginService.getInstance('upload')`:
```typescript
import { Inject, Provide, Scope, ScopeEnum } from '@midwayjs/core';
import { PluginService } from '../../plugin/service/info.js';
import { randomUUID } from 'crypto';
@Provide()
@Scope(ScopeEnum.Singleton)
export class ImageStorageService {
@Inject()
pluginService: PluginService;
async persist(tempUrl: string): Promise<string> {
const upload = await this.pluginService.getInstance('upload');
const ext = this.detectExtension(tempUrl);
const filename = `img-${Date.now()}-${randomUUID().slice(0, 8)}${ext}`;
return upload.downAndUpload(tempUrl, filename);
}
async persistAll(urls: string[]): Promise<string[]> {
return Promise.all(urls.map(url => this.persist(url)));
}
private detectExtension(url: string): string {
const pathname = url.split('?')[0];
const match = pathname.match(/\.(png|jpg|jpeg|webp|gif)$/i);
return match ? `.${match[1].toLowerCase()}` : '.png';
}
}
```
- [ ] **Step 2: Verify compilation**
Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors
- [ ] **Step 3: Commit**
```bash
git add packages/backend/src/modules/netaclaw/service/image_storage.ts
git commit -m "feat(netaclaw): add ImageStorageService for persisting generated images"
```
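Provider URLs are often signed and carry long query strings, so the extension has to be read from the path alone. The sketch below is a standalone copy of the `detectExtension` logic with slightly more defensive indexing; the example URLs are made up for illustration.

```typescript
// Standalone copy of the extension detection used by ImageStorageService.
function detectExtension(url: string): string {
  // Drop the query string so signed-URL parameters do not hide the extension.
  const pathname = url.split('?')[0] ?? url;
  const ext = pathname.match(/\.(png|jpg|jpeg|webp|gif)$/i)?.[1];
  // Fall back to .png when the path has no recognizable image extension.
  return ext ? `.${ext.toLowerCase()}` : '.png';
}

const signed = detectExtension('https://cdn.example.com/out/abc.JPEG?X-Expires=600&sig=xyz');
const bare = detectExtension('https://cdn.example.com/out/abc');
console.log(signed, bare);
```

Note that a URL fragment (`#...`) is not stripped by this logic; if providers ever return fragments, the path would need additional handling.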
---
### Task 7: Shared Image Tool Helpers
**Files:**
- Create: `packages/backend/src/modules/netaclaw/tools/builtin/image_common.ts`
- [ ] **Step 1: Create image_common.ts**
Extract the helper functions shared by text_to_image and image_to_image:
```typescript
import { imageResult, imagesResult } from '../common.js';
import type { ImageGenerationResult } from '../../image_providers/types.js';
import type { ImageStorageService } from '../../service/image_storage.js';
export function clampDimension(value: number | undefined, max: number): number | undefined {
if (value === undefined) return undefined;
return Math.min(value, max);
}
export async function persistImages(result: ImageGenerationResult, storage: ImageStorageService): Promise<ImageGenerationResult> {
const persisted = await Promise.all(
result.images.map(async img => {
if (!img.url) return img;
const permanentUrl = await storage.persist(img.url);
return { ...img, url: permanentUrl };
})
);
return { ...result, images: persisted };
}
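export function formatImageToolResult(result: ImageGenerationResult) {
// Base64-only results carry no URL, so keep only the images that can be rendered by URL.
const images = result.images.filter(img => typeof img.url === 'string');
if (images.length === 0) {
return { type: 'text' as const, text: '图片生成失败: Provider 未返回可用的图片 URL' };
}
if (images.length === 1) {
const img = images[0];
return imageResult(img.url!, undefined, {
width: img.width,
height: img.height,
text: `图片已生成 (${result.provider}/${result.model})`,
});
}
return imagesResult(
images.map(img => ({ url: img.url!, width: img.width, height: img.height })),
`已生成 ${images.length} 张图片 (${result.provider}/${result.model})`,
);
}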
```
- [ ] **Step 2: Verify compilation**
Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors
- [ ] **Step 3: Commit**
```bash
git add packages/backend/src/modules/netaclaw/tools/builtin/image_common.ts
git commit -m "feat(netaclaw): extract shared image tool helpers"
```
---
### Task 8: text_to_image Tool
**Files:**
- Create: `packages/backend/src/modules/netaclaw/tools/builtin/text_to_image.ts`
- Modify: `packages/backend/src/modules/netaclaw/tools/catalog.ts`
- [ ] **Step 1: Create text_to_image.ts**
```typescript
import { Type, type Static } from '@sinclair/typebox';
import { type AnyAgentTool } from '../common.js';
import { registerSchema } from '../catalog.js';
import type {
ImageGenerationProvider,
ImageProviderCredentials,
} from '../../image_providers/types.js';
import { ImageGenerationError } from '../../image_providers/types.js';
import type { ImageStorageService } from '../../service/image_storage.js';
import { clampDimension, persistImages, formatImageToolResult } from './image_common.js';
const Params = Type.Object({
prompt: Type.String({ description: '图片描述,尽量详细具体' }),
aspectRatio: Type.Optional(Type.String({
description: '宽高比。可选: 1:1, 16:9, 4:3, 3:2, 2:3, 3:4, 9:16',
})),
width: Type.Optional(Type.Integer({ description: '精确宽度(像素),优先级低于 aspectRatio' })),
height: Type.Optional(Type.Integer({ description: '精确高度(像素)' })),
n: Type.Optional(Type.Integer({ description: '生成数量,默认 1,最大 9', minimum: 1, maximum: 9 })),
watermark: Type.Optional(Type.Boolean({ description: '是否添加水印' })),
seed: Type.Optional(Type.Integer({ description: '随机种子,相同 seed 可复现相近结果' })),
extra: Type.Optional(Type.Record(Type.String(), Type.Unknown(), {
description: 'Provider 特有参数,如 MiniMax 的 style、prompt_optimizer',
})),
});
export function createTextToImageTool(
creds: ImageProviderCredentials,
provider: ImageGenerationProvider,
storage: ImageStorageService,
): AnyAgentTool {
const defaults = creds.extra?.imageDefaults ?? {};
const constraints = creds.extra?.imageConstraints ?? {};
return {
name: 'text_to_image',
label: '文生图',
description: creds.promptHint
? `根据文字描述生成图片。\n${creds.promptHint}`
: '根据文字描述生成图片,支持指定尺寸、数量、风格等参数。',
parameters: Params,
async execute(_id: string, params: Static<typeof Params>) {
const merged = {
prompt: params.prompt,
n: Math.min(params.n ?? defaults.n ?? 1, constraints.maxN ?? 9),
aspectRatio: params.aspectRatio ?? defaults.aspectRatio,
width: clampDimension(params.width ?? defaults.width, constraints.maxWidth ?? 2048),
height: clampDimension(params.height ?? defaults.height, constraints.maxHeight ?? 2048),
watermark: params.watermark ?? defaults.watermark ?? false,
seed: params.seed,
responseFormat: (defaults.responseFormat ?? 'url') as 'url' | 'base64',
extra: params.extra,
};
try {
let result = await provider.textToImage(merged, creds);
result = await persistImages(result, storage);
return formatImageToolResult(result);
} catch (err) {
if (err instanceof ImageGenerationError) {
const prefix = err.retryable ? '[可重试] ' : '';
return { type: 'text' as const, text: `${prefix}图片生成失败: ${err.message}` };
}
return { type: 'text' as const, text: `图片生成失败: ${err instanceof Error ? err.message : String(err)}` };
}
},
};
}
registerSchema({
name: 'text_to_image',
toolset: 'vision',
description: '根据文字描述生成图片,支持指定尺寸、数量、风格等参数。',
capability: 'multimodal',
visibility: 'tool',
isCore: false,
canDisable: true,
supportsPromptHint: true,
requiresModel: true,
});
```
- [ ] **Step 2: Register in catalog.ts**
In the import list at the end of `catalog.ts` (around line 66, after `import './builtin/execute_skill.js';`), add:
```typescript
import './builtin/text_to_image.js';
```
- [ ] **Step 3: Verify compilation**
Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors
- [ ] **Step 4: Commit**
```bash
git add packages/backend/src/modules/netaclaw/tools/builtin/text_to_image.ts packages/backend/src/modules/netaclaw/tools/catalog.ts
git commit -m "feat(netaclaw): add text_to_image tool with defaults/constraints merge and image persistence"
```
---
### Task 9: image_to_image Tool
**Files:**
- Create: `packages/backend/src/modules/netaclaw/tools/builtin/image_to_image.ts`
- Modify: `packages/backend/src/modules/netaclaw/tools/catalog.ts`
- [ ] **Step 1: Create image_to_image.ts**
```typescript
import { Type, type Static } from '@sinclair/typebox';
import { type AnyAgentTool } from '../common.js';
import { registerSchema } from '../catalog.js';
import type {
ImageGenerationProvider,
ImageProviderCredentials,
} from '../../image_providers/types.js';
import { ImageGenerationError } from '../../image_providers/types.js';
import type { ImageStorageService } from '../../service/image_storage.js';
import { clampDimension, persistImages, formatImageToolResult } from './image_common.js';
const Params = Type.Object({
prompt: Type.String({ description: '对参考图的修改描述' }),
referenceImage: Type.String({ description: '参考图片 URL,从用户上传附件获取' }),
strength: Type.Optional(Type.Number({
description: '参考图影响强度 0-1,越大越接近原图', minimum: 0, maximum: 1,
})),
aspectRatio: Type.Optional(Type.String({
description: '宽高比。可选: 1:1, 16:9, 4:3, 3:2, 2:3, 3:4, 9:16',
})),
width: Type.Optional(Type.Integer({ description: '精确宽度(像素),优先级低于 aspectRatio' })),
height: Type.Optional(Type.Integer({ description: '精确高度(像素)' })),
n: Type.Optional(Type.Integer({ description: '生成数量,默认 1,最大 9', minimum: 1, maximum: 9 })),
watermark: Type.Optional(Type.Boolean({ description: '是否添加水印' })),
seed: Type.Optional(Type.Integer({ description: '随机种子' })),
extra: Type.Optional(Type.Record(Type.String(), Type.Unknown(), {
description: 'Provider 特有参数',
})),
});
export function createImageToImageTool(
creds: ImageProviderCredentials,
provider: ImageGenerationProvider,
storage: ImageStorageService,
): AnyAgentTool {
const defaults = creds.extra?.imageDefaults ?? {};
const constraints = creds.extra?.imageConstraints ?? {};
return {
name: 'image_to_image',
label: '图生图',
description: creds.promptHint
? `基于参考图片生成新图片。\n${creds.promptHint}`
: '基于参考图片生成新图片支持风格迁移、内容编辑等。传入参考图URL和修改描述。',
parameters: Params,
async execute(_id: string, params: Static<typeof Params>) {
const merged = {
prompt: params.prompt,
referenceImage: params.referenceImage,
strength: params.strength,
n: Math.min(params.n ?? defaults.n ?? 1, constraints.maxN ?? 9),
aspectRatio: params.aspectRatio ?? defaults.aspectRatio,
width: clampDimension(params.width ?? defaults.width, constraints.maxWidth ?? 2048),
height: clampDimension(params.height ?? defaults.height, constraints.maxHeight ?? 2048),
watermark: params.watermark ?? defaults.watermark ?? false,
seed: params.seed,
responseFormat: (defaults.responseFormat ?? 'url') as 'url' | 'base64',
extra: params.extra,
};
try {
let result = await provider.imageToImage(merged, creds);
result = await persistImages(result, storage);
return formatImageToolResult(result);
} catch (err) {
if (err instanceof ImageGenerationError) {
const prefix = err.retryable ? '[可重试] ' : '';
return { type: 'text' as const, text: `${prefix}图片生成失败: ${err.message}` };
}
return { type: 'text' as const, text: `图片生成失败: ${err instanceof Error ? err.message : String(err)}` };
}
},
};
}
registerSchema({
name: 'image_to_image',
toolset: 'vision',
description: '基于参考图片生成新图片支持风格迁移、内容编辑等。传入参考图URL和修改描述。',
capability: 'multimodal',
visibility: 'tool',
isCore: false,
canDisable: true,
supportsPromptHint: true,
requiresModel: true,
});
```
- [ ] **Step 2: Register in catalog.ts**
At the end of `catalog.ts`, after `import './builtin/text_to_image.js';`, add:
```typescript
import './builtin/image_to_image.js';
```
- [ ] **Step 3: Verify compilation**
Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors
- [ ] **Step 4: Commit**
```bash
git add packages/backend/src/modules/netaclaw/tools/builtin/image_to_image.ts packages/backend/src/modules/netaclaw/tools/catalog.ts
git commit -m "feat(netaclaw): add image_to_image tool with reference image support"
```
---
### Task 10: tool_resolver Integration
**Files:**
- Modify: `packages/backend/src/modules/netaclaw/service/tool_resolver.ts`
- Modify: `packages/backend/src/modules/netaclaw/service/tool_registry.ts`
- [ ] **Step 1: Add the getToolByName method to tool_registry.ts**
In the `NetaClawToolRegistryService` class (near the `getToolModelConfig` method), add:
```typescript
async getToolByName(name: string): Promise<NetaClawToolEntity | null> {
return this.toolRepo.findOneBy({ name });
}
```
- [ ] **Step 2: Add imports to tool_resolver.ts**
In the import section at the top of `tool_resolver.ts` (around line 32, near `import { createImageRecognizeTool }`), add:
```typescript
import { createTextToImageTool } from '../tools/builtin/text_to_image.js';
import { createImageToImageTool } from '../tools/builtin/image_to_image.js';
import { getImageProvider, type ImageProviderCredentials } from '../image_providers/types.js';
import '../image_providers/ark.js';
import '../image_providers/minimax.js';
import { ImageStorageService } from './image_storage.js';
```
- [ ] **Step 3: Inject ImageStorageService**
In the `NetaClawToolResolverService` class, add to the existing `@Inject()` section:
```typescript
@Inject()
imageStorageService: ImageStorageService;
```
- [ ] **Step 4: Inject the two tools in the resolve method**
In the `resolve` method of `tool_resolver.ts`, find the injection block for `image_recognize` (around lines 647-664) and add after its closing `}`:
```typescript
for (const imgToolName of ['text_to_image', 'image_to_image'] as const) {
if (filteredNames.includes(imgToolName)) {
const toolModelConfig = await this.toolRegistry.getToolModelConfig(imgToolName);
if (toolModelConfig) {
const channelCreds = await this.modelChannelService.resolveForAgent(toolModelConfig.modelChannelId, toolModelConfig.modelId);
if (channelCreds) {
const provider = getImageProvider(channelCreds.channelSupplier, channelCreds.baseUrl ?? '');
if (provider) {
const toolEntity = await this.toolRegistry.getToolByName(imgToolName);
const extra = toolEntity?.extra as import('../tools/manifest.js').ToolGovernanceExtra | null;
const creds: ImageProviderCredentials = {
baseUrl: channelCreds.baseUrl ?? '',
apiKey: channelCreds.apiKey,
supplier: channelCreds.channelSupplier,
modelId: toolModelConfig.modelId,
promptHint: toolModelConfig.promptHint,
extra,
};
if (imgToolName === 'text_to_image') {
runtimeTools.push(createTextToImageTool(creds, provider, this.imageStorageService));
} else {
runtimeTools.push(createImageToImageTool(creds, provider, this.imageStorageService));
}
} else {
disabledReasons.push({ name: imgToolName, reason: 'image_provider_not_found' });
}
} else {
disabledReasons.push({ name: imgToolName, reason: 'model_channel_unavailable' });
}
} else {
disabledReasons.push({ name: imgToolName, reason: 'model_not_configured' });
}
}
}
```
- [ ] **Step 5: Verify compilation**
Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors
- [ ] **Step 6: Commit**
```bash
git add packages/backend/src/modules/netaclaw/service/tool_resolver.ts packages/backend/src/modules/netaclaw/service/tool_registry.ts
git commit -m "feat(netaclaw): integrate text_to_image and image_to_image into tool_resolver"
```
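The nested checks in the resolver block follow a fixed order: model configuration, then channel credentials, then provider lookup, each with its own disabled reason. The sketch below shows that same decision chain flattened into guard clauses. It is only an illustration: `resolveImageTool`, `Lookups`, and `Resolution` are hypothetical names, and the lookups are passed in as plain values rather than fetched from services.

```typescript
type DisabledReason =
  | 'model_not_configured'
  | 'model_channel_unavailable'
  | 'image_provider_not_found';

// Hypothetical snapshot of the three lookups the resolver performs in order.
interface Lookups {
  modelConfig: { modelId: string } | null;
  channel: { supplier: string } | null;
  providerId: string | null;
}

type Resolution =
  | { ok: true; providerId: string; modelId: string }
  | { ok: false; reason: DisabledReason };

// Guard clauses report the first missing prerequisite, in resolver order.
function resolveImageTool(l: Lookups): Resolution {
  if (!l.modelConfig) return { ok: false, reason: 'model_not_configured' };
  if (!l.channel) return { ok: false, reason: 'model_channel_unavailable' };
  if (!l.providerId) return { ok: false, reason: 'image_provider_not_found' };
  return { ok: true, providerId: l.providerId, modelId: l.modelConfig.modelId };
}

const noModel = resolveImageTool({ modelConfig: null, channel: null, providerId: null });
const noProvider = resolveImageTool({
  modelConfig: { modelId: 'example-model' },
  channel: { supplier: 'openai' },
  providerId: null,
});
const ready = resolveImageTool({
  modelConfig: { modelId: 'example-model' },
  channel: { supplier: 'ark' },
  providerId: 'ark',
});
console.log(noModel, noProvider, ready);
```

The reasons are the same three strings pushed to `disabledReasons` in Step 4, so the two forms are interchangeable in terms of observable behavior.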
---
### Task 11: Prompt Builder Attachment Hint Extension
**Files:**
- Modify: `packages/backend/src/modules/netaclaw/runtime/prompt_builder.ts`
- Modify: `packages/backend/src/modules/netaclaw/runtime/agent.ts`
- [ ] **Step 1: Change the buildLLMMessages signature**
At line 141 of `prompt_builder.ts`, add a `toolNames` parameter to `buildLLMMessages`:
```typescript
export function buildLLMMessages(
systemPrompt: string,
history: LLMMessage[],
userMessage: UserMessageInput,
toolNames?: string[],
): LLMMessage[] {
```
- [ ] **Step 2: Change the attachment hint generation logic**
Replace the attachment handling block at lines 152-162:
```typescript
if (userMessage.metadata?.attachments && (userMessage.metadata.attachments as unknown[]).length) {
const attachments = userMessage.metadata.attachments as ChatAttachment[];
const desc = attachments.map(a => {
const typeLabel = ({ image: '图片', video: '视频', pdf: 'PDF', document: '文件', other: '文件' } as Record<string, string>)[a.type] ?? '文件';
return `- ${typeLabel}: ${a.name} (URL: ${a.url})`;
}).join('\n');
const hints: string[] = [];
const names = new Set(toolNames ?? []);
if (names.has('image_recognize')) {
hints.push('如需分析图片内容,请使用 image_recognize 工具,传入图片 URL');
}
if (names.has('image_to_image')) {
hints.push('如需基于图片生成新图片,请使用 image_to_image 工具,将图片 URL 作为 referenceImage 参数');
}
if (hints.length === 0) {
hints.push('附件已上传,可在需要时引用其 URL');
}
messages.push({
role: 'user',
content: `[系统提示] 用户上传了以下附件:\n${desc}\n${hints.join('。')}。`,
});
}
```
- [ ] **Step 3: Update the call site in agent.ts**
At lines 96-100 of `agent.ts`, pass `toolNames` to `buildLLMMessages`:
```typescript
const messages: LLMMessage[] = buildLLMMessages(
agentConfig.systemPrompt,
history,
{ content: userMessage, metadata: params.userMessageMetadata },
params.toolNames || tools.map(tool => tool.name),
);
```
- [ ] **Step 4: Verify compilation**
Run: `cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20`
Expected: no errors
- [ ] **Step 5: Commit**
```bash
git add packages/backend/src/modules/netaclaw/runtime/prompt_builder.ts packages/backend/src/modules/netaclaw/runtime/agent.ts
git commit -m "feat(netaclaw): dynamic attachment hints based on available tools"
```
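The attachment hints now depend on which tools are available in the current run. A reduced sketch of that selection logic follows; `buildAttachmentHints` is a hypothetical helper and the English hint strings are placeholders standing in for the Chinese prompts used in `prompt_builder.ts`.

```typescript
// Hypothetical helper: picks attachment hints based on the available tool names.
function buildAttachmentHints(toolNames: string[] = []): string[] {
  const names = new Set(toolNames);
  const hints: string[] = [];
  if (names.has('image_recognize')) {
    hints.push('use image_recognize to analyze the image');
  }
  if (names.has('image_to_image')) {
    hints.push('use image_to_image with the URL as referenceImage');
  }
  // With no image tool available, fall back to a generic hint.
  if (hints.length === 0) {
    hints.push('attachments can be referenced by URL');
  }
  return hints;
}

const both = buildAttachmentHints(['image_recognize', 'image_to_image', 'web_search']);
const none = buildAttachmentHints(['web_search']);
console.log(both.join('; '));
console.log(none.join('; '));
```

Because the hints are derived from the tool list, disabling `image_to_image` in governance also removes the prompt text that would otherwise steer the model toward a tool it cannot call.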
---
### Task 12: Frontend - renderer-registry and message-item Image Rendering
**Files:**
- Modify: `packages/frontend/src/modules/agent/tools/renderer-registry.ts`
- Modify: `packages/frontend/src/modules/agent/components/message-item.vue`
- [ ] **Step 1: Extend the rawResult type in renderer-registry**
In `renderer-registry.ts`, at line 9, in the `rawResult` field of the `ToolRenderSource` interface, extend the `type` union and add an `images` field:
```typescript
rawResult?: {
  type: 'text' | 'json' | 'image' | 'images';
  text?: string;
  data?: unknown;
  url?: string;
  mimeType?: string;
  width?: number;
  height?: number;
  bytes?: number;
  originalWidth?: number;
  originalHeight?: number;
  originalBytes?: number;
  resized?: boolean;
  images?: {
    url: string;
    mimeType?: string;
    width?: number;
    height?: number;
    seed?: number;
  }[];
};
```
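Because `url` and `images` are both optional on the same object, consumers have to branch on `type` before reading either. The sketch below shows one way to do that in plain TypeScript; `RawResultLike`, `ToolImageItem`, and `collectImageUrls` are hypothetical names covering only the fields relevant to image rendering, not part of the plan.

```typescript
// Hypothetical, trimmed-down mirror of the rawResult shape above.
interface ToolImageItem {
  url: string;
  mimeType?: string;
  width?: number;
  height?: number;
  seed?: number;
}

interface RawResultLike {
  type: 'text' | 'json' | 'image' | 'images';
  text?: string;
  url?: string;
  images?: ToolImageItem[];
}

// Returns the URLs a renderer would display, following the same conditions
// as the template: a single `url` for 'image', the `images` list for 'images'.
function collectImageUrls(raw?: RawResultLike): string[] {
  if (!raw) return [];
  if (raw.type === 'image') return raw.url ? [raw.url] : [];
  if (raw.type === 'images') {
    return (raw.images ?? []).map(img => img.url).filter(url => url.length > 0);
  }
  return [];
}
```

A result of type `text` or `json` yields an empty list, so callers can use the length of the returned array to decide whether any image block should be rendered.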
- [ ] **Step 2: Update the message-item.vue template: upgrade single-image rendering and add multi-image rendering**
Locate the existing single-image block at lines 75-83 of `message-item.vue`:
```vue
<div
  v-if="tool.rawResult?.type === 'image' && tool.rawResult.url"
  class="tool-execution__image"
>
  <img :src="tool.rawResult.url" @click="openToolImage(tool.rawResult.url)" />
  <div class="tool-execution__image-caption">
    {{ formatToolImageCaption(tool.rawResult) }}
  </div>
</div>
```
Replace it with:
```vue
<div
  v-if="tool.rawResult?.type === 'image' && tool.rawResult.url"
  class="tool-execution__image"
>
  <el-image
    :src="tool.rawResult.url"
    fit="contain"
    :preview-src-list="[tool.rawResult.url]"
    preview-teleported
    class="tool-execution__image-single"
  />
  <div class="tool-execution__image-caption">
    {{ formatToolImageCaption(tool.rawResult) }}
  </div>
</div>
<div
  v-else-if="tool.rawResult?.type === 'images' && tool.rawResult.images?.length"
  class="tool-execution__image"
>
  <div v-if="tool.rawResult.text" class="tool-execution__image-caption">
    {{ tool.rawResult.text }}
  </div>
  <div class="tool-execution__image-grid">
    <el-image
      v-for="(img, idx) in tool.rawResult.images"
      :key="idx"
      :src="img.url"
      fit="cover"
      :preview-src-list="tool.rawResult.images.map(i => i.url)"
      :initial-index="idx"
      preview-teleported
      class="tool-execution__image-grid-item"
    />
  </div>
</div>
```
- [ ] **Step 3: Add the multi-image grid styles**
In the `<style>` block of `message-item.vue`, next to the existing `.tool-execution__image` styles, add:
```css
.tool-execution__image-single {
  max-width: 360px;
  border-radius: 8px;
}
.tool-execution__image-grid {
  display: grid;
  grid-template-columns: repeat(auto-fill, minmax(140px, 1fr));
  gap: 8px;
  margin-top: 8px;
}
.tool-execution__image-grid-item {
  width: 100%;
  aspect-ratio: 1;
  border-radius: 8px;
  cursor: pointer;
  object-fit: cover;
}
```
- [ ] **Step 4: Verify the frontend compiles**
Run: `cd packages/frontend && npx vue-tsc --noEmit 2>&1 | head -20`
Expected: no new errors
- [ ] **Step 5: Commit**
```bash
git add packages/frontend/src/modules/agent/tools/renderer-registry.ts packages/frontend/src/modules/agent/components/message-item.vue
git commit -m "feat(frontend): upgrade image rendering and add multi-image grid in message-item"
```
---
### Task 13: Frontend image generation config section on the tool edit page
**Files:**
- Modify: `packages/frontend/src/modules/agent/views/tools.vue`
- [ ] **Step 1: Add an image generation config section to the edit drawer**
In the edit drawer in `tools.vue`, locate the model configuration area (around lines 475-500, after the closing `</template>` of the `<template v-if="editor.requiresModel === 1">` block) and add:
```vue
<template v-if="isImageTool">
  <el-divider>{{ t('图片生成配置') }}</el-divider>
  <!-- Notice: defaults apply when the Agent does not specify a value; hard limits cannot be exceeded. -->
  <el-alert type="info" :closable="false" style="margin-bottom: 16px">
    默认值在 Agent 未指定时生效,Agent 可根据用户指令覆盖。硬上限不可突破。
  </el-alert>
  <!-- Defaults row: count / aspect ratio / watermark -->
  <el-row :gutter="16">
    <el-col :span="8">
      <el-form-item label="默认数量">
        <el-input-number v-model="imageDefaults.n" :min="1" :max="9" :step="1" style="width: 100%" />
      </el-form-item>
    </el-col>
    <el-col :span="8">
      <el-form-item label="默认比例">
        <el-select v-model="imageDefaults.aspectRatio" clearable placeholder="不限" style="width: 100%">
          <el-option label="1:1" value="1:1" />
          <el-option label="16:9" value="16:9" />
          <el-option label="4:3" value="4:3" />
          <el-option label="3:2" value="3:2" />
          <el-option label="2:3" value="2:3" />
          <el-option label="3:4" value="3:4" />
          <el-option label="9:16" value="9:16" />
        </el-select>
      </el-form-item>
    </el-col>
    <el-col :span="8">
      <el-form-item label="默认水印">
        <el-switch v-model="imageDefaults.watermark" />
      </el-form-item>
    </el-col>
  </el-row>
  <!-- Hard limits row: max count / max width / max height -->
  <el-row :gutter="16">
    <el-col :span="8">
      <el-form-item label="最大数量">
        <el-input-number v-model="imageConstraints.maxN" :min="1" :max="9" :step="1" style="width: 100%" />
      </el-form-item>
    </el-col>
    <el-col :span="8">
      <el-form-item label="最大宽度">
        <el-input-number v-model="imageConstraints.maxWidth" :min="512" :max="4096" :step="64" style="width: 100%" />
      </el-form-item>
    </el-col>
    <el-col :span="8">
      <el-form-item label="最大高度">
        <el-input-number v-model="imageConstraints.maxHeight" :min="512" :max="4096" :step="64" style="width: 100%" />
      </el-form-item>
    </el-col>
  </el-row>
</template>
```
- [ ] **Step 2: Add reactive state and a computed property**
Near the `editor` reactive object in the `<script>` section, add:
```typescript
const isImageTool = computed(() =>
  ['text_to_image', 'image_to_image'].includes(editor.name)
);
const imageDefaults = reactive({
  n: 1,
  aspectRatio: '' as string,
  watermark: false,
});
const imageConstraints = reactive({
  maxN: 9,
  maxWidth: 2048,
  maxHeight: 2048,
});
```
- [ ] **Step 3: Load the image config from extra when the edit drawer opens**
In the logic that opens the edit drawer (`watch(editorVisible, ...)` or the `openEditor` function), add:
```typescript
const extra = row.extra as Record<string, any> ?? {};
// Always reset to the built-in defaults first so values from a previously
// edited tool do not leak into this one, then overlay anything saved in extra.
if (extra.imageDefaults) {
  Object.assign(imageDefaults, { n: 1, aspectRatio: '', watermark: false, ...extra.imageDefaults });
} else {
  Object.assign(imageDefaults, { n: 1, aspectRatio: '', watermark: false });
}
if (extra.imageConstraints) {
  Object.assign(imageConstraints, { maxN: 9, maxWidth: 2048, maxHeight: 2048, ...extra.imageConstraints });
} else {
  Object.assign(imageConstraints, { maxN: 9, maxWidth: 2048, maxHeight: 2048 });
}
```
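The reset-then-overlay logic can also be written as a pure function, which makes it easy to unit test outside the component. This is a sketch under assumptions: `readImageConfig`, the two interfaces, and the `DEFAULT_*` constants are hypothetical names, and the default values are the ones used in this step.

```typescript
interface ImageDefaults {
  n: number;
  aspectRatio: string;
  watermark: boolean;
}

interface ImageConstraints {
  maxN: number;
  maxWidth: number;
  maxHeight: number;
}

// Built-in fallbacks, matching the literals used in the plan.
const DEFAULT_IMAGE_DEFAULTS: ImageDefaults = { n: 1, aspectRatio: '', watermark: false };
const DEFAULT_IMAGE_CONSTRAINTS: ImageConstraints = { maxN: 9, maxWidth: 2048, maxHeight: 2048 };

// Hypothetical helper: merges whatever is stored in `extra` over the fallbacks.
// Spreading `undefined` is a no-op, so a missing section yields the defaults.
function readImageConfig(
  extra?: { imageDefaults?: Partial<ImageDefaults>; imageConstraints?: Partial<ImageConstraints> } | null,
): { imageDefaults: ImageDefaults; imageConstraints: ImageConstraints } {
  return {
    imageDefaults: { ...DEFAULT_IMAGE_DEFAULTS, ...extra?.imageDefaults },
    imageConstraints: { ...DEFAULT_IMAGE_CONSTRAINTS, ...extra?.imageConstraints },
  };
}
```

Inside the component the result could then be applied with `Object.assign(imageDefaults, cfg.imageDefaults)` and `Object.assign(imageConstraints, cfg.imageConstraints)`, keeping the reactive objects themselves intact.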
- [ ] **Step 4: Write the image config into extra on save**
In the save logic (`handleSave` or `handleUpdate`), when building `extra`, add:
```typescript
const extra: Record<string, unknown> = {
  ...(editor.governancePolicy?.allowInSubagent !== undefined ? { allowInSubagent: editor.governancePolicy.allowInSubagent } : {}),
  ...(editor.governancePolicy?.workerRoutingStrategy ? { workerRoutingStrategy: editor.governancePolicy.workerRoutingStrategy } : {}),
};
if (isImageTool.value) {
  // Copy the reactive objects so the payload holds plain snapshots.
  extra.imageDefaults = { ...imageDefaults };
  extra.imageConstraints = { ...imageConstraints };
}
```
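The form lets the default count and the maximum count be edited independently (both range from 1 to 9), so a default above the hard limit can be saved. One possible safeguard, which is not part of the plan, is to clamp the default before writing it. In this sketch `buildImageExtra` and the two form interfaces are hypothetical names:

```typescript
interface ImageDefaultsForm {
  n: number;
  aspectRatio: string;
  watermark: boolean;
}

interface ImageConstraintsForm {
  maxN: number;
  maxWidth: number;
  maxHeight: number;
}

// Hypothetical helper: returns plain copies for the `extra` payload and keeps
// the default count within [1, maxN] so it never exceeds the hard limit.
function buildImageExtra(
  defaults: ImageDefaultsForm,
  constraints: ImageConstraintsForm,
): { imageDefaults: ImageDefaultsForm; imageConstraints: ImageConstraintsForm } {
  const n = Math.min(Math.max(defaults.n, 1), constraints.maxN);
  return {
    imageDefaults: { ...defaults, n },
    imageConstraints: { ...constraints },
  };
}
```

Whether to clamp silently or to show a validation error instead is a product decision; the backend still has to enforce the limits regardless of what the form saves.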
- [ ] **Step 5: Verify the frontend compiles**
Run: `cd packages/frontend && npx vue-tsc --noEmit 2>&1 | head -20`
Expected: no new errors
- [ ] **Step 6: Commit**
```bash
git add packages/frontend/src/modules/agent/views/tools.vue
git commit -m "feat(frontend): add image generation config section in tool editor"
```
---
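### Task 14: End-to-end verification
- [ ] **Step 1: Start the backend**
Run: `cd packages/backend && npm run dev`
Expected: starts successfully with no errors
- [ ] **Step 2: Verify the tools are synced to the database**
Open the tool management page in the frontend, click "同步工具目录" (Sync tool catalog), and confirm that `text_to_image` and `image_to_image` appear in the list with toolset `vision`, capability `multimodal`, and requiresModel set to 1.
- [ ] **Step 3: Configure the model channel**
On the tool edit page, configure a model channel for `text_to_image` (pick an existing 火山引擎 (Volcano Engine) or MiniMax channel) and select the corresponding image generation model.
- [ ] **Step 4: Configure image generation parameters**
In the "图片生成配置" (image generation config) section of the tool edit page, set the defaults and hard limits, then save.
- [ ] **Step 5: Create a test Agent**
On the Agent edit page, create a test Agent with `text_to_image`, `image_to_image`, and `image_recognize` enabled in its toolset.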
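- [ ] **Step 6: Test text-to-image**
On the chat page, send the test Agent a message such as "Generate a white-background e-commerce hero image of Bluetooth earphones" and confirm:
- The Agent calls the `text_to_image` tool
- The tool-card renders the generated image
- The image URL is a locally persisted URL, not a temporary one
- [ ] **Step 7: Test image-to-image**
Upload a product image as an attachment, send "Generate a white-background hero image based on this picture", and confirm:
- The attachment hint includes the `image_to_image` tool hint
- The Agent calls the `image_to_image` tool with referenceImage set to the uploaded image's URL
- The tool-card renders the generated image
- [ ] **Step 8: Test multiple images**
Send "Generate 3 product images from different angles" and confirm:
- The Agent calls `text_to_image` with n=3
- The tool-card renders the 3 images in a grid layout
- Clicking an image opens the full-size preview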
- [ ] **Step 9: Final commit**
Once everything is confirmed to work, commit any remaining fixes together.