Text-to-Image and Image-to-Image Tools: Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
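Goal: Add two tools to Neta Agent, text_to_image and image_to_image, build an extensible image-generation provider layer, and render image results in the frontend.
Architecture: The backend adds an image_providers/ directory that implements the provider strategy pattern (Volcano Engine + MiniMax). tool_resolver injects credentials and a provider instance into both tools. Image URLs returned by the tools are copied to local storage by ImageStorageService before being written to the session tree. On the frontend, message-item.vue upgrades single-image rendering and adds a multi-image grid branch.
Tech Stack: TypeScript, OpenAI SDK (Volcano Engine), fetch (MiniMax REST), TypeBox (parameter schemas), Vue 3 + Element Plus (frontend)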
Spec: docs/superpowers/specs/2026-05-02-image-generation-tools-design.md
File Map
| File | Responsibility | Action |
|---|---|---|
| packages/backend/src/modules/netaclaw/image_providers/types.ts | Unified provider interface, credentials, error types, factory function | Create |
| packages/backend/src/modules/netaclaw/image_providers/ark.ts | Volcano Engine provider | Create |
| packages/backend/src/modules/netaclaw/image_providers/minimax.ts | MiniMax provider | Create |
| packages/backend/src/modules/netaclaw/tools/common.ts | Adds ImageItem, the images result type, imagesResult(), and the toolResultToText extension | Modify |
| packages/backend/src/modules/netaclaw/tools/manifest.ts | Adds imageDefaults / imageConstraints to ToolGovernanceExtra | Modify |
| packages/backend/src/modules/netaclaw/tools/builtin/image_common.ts | Shared helpers for the image tools (clampDimension, persistImages, formatImageToolResult) | Create |
| packages/backend/src/modules/netaclaw/tools/builtin/text_to_image.ts | Text-to-image tool | Create |
| packages/backend/src/modules/netaclaw/tools/builtin/image_to_image.ts | Image-to-image tool | Create |
| packages/backend/src/modules/netaclaw/tools/catalog.ts | Imports the two new tools to trigger registration | Modify |
| packages/backend/src/modules/netaclaw/service/image_storage.ts | Service that copies generated images to local storage | Create |
| packages/backend/src/modules/netaclaw/service/tool_registry.ts | Adds the getToolByName method | Modify |
| packages/backend/src/modules/netaclaw/service/tool_resolver.ts | Injects credentials and provider instances | Modify |
| packages/backend/src/modules/netaclaw/runtime/prompt_builder.ts | Extends the attachment hint text | Modify |
| packages/backend/src/modules/netaclaw/runtime/agent.ts | Passes toolNames when calling buildLLMMessages | Modify |
| packages/frontend/src/modules/agent/tools/renderer-registry.ts | Adds the images type to rawResult | Modify |
| packages/frontend/src/modules/agent/components/message-item.vue | Upgrades the single image to el-image and adds a multi-image grid | Modify |
| packages/frontend/src/modules/agent/views/tools.vue | Image generation configuration section | Modify |
Task 1: Backend type foundation (multi-image extension of ToolResultContent)
Files:
- Modify: packages/backend/src/modules/netaclaw/tools/common.ts
- [ ] Step 1: Add the ImageItem interface before the ToolResultContent type
In common.ts, before the ToolResultContent type definition (around line 26), add:
export interface ImageItem {
url: string;
mimeType?: string;
width?: number;
height?: number;
seed?: number;
}
- [ ] Step 2: Extend the ToolResultContent union type
After the existing type: 'image' branch, add an images branch:
| { type: 'images'; images: ImageItem[]; text?: string };
The full type becomes:
export type ToolResultContent =
| { type: 'text'; text: string }
| { type: 'json'; data: unknown }
| {
type: 'image';
url: string;
mimeType?: string;
text?: string;
width?: number;
height?: number;
bytes?: number;
originalWidth?: number;
originalHeight?: number;
originalBytes?: number;
resized?: boolean;
}
| { type: 'images'; images: ImageItem[]; text?: string };
- [ ] Step 3: Add the imagesResult helper
After the imageResult() function, add:
export function imagesResult(
images: ImageItem[],
text?: string,
): ToolResultContent {
return { type: 'images', images, text };
}
- [ ] Step 4: Extend the toolResultToText function
In toolResultToText, after the if (value.type === 'image') branch, add:
if (value.type === 'images') {
const lines = (value as { type: 'images'; images: ImageItem[]; text?: string }).images.map((img, i) =>
`[图${i + 1}] ${img.url}${img.width && img.height ? ` (${img.width}x${img.height})` : ''}`
);
const header = (value as any).text || `已生成 ${lines.length} 张图片`;
return `${header}\n${lines.join('\n')}`;
}
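To make the text fallback concrete, here is a minimal, self-contained sketch of how an images result could be flattened into the plain text the model sees. The ImageItem shape mirrors Step 1; imagesToText is a hypothetical stand-in for the new branch of toolResultToText, and the URLs are placeholders.

```typescript
// Sketch only: mirrors the ImageItem shape from Step 1.
interface ImageItem {
  url: string;
  mimeType?: string;
  width?: number;
  height?: number;
  seed?: number;
}

// Hypothetical stand-in for the `images` branch of toolResultToText.
function imagesToText(images: ImageItem[], text?: string): string {
  const lines = images.map((img, i) => {
    // Append dimensions only when both are known.
    const size = img.width && img.height ? ` (${img.width}x${img.height})` : '';
    return `[图${i + 1}] ${img.url}${size}`;
  });
  // Fall back to a generated header when the tool supplied no text.
  const header = text || `已生成 ${lines.length} 张图片`;
  return [header, ...lines].join('\n');
}

const sample = imagesToText([
  { url: 'https://example.com/a.png', width: 1024, height: 1024 },
  { url: 'https://example.com/b.png' },
]);
```

Each image gets one line, so the model can refer to a specific result by its index and URL in a later turn.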
- [ ] Step 5: Verify compilation
Run: cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20
Expected: no new errors
- [ ] Step 6: Commit
git add packages/backend/src/modules/netaclaw/tools/common.ts
git commit -m "feat(netaclaw): extend ToolResultContent with images type for multi-image tool results"
Task 2: ToolGovernanceExtra extension (imageDefaults / imageConstraints)
Files:
- Modify: packages/backend/src/modules/netaclaw/tools/manifest.ts
- [ ] Step 1: Extend the ToolGovernanceExtra type
In the ToolGovernanceExtra type at lines 9-12 of manifest.ts, add the image tool fields:
export type ToolGovernanceExtra = {
allowInSubagent?: boolean;
workerRoutingStrategy?: ToolWorkerRoutingStrategy;
imageDefaults?: {
n?: number;
aspectRatio?: string;
width?: number;
height?: number;
watermark?: boolean;
responseFormat?: 'url' | 'base64';
};
imageConstraints?: {
maxN?: number;
maxWidth?: number;
maxHeight?: number;
};
};
- [ ] Step 2: Verify compilation
Run: cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20
Expected: no new errors (existing code reads only allowInSubagent and workerRoutingStrategy, so the new fields do not affect it)
- [ ] Step 3: Commit
git add packages/backend/src/modules/netaclaw/tools/manifest.ts
git commit -m "feat(netaclaw): add imageDefaults and imageConstraints to ToolGovernanceExtra"
Task 3: Provider layer (unified interface and factory)
Files:
- Create: packages/backend/src/modules/netaclaw/image_providers/types.ts
- [ ] Step 1: Create the image_providers directory and types.ts
import type { ToolGovernanceExtra } from '../tools/manifest.js';
export interface ImageProviderCredentials {
baseUrl: string;
apiKey: string;
supplier: string;
modelId: string;
promptHint: string | null;
extra?: ToolGovernanceExtra | null;
}
export interface TextToImageParams {
prompt: string;
width?: number;
height?: number;
aspectRatio?: string;
n?: number;
responseFormat?: 'url' | 'base64';
watermark?: boolean;
seed?: number;
extra?: Record<string, unknown>;
}
export interface ImageToImageParams extends TextToImageParams {
referenceImage: string;
strength?: number;
}
export interface ImageGenerationResult {
images: { url?: string; base64?: string; width?: number; height?: number }[];
model: string;
provider: string;
}
export interface ImageGenerationProvider {
readonly id: string;
textToImage(params: TextToImageParams, creds: ImageProviderCredentials): Promise<ImageGenerationResult>;
imageToImage(params: ImageToImageParams, creds: ImageProviderCredentials): Promise<ImageGenerationResult>;
}
export type ImageGenerationErrorCode =
| 'content_safety'
| 'rate_limit'
| 'insufficient_balance'
| 'invalid_params'
| 'timeout'
| 'network'
| 'unknown';
export class ImageGenerationError extends Error {
constructor(
message: string,
public readonly code: ImageGenerationErrorCode,
public readonly retryable: boolean,
) {
super(message);
this.name = 'ImageGenerationError';
}
}
const providers = new Map<string, ImageGenerationProvider>();
export function registerImageProvider(provider: ImageGenerationProvider): void {
providers.set(provider.id, provider);
}
export function getImageProvider(supplier: string, baseUrl: string): ImageGenerationProvider | null {
const s = supplier.toLowerCase();
if (s === 'minimax') return providers.get('minimax') ?? null;
if (s === 'ark' || s === 'volcengine') return providers.get('ark') ?? null;
if (s === 'openai') {
if (baseUrl.includes('volces.com') || baseUrl.includes('volcengine')) return providers.get('ark') ?? null;
if (baseUrl.includes('minimax')) return providers.get('minimax') ?? null;
}
return null;
}
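The lookup order in getImageProvider can be illustrated with a stripped-down sketch: match on the supplier name first and, for OpenAI-compatible channels, fall back to the base URL. The Provider interface, register, resolveProvider and the example URLs below are hypothetical reductions for illustration, not the real module.

```typescript
// Sketch only: a provider reduced to its id.
interface Provider {
  readonly id: string;
}

const registry: Record<string, Provider | undefined> = {};

function register(provider: Provider): void {
  registry[provider.id] = provider;
}

// Resolve by supplier name first; for OpenAI-compatible channels, inspect the base URL.
function resolveProvider(supplier: string, baseUrl: string): Provider | null {
  const s = supplier.toLowerCase();
  if (s === 'minimax') return registry['minimax'] ?? null;
  if (s === 'ark' || s === 'volcengine') return registry['ark'] ?? null;
  if (s === 'openai') {
    if (baseUrl.includes('volces.com') || baseUrl.includes('volcengine')) return registry['ark'] ?? null;
    if (baseUrl.includes('minimax')) return registry['minimax'] ?? null;
  }
  return null;
}

// Only the Ark provider is registered in this sketch.
register({ id: 'ark' });

const viaSupplier = resolveProvider('Volcengine', '');
const viaBaseUrl = resolveProvider('openai', 'https://ark.example.volces.com/api/v3');
const unregistered = resolveProvider('minimax', '');
const unknownSupplier = resolveProvider('openai', 'https://api.example.com/v1');
```

Because a provider only becomes resolvable once it has registered itself, the lookup returns null for a known supplier whose module was never imported. This is why Task 10 imports ark.js and minimax.js purely for their side effects.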
- [ ] Step 2: Verify compilation
Run: cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20
Expected: no errors
- [ ] Step 3: Commit
git add packages/backend/src/modules/netaclaw/image_providers/
git commit -m "feat(netaclaw): add image provider types, error class, and factory"
Task 4: Provider layer (Volcano Engine / Ark)
Files:
- Create: packages/backend/src/modules/netaclaw/image_providers/ark.ts
- [ ] Step 1: Implement ArkImageProvider
import OpenAI from 'openai';
import {
type ImageGenerationProvider,
type ImageProviderCredentials,
type TextToImageParams,
type ImageToImageParams,
type ImageGenerationResult,
ImageGenerationError,
registerImageProvider,
} from './types.js';
function resolveSize(params: TextToImageParams): string | undefined {
if (params.aspectRatio) {
const map: Record<string, string> = {
'1:1': '1024x1024',
'16:9': '1280x720',
'4:3': '1152x864',
'3:2': '1248x832',
'2:3': '832x1248',
'3:4': '864x1152',
'9:16': '720x1280',
};
return map[params.aspectRatio] ?? '1024x1024';
}
if (params.width && params.height) {
return `${params.width}x${params.height}`;
}
return undefined;
}
function normalizeResult(response: OpenAI.Images.ImagesResponse, creds: ImageProviderCredentials): ImageGenerationResult {
return {
images: (response.data ?? []).map(item => ({
url: item.url,
base64: item.b64_json,
})),
model: creds.modelId,
provider: 'ark',
};
}
class ArkImageProvider implements ImageGenerationProvider {
readonly id = 'ark';
async textToImage(params: TextToImageParams, creds: ImageProviderCredentials): Promise<ImageGenerationResult> {
const client = new OpenAI({ apiKey: creds.apiKey, baseURL: creds.baseUrl, timeout: 60_000 });
try {
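const response = await client.images.generate({
// The Node SDK has no extra_body option; vendor-specific fields are sent at the top level of the request body.
// They are spread first so they can never override the core fields below.
...(params.watermark !== undefined ? { watermark: params.watermark } : {}),
...params.extra,
model: creds.modelId,
prompt: params.prompt,
size: resolveSize(params) as any,
n: params.n ?? 1,
// The OpenAI-style images API names the base64 format 'b64_json'.
response_format: params.responseFormat === 'base64' ? 'b64_json' : 'url',
} as any);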
return normalizeResult(response, creds);
} catch (err: any) {
throw this.wrapError(err);
}
}
async imageToImage(params: ImageToImageParams, creds: ImageProviderCredentials): Promise<ImageGenerationResult> {
const client = new OpenAI({ apiKey: creds.apiKey, baseURL: creds.baseUrl, timeout: 60_000 });
try {
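const response = await client.images.generate({
// The Node SDK has no extra_body option; vendor-specific fields are sent at the top level of the request body.
// They are spread first so they can never override the core fields below.
image: params.referenceImage,
...(params.strength !== undefined ? { strength: params.strength } : {}),
...(params.watermark !== undefined ? { watermark: params.watermark } : {}),
...params.extra,
model: creds.modelId,
prompt: params.prompt,
size: resolveSize(params) as any,
n: params.n ?? 1,
// The OpenAI-style images API names the base64 format 'b64_json'.
response_format: params.responseFormat === 'base64' ? 'b64_json' : 'url',
} as any);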
return normalizeResult(response, creds);
} catch (err: any) {
throw this.wrapError(err);
}
}
private wrapError(err: any): ImageGenerationError {
const status = err?.status ?? err?.response?.status;
const msg = err?.message ?? String(err);
if (status === 429) return new ImageGenerationError(msg, 'rate_limit', true);
if (status === 402) return new ImageGenerationError(msg, 'insufficient_balance', false);
if (status === 400) {
if (msg.includes('safety') || msg.includes('sensitive') || msg.includes('安全'))
return new ImageGenerationError('提示词触发内容安全策略,请调整描述', 'content_safety', false);
return new ImageGenerationError(msg, 'invalid_params', false);
}
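// The SDK reports client-side timeouts as APIConnectionTimeoutError, usually without a Node error code.
if (err instanceof OpenAI.APIConnectionTimeoutError || err?.code === 'ETIMEDOUT' || err?.code === 'ECONNABORTED')
return new ImageGenerationError('生成超时,可尝试降低图片尺寸', 'timeout', true);
// Other connection failures are transient network errors and worth retrying.
if (err instanceof OpenAI.APIConnectionError) return new ImageGenerationError(msg, 'network', true);
return new ImageGenerationError(msg, 'unknown', false);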
}
}
registerImageProvider(new ArkImageProvider());
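The status-based part of wrapError can be read as a small decision table. The sketch below restates it as a pure function so the mapping is easy to check in isolation; classifyHttpError and the Classified shape are hypothetical names, and the timeout and network cases are left out because they depend on the SDK's error objects.

```typescript
type ErrorCode =
  | 'content_safety'
  | 'rate_limit'
  | 'insufficient_balance'
  | 'invalid_params'
  | 'timeout'
  | 'network'
  | 'unknown';

interface Classified {
  code: ErrorCode;
  retryable: boolean;
}

// Hypothetical helper: same decision order as the HTTP-status branches of wrapError.
function classifyHttpError(status: number | undefined, message: string): Classified {
  if (status === 429) return { code: 'rate_limit', retryable: true };
  if (status === 402) return { code: 'insufficient_balance', retryable: false };
  if (status === 400) {
    // A 400 is treated as a content-safety rejection only when the message says so.
    const safety = ['safety', 'sensitive', '安全'].some(keyword => message.includes(keyword));
    return { code: safety ? 'content_safety' : 'invalid_params', retryable: false };
  }
  return { code: 'unknown', retryable: false };
}

const throttled = classifyHttpError(429, 'too many requests');
const flagged = classifyHttpError(400, 'prompt flagged as sensitive');
const badSize = classifyHttpError(400, 'size is not supported');
```

Only rate limiting is marked retryable here. The tools in Tasks 8 and 9 surface that flag to the agent as a text prefix on the failure message.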
- [ ] Step 2: Verify compilation
Run: cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20
Expected: no errors
- [ ] Step 3: Commit
git add packages/backend/src/modules/netaclaw/image_providers/ark.ts
git commit -m "feat(netaclaw): add Ark (Volcano Engine) image provider via OpenAI SDK"
Task 5: Provider layer (MiniMax)
Files:
- Create: packages/backend/src/modules/netaclaw/image_providers/minimax.ts
- [ ] Step 1: Implement MiniMaxImageProvider
import {
type ImageGenerationProvider,
type ImageProviderCredentials,
type TextToImageParams,
type ImageToImageParams,
type ImageGenerationResult,
ImageGenerationError,
registerImageProvider,
} from './types.js';
interface MiniMaxResponse {
id?: string;
data?: { image_urls?: string[]; image_base64?: string[] };
metadata?: { success_count?: number; failed_count?: number };
base_resp?: { status_code?: number; status_msg?: string };
}
function buildBaseBody(params: TextToImageParams, creds: ImageProviderCredentials): Record<string, unknown> {
const body: Record<string, unknown> = {
model: creds.modelId,
prompt: params.prompt,
n: params.n ?? 1,
response_format: params.responseFormat ?? 'url',
};
if (params.watermark !== undefined) body.aigc_watermark = params.watermark;
if (params.aspectRatio) {
body.aspect_ratio = params.aspectRatio;
} else if (params.width && params.height) {
body.width = params.width;
body.height = params.height;
}
if (params.seed !== undefined) body.seed = params.seed;
if (params.extra?.style) body.style = params.extra.style;
if (params.extra?.prompt_optimizer !== undefined) body.prompt_optimizer = params.extra.prompt_optimizer;
return body;
}
function normalizeResponse(json: MiniMaxResponse, creds: ImageProviderCredentials, format: string): ImageGenerationResult {
const resp = json.base_resp;
if (resp && resp.status_code !== 0) {
const code = resp.status_code;
if (code === 1026) throw new ImageGenerationError('提示词触发内容安全策略,请调整描述', 'content_safety', false);
if (code === 1002) throw new ImageGenerationError('当前请求过多,请稍后重试', 'rate_limit', true);
if (code === 1008) throw new ImageGenerationError('模型渠道余额不足', 'insufficient_balance', false);
if (code === 1004) throw new ImageGenerationError('API Key 鉴权失败', 'invalid_params', false);
throw new ImageGenerationError(resp.status_msg ?? `MiniMax error ${code}`, 'unknown', false);
}
const urls = json.data?.image_urls ?? [];
const b64s = json.data?.image_base64 ?? [];
const images = format === 'base64'
? b64s.map(b => ({ base64: b }))
: urls.map(u => ({ url: u }));
return { images, model: creds.modelId, provider: 'minimax' };
}
class MiniMaxImageProvider implements ImageGenerationProvider {
readonly id = 'minimax';
async textToImage(params: TextToImageParams, creds: ImageProviderCredentials): Promise<ImageGenerationResult> {
const body = buildBaseBody(params, creds);
return this.request(body, creds, params.responseFormat ?? 'url');
}
async imageToImage(params: ImageToImageParams, creds: ImageProviderCredentials): Promise<ImageGenerationResult> {
const body = buildBaseBody(params, creds);
body.subject_reference = [{ image: params.referenceImage }];
if (params.strength !== undefined) body.strength = params.strength;
return this.request(body, creds, params.responseFormat ?? 'url');
}
private async request(body: Record<string, unknown>, creds: ImageProviderCredentials, format: string): Promise<ImageGenerationResult> {
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), 60_000);
try {
const baseUrl = creds.baseUrl.replace(/\/+$/, '');
const res = await fetch(`${baseUrl}/v1/image_generation`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': `Bearer ${creds.apiKey}`,
},
body: JSON.stringify(body),
signal: controller.signal,
});
if (!res.ok) {
const text = await res.text().catch(() => '');
throw new ImageGenerationError(`MiniMax HTTP ${res.status}: ${text}`, res.status === 429 ? 'rate_limit' : 'unknown', res.status === 429);
}
const json: MiniMaxResponse = await res.json();
return normalizeResponse(json, creds, format);
} catch (err: any) {
if (err instanceof ImageGenerationError) throw err;
if (err?.name === 'AbortError') throw new ImageGenerationError('生成超时,可尝试降低图片尺寸', 'timeout', true);
throw new ImageGenerationError(err?.message ?? String(err), 'network', true);
} finally {
clearTimeout(timer);
}
}
}
registerImageProvider(new MiniMaxImageProvider());
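One detail of buildBaseBody that is easy to miss is the size precedence: an aspect ratio, when present, replaces any explicit width and height, and pixel dimensions are sent only as a pair. The sketch below isolates that rule; buildSizeFields and SizeParams are hypothetical reductions of the real function.

```typescript
interface SizeParams {
  prompt: string;
  aspectRatio?: string;
  width?: number;
  height?: number;
}

// Hypothetical reduction of buildBaseBody: aspectRatio wins over explicit width/height.
function buildSizeFields(params: SizeParams): Record<string, unknown> {
  const body: Record<string, unknown> = { prompt: params.prompt };
  if (params.aspectRatio) {
    body.aspect_ratio = params.aspectRatio;
  } else if (params.width && params.height) {
    // Pixel dimensions are only sent when both are present.
    body.width = params.width;
    body.height = params.height;
  }
  return body;
}

const byRatio = buildSizeFields({ prompt: 'a cat', aspectRatio: '16:9', width: 512, height: 512 });
const byPixels = buildSizeFields({ prompt: 'a cat', width: 512, height: 768 });
const widthOnly = buildSizeFields({ prompt: 'a cat', width: 512 });
```

A request that supplies only one dimension therefore carries no size fields at all, and the size is left to the service.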
- [ ] Step 2: Verify compilation
Run: cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20
Expected: no errors
- [ ] Step 3: Commit
git add packages/backend/src/modules/netaclaw/image_providers/minimax.ts
git commit -m "feat(netaclaw): add MiniMax image provider via REST API"
Task 6: Image storage service
Files:
- Create: packages/backend/src/modules/netaclaw/service/image_storage.ts
- [ ] Step 1: Implement ImageStorageService
Reuse the downAndUpload method of the existing pluginService.getInstance('upload'):
import { Inject, Provide, Scope, ScopeEnum } from '@midwayjs/core';
import { PluginService } from '../../plugin/service/info.js';
import { randomUUID } from 'crypto';
@Provide()
@Scope(ScopeEnum.Singleton)
export class ImageStorageService {
@Inject()
pluginService: PluginService;
async persist(tempUrl: string): Promise<string> {
const upload = await this.pluginService.getInstance('upload');
const ext = this.detectExtension(tempUrl);
const filename = `img-${Date.now()}-${randomUUID().slice(0, 8)}${ext}`;
return upload.downAndUpload(tempUrl, filename);
}
async persistAll(urls: string[]): Promise<string[]> {
return Promise.all(urls.map(url => this.persist(url)));
}
private detectExtension(url: string): string {
const pathname = url.split('?')[0];
const match = pathname.match(/\.(png|jpg|jpeg|webp|gif)$/i);
return match ? `.${match[1].toLowerCase()}` : '.png';
}
}
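Temporary URLs returned by providers often carry signed query strings, so the extension has to be read from the path alone. The sketch below is a hypothetical variant of detectExtension that strips a #fragment as well as the query string (an addition not present in the plan); the URLs are placeholders.

```typescript
// Hypothetical variant of detectExtension: ignores both ?query and #fragment.
function detectExtension(url: string): string {
  const pathname = url.split(/[?#]/)[0];
  const match = pathname.match(/\.(png|jpg|jpeg|webp|gif)$/i);
  // Unknown or missing extensions fall back to .png, as in the plan.
  return match ? `.${match[1].toLowerCase()}` : '.png';
}

const signed = detectExtension('https://cdn.example.com/out/a.JPG?sig=abc');
const noExtension = detectExtension('https://cdn.example.com/gen/12345');
const withFragment = detectExtension('https://cdn.example.com/a.webp#preview');
```

The .png fallback is only a guess at the real format. If the upload plugin exposes the response content type, that would be a more reliable source.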
- [ ] Step 2: Verify compilation
Run: cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20
Expected: no errors
- [ ] Step 3: Commit
git add packages/backend/src/modules/netaclaw/service/image_storage.ts
git commit -m "feat(netaclaw): add ImageStorageService for persisting generated images"
Task 7: Shared helpers for the image tools
Files:
- Create: packages/backend/src/modules/netaclaw/tools/builtin/image_common.ts
- [ ] Step 1: Create image_common.ts
Extract the helpers shared by text_to_image and image_to_image:
import { imageResult, imagesResult } from '../common.js';
import type { ImageGenerationResult } from '../../image_providers/types.js';
import type { ImageStorageService } from '../../service/image_storage.js';
export function clampDimension(value: number | undefined, max: number): number | undefined {
if (value === undefined) return undefined;
return Math.min(value, max);
}
export async function persistImages(result: ImageGenerationResult, storage: ImageStorageService): Promise<ImageGenerationResult> {
const persisted = await Promise.all(
result.images.map(async img => {
if (!img.url) return img;
const permanentUrl = await storage.persist(img.url);
return { ...img, url: permanentUrl };
})
);
return { ...result, images: persisted };
}
export function formatImageToolResult(result: ImageGenerationResult) {
// Base64-only items have no URL to render, so keep only images that carry one.
const images = result.images.filter(img => !!img.url);
if (images.length === 0) {
return { type: 'text' as const, text: '图片生成失败: 未返回可用的图片 URL' };
}
if (images.length === 1) {
const img = images[0];
return imageResult(img.url!, undefined, {
width: img.width,
height: img.height,
text: `图片已生成 (${result.provider}/${result.model})`,
});
}
return imagesResult(
images.map(img => ({ url: img.url!, width: img.width, height: img.height })),
`已生成 ${images.length} 张图片 (${result.provider}/${result.model})`,
);
}
- [ ] Step 2: Verify compilation
Run: cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20
Expected: no errors
- [ ] Step 3: Commit
git add packages/backend/src/modules/netaclaw/tools/builtin/image_common.ts
git commit -m "feat(netaclaw): extract shared image tool helpers"
Task 8: text_to_image tool
Files:
- Create: packages/backend/src/modules/netaclaw/tools/builtin/text_to_image.ts
- Modify: packages/backend/src/modules/netaclaw/tools/catalog.ts
- [ ] Step 1: Create text_to_image.ts
import { Type, Static } from '@sinclair/typebox';
import { type AnyAgentTool } from '../common.js';
import { registerSchema } from '../catalog.js';
import type {
ImageGenerationProvider,
ImageProviderCredentials,
} from '../../image_providers/types.js';
import { ImageGenerationError } from '../../image_providers/types.js';
import type { ImageStorageService } from '../../service/image_storage.js';
import { clampDimension, persistImages, formatImageToolResult } from './image_common.js';
const Params = Type.Object({
prompt: Type.String({ description: '图片描述,尽量详细具体' }),
aspectRatio: Type.Optional(Type.String({
description: '宽高比。可选: 1:1, 16:9, 4:3, 3:2, 2:3, 3:4, 9:16',
})),
width: Type.Optional(Type.Integer({ description: '精确宽度(像素),优先级低于 aspectRatio' })),
height: Type.Optional(Type.Integer({ description: '精确高度(像素)' })),
n: Type.Optional(Type.Integer({ description: '生成数量,默认 1,最大 9', minimum: 1, maximum: 9 })),
watermark: Type.Optional(Type.Boolean({ description: '是否添加水印' })),
seed: Type.Optional(Type.Integer({ description: '随机种子,相同 seed 可复现相近结果' })),
extra: Type.Optional(Type.Record(Type.String(), Type.Unknown(), {
description: 'Provider 特有参数,如 MiniMax 的 style、prompt_optimizer',
})),
});
export function createTextToImageTool(
creds: ImageProviderCredentials,
provider: ImageGenerationProvider,
storage: ImageStorageService,
): AnyAgentTool {
const defaults = creds.extra?.imageDefaults ?? {};
const constraints = creds.extra?.imageConstraints ?? {};
return {
name: 'text_to_image',
label: '文生图',
description: creds.promptHint
? `根据文字描述生成图片。\n${creds.promptHint}`
: '根据文字描述生成图片,支持指定尺寸、数量、风格等参数。',
parameters: Params,
async execute(_id: string, params: Static<typeof Params>) {
const merged = {
prompt: params.prompt,
n: Math.min(params.n ?? defaults.n ?? 1, constraints.maxN ?? 9),
aspectRatio: params.aspectRatio ?? defaults.aspectRatio,
width: clampDimension(params.width ?? defaults.width, constraints.maxWidth ?? 2048),
height: clampDimension(params.height ?? defaults.height, constraints.maxHeight ?? 2048),
watermark: params.watermark ?? defaults.watermark ?? false,
seed: params.seed,
responseFormat: (defaults.responseFormat ?? 'url') as 'url' | 'base64',
extra: params.extra,
};
try {
let result = await provider.textToImage(merged, creds);
result = await persistImages(result, storage);
return formatImageToolResult(result);
} catch (err) {
if (err instanceof ImageGenerationError) {
const prefix = err.retryable ? '[可重试] ' : '';
return { type: 'text' as const, text: `${prefix}图片生成失败: ${err.message}` };
}
return { type: 'text' as const, text: `图片生成失败: ${err instanceof Error ? err.message : String(err)}` };
}
},
};
}
registerSchema({
name: 'text_to_image',
toolset: 'vision',
description: '根据文字描述生成图片,支持指定尺寸、数量、风格等参数。',
capability: 'multimodal',
visibility: 'tool',
isCore: false,
canDisable: true,
supportsPromptHint: true,
requiresModel: true,
});
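The merge inside execute follows a fixed precedence: the agent's argument, then the configured default, then a built-in fallback, with the configured constraint capping the result. The sketch below isolates that rule for two fields; mergeImageParams and its three small interfaces are hypothetical reductions, and clampDimension is restated so the block runs on its own.

```typescript
interface Requested { n?: number; width?: number; }
interface Defaults { n?: number; width?: number; }
interface Constraints { maxN?: number; maxWidth?: number; }

function clampDimension(value: number | undefined, max: number): number | undefined {
  return value === undefined ? undefined : Math.min(value, max);
}

// Precedence: agent argument, then configured default, then built-in fallback.
// The constraint always caps whatever value wins.
function mergeImageParams(req: Requested, defaults: Defaults, constraints: Constraints) {
  return {
    n: Math.min(req.n ?? defaults.n ?? 1, constraints.maxN ?? 9),
    width: clampDimension(req.width ?? defaults.width, constraints.maxWidth ?? 2048),
  };
}

// The agent asks for more than the configured limits allow.
const capped = mergeImageParams({ n: 6, width: 4096 }, { n: 2 }, { maxN: 4, maxWidth: 1536 });
// The agent specifies nothing, so the configured default applies.
const fallback = mergeImageParams({}, { n: 2 }, {});
```

One consequence worth checking during implementation: both providers prefer aspectRatio over width and height, so a configured default aspectRatio would take precedence over an explicit width and height passed by the agent.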
- [ ] Step 2: Register in catalog.ts
In the import list at the end of catalog.ts (after import './builtin/execute_skill.js'; at around line 66), add:
import './builtin/text_to_image.js';
- [ ] Step 3: Verify compilation
Run: cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20
Expected: no errors
- [ ] Step 4: Commit
git add packages/backend/src/modules/netaclaw/tools/builtin/text_to_image.ts packages/backend/src/modules/netaclaw/tools/catalog.ts
git commit -m "feat(netaclaw): add text_to_image tool with defaults/constraints merge and image persistence"
Task 9: image_to_image tool
Files:
- Create: packages/backend/src/modules/netaclaw/tools/builtin/image_to_image.ts
- Modify: packages/backend/src/modules/netaclaw/tools/catalog.ts
- [ ] Step 1: Create image_to_image.ts
import { Type, Static } from '@sinclair/typebox';
import { type AnyAgentTool } from '../common.js';
import { registerSchema } from '../catalog.js';
import type {
ImageGenerationProvider,
ImageProviderCredentials,
} from '../../image_providers/types.js';
import { ImageGenerationError } from '../../image_providers/types.js';
import type { ImageStorageService } from '../../service/image_storage.js';
import { clampDimension, persistImages, formatImageToolResult } from './image_common.js';
const Params = Type.Object({
prompt: Type.String({ description: '对参考图的修改描述' }),
referenceImage: Type.String({ description: '参考图片 URL(从用户上传附件获取)' }),
strength: Type.Optional(Type.Number({
description: '参考图影响强度 0-1,越大越接近原图', minimum: 0, maximum: 1,
})),
aspectRatio: Type.Optional(Type.String({
description: '宽高比。可选: 1:1, 16:9, 4:3, 3:2, 2:3, 3:4, 9:16',
})),
width: Type.Optional(Type.Integer({ description: '精确宽度(像素),优先级低于 aspectRatio' })),
height: Type.Optional(Type.Integer({ description: '精确高度(像素)' })),
n: Type.Optional(Type.Integer({ description: '生成数量,默认 1,最大 9', minimum: 1, maximum: 9 })),
watermark: Type.Optional(Type.Boolean({ description: '是否添加水印' })),
seed: Type.Optional(Type.Integer({ description: '随机种子' })),
extra: Type.Optional(Type.Record(Type.String(), Type.Unknown(), {
description: 'Provider 特有参数',
})),
});
export function createImageToImageTool(
creds: ImageProviderCredentials,
provider: ImageGenerationProvider,
storage: ImageStorageService,
): AnyAgentTool {
const defaults = creds.extra?.imageDefaults ?? {};
const constraints = creds.extra?.imageConstraints ?? {};
return {
name: 'image_to_image',
label: '图生图',
description: creds.promptHint
? `基于参考图片生成新图片。\n${creds.promptHint}`
: '基于参考图片生成新图片,支持风格迁移、内容编辑等。传入参考图URL和修改描述。',
parameters: Params,
async execute(_id: string, params: Static<typeof Params>) {
const merged = {
prompt: params.prompt,
referenceImage: params.referenceImage,
strength: params.strength,
n: Math.min(params.n ?? defaults.n ?? 1, constraints.maxN ?? 9),
aspectRatio: params.aspectRatio ?? defaults.aspectRatio,
width: clampDimension(params.width ?? defaults.width, constraints.maxWidth ?? 2048),
height: clampDimension(params.height ?? defaults.height, constraints.maxHeight ?? 2048),
watermark: params.watermark ?? defaults.watermark ?? false,
seed: params.seed,
responseFormat: (defaults.responseFormat ?? 'url') as 'url' | 'base64',
extra: params.extra,
};
try {
let result = await provider.imageToImage(merged, creds);
result = await persistImages(result, storage);
return formatImageToolResult(result);
} catch (err) {
if (err instanceof ImageGenerationError) {
const prefix = err.retryable ? '[可重试] ' : '';
return { type: 'text' as const, text: `${prefix}图片生成失败: ${err.message}` };
}
return { type: 'text' as const, text: `图片生成失败: ${err instanceof Error ? err.message : String(err)}` };
}
},
};
}
registerSchema({
name: 'image_to_image',
toolset: 'vision',
description: '基于参考图片生成新图片,支持风格迁移、内容编辑等。传入参考图URL和修改描述。',
capability: 'multimodal',
visibility: 'tool',
isCore: false,
canDisable: true,
supportsPromptHint: true,
requiresModel: true,
});
- [ ] Step 2: Register in catalog.ts
At the end of catalog.ts, after import './builtin/text_to_image.js';, add:
import './builtin/image_to_image.js';
- [ ] Step 3: Verify compilation
Run: cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20
Expected: no errors
- [ ] Step 4: Commit
git add packages/backend/src/modules/netaclaw/tools/builtin/image_to_image.ts packages/backend/src/modules/netaclaw/tools/catalog.ts
git commit -m "feat(netaclaw): add image_to_image tool with reference image support"
Task 10: tool_resolver integration
Files:
- Modify: packages/backend/src/modules/netaclaw/service/tool_resolver.ts
- Modify: packages/backend/src/modules/netaclaw/service/tool_registry.ts
- [ ] Step 1: Add the getToolByName method to tool_registry.ts
In the NetaClawToolRegistryService class (near the getToolModelConfig method), add:
async getToolByName(name: string): Promise<NetaClawToolEntity | null> {
return this.toolRepo.findOneBy({ name });
}
- [ ] Step 2: Add imports to tool_resolver.ts
In the import section at the top of tool_resolver.ts (near import { createImageRecognizeTool } at around line 32), add:
import { createTextToImageTool } from '../tools/builtin/text_to_image.js';
import { createImageToImageTool } from '../tools/builtin/image_to_image.js';
import { getImageProvider, type ImageProviderCredentials } from '../image_providers/types.js';
import '../image_providers/ark.js';
import '../image_providers/minimax.js';
import { ImageStorageService } from './image_storage.js';
- [ ] Step 3: Inject ImageStorageService
In the NetaClawToolResolverService class, add to the existing @Inject() section:
@Inject()
imageStorageService: ImageStorageService;
- [ ] Step 4: Inject the two tools in the resolve method
In the resolve method of tool_resolver.ts, find the image_recognize injection block (around lines 647-664) and add the following after its closing }:
for (const imgToolName of ['text_to_image', 'image_to_image'] as const) {
if (filteredNames.includes(imgToolName)) {
const toolModelConfig = await this.toolRegistry.getToolModelConfig(imgToolName);
if (toolModelConfig) {
const channelCreds = await this.modelChannelService.resolveForAgent(toolModelConfig.modelChannelId, toolModelConfig.modelId);
if (channelCreds) {
const provider = getImageProvider(channelCreds.channelSupplier, channelCreds.baseUrl ?? '');
if (provider) {
const toolEntity = await this.toolRegistry.getToolByName(imgToolName);
const extra = toolEntity?.extra as import('../tools/manifest.js').ToolGovernanceExtra | null;
const creds: ImageProviderCredentials = {
baseUrl: channelCreds.baseUrl ?? '',
apiKey: channelCreds.apiKey,
supplier: channelCreds.channelSupplier,
modelId: toolModelConfig.modelId,
promptHint: toolModelConfig.promptHint,
extra,
};
if (imgToolName === 'text_to_image') {
runtimeTools.push(createTextToImageTool(creds, provider, this.imageStorageService));
} else {
runtimeTools.push(createImageToImageTool(creds, provider, this.imageStorageService));
}
} else {
disabledReasons.push({ name: imgToolName, reason: 'image_provider_not_found' });
}
} else {
disabledReasons.push({ name: imgToolName, reason: 'model_channel_unavailable' });
}
} else {
disabledReasons.push({ name: imgToolName, reason: 'model_not_configured' });
}
}
}
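The nested checks above amount to a short decision chain in which each missing prerequisite maps to exactly one disabled reason. The sketch below restates that chain as a pure function, which may be easier to review than the nesting; Lookup and disabledReasonFor are hypothetical names, and the three booleans stand in for the results of the awaited lookups.

```typescript
type DisabledReason =
  | 'model_not_configured'
  | 'model_channel_unavailable'
  | 'image_provider_not_found';

// Stand-ins for the three lookups performed in the resolver.
interface Lookup {
  hasModelConfig: boolean;
  hasChannelCreds: boolean;
  hasProvider: boolean;
}

// Returns null when the tool can be injected, otherwise the first failing prerequisite.
function disabledReasonFor(lookup: Lookup): DisabledReason | null {
  if (!lookup.hasModelConfig) return 'model_not_configured';
  if (!lookup.hasChannelCreds) return 'model_channel_unavailable';
  if (!lookup.hasProvider) return 'image_provider_not_found';
  return null;
}

const ready = disabledReasonFor({ hasModelConfig: true, hasChannelCreds: true, hasProvider: true });
const noProvider = disabledReasonFor({ hasModelConfig: true, hasChannelCreds: true, hasProvider: false });
const noModel = disabledReasonFor({ hasModelConfig: false, hasChannelCreds: false, hasProvider: false });
```

The checks run in dependency order, so only the earliest failure is reported for each tool.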
- [ ] Step 5: Verify compilation
Run: cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20
Expected: no errors
- [ ] Step 6: Commit
git add packages/backend/src/modules/netaclaw/service/tool_resolver.ts packages/backend/src/modules/netaclaw/service/tool_registry.ts
git commit -m "feat(netaclaw): integrate text_to_image and image_to_image into tool_resolver"
Task 11: Prompt Builder attachment hint extension
Files:
- Modify: packages/backend/src/modules/netaclaw/runtime/prompt_builder.ts
- Modify: packages/backend/src/modules/netaclaw/runtime/agent.ts
- [ ] Step 1: Change the buildLLMMessages signature
At line 141 of prompt_builder.ts, add a toolNames parameter to buildLLMMessages:
export function buildLLMMessages(
systemPrompt: string,
history: LLMMessage[],
userMessage: UserMessageInput,
toolNames?: string[],
): LLMMessage[] {
- [ ] Step 2: Change the attachment hint generation logic
Replace the attachment handling block at lines 152-162:
if (userMessage.metadata?.attachments && (userMessage.metadata.attachments as unknown[]).length) {
const attachments = userMessage.metadata.attachments as ChatAttachment[];
const desc = attachments.map(a => {
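// Fall back to the generic label so an unexpected attachment type never renders as "undefined".
const typeLabel = ({ image: '图片', video: '视频', pdf: 'PDF', document: '文件', other: '文件' } as Record<string, string>)[a.type] ?? '文件';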
return `- ${typeLabel}: ${a.name} (URL: ${a.url})`;
}).join('\n');
const hints: string[] = [];
const names = new Set(toolNames ?? []);
if (names.has('image_recognize')) {
hints.push('如需分析图片内容,请使用 image_recognize 工具,传入图片 URL');
}
if (names.has('image_to_image')) {
hints.push('如需基于图片生成新图片,请使用 image_to_image 工具,将图片 URL 作为 referenceImage 参数');
}
if (hints.length === 0) {
hints.push('附件已上传,可在需要时引用其 URL');
}
messages.push({
role: 'user',
content: `[系统提示] 用户上传了以下附件:\n${desc}\n${hints.join('。')}。`,
});
}
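The hint selection above depends only on which tools are enabled for the current run, so it can be sketched as a small pure function. buildAttachmentHints is a hypothetical helper, and the English hint strings are illustrative stand-ins for the Chinese prompts in the plan.

```typescript
// Hypothetical helper: choose attachment hints from the tools that are actually enabled.
function buildAttachmentHints(toolNames: string[]): string[] {
  const hints: string[] = [];
  if (toolNames.includes('image_recognize')) {
    hints.push('To analyze an image, call image_recognize with the image URL');
  }
  if (toolNames.includes('image_to_image')) {
    hints.push('To derive a new image, call image_to_image with the URL as referenceImage');
  }
  // With neither tool enabled, fall back to a neutral hint instead of naming a missing tool.
  if (hints.length === 0) {
    hints.push('The attachment is uploaded; reference its URL when needed');
  }
  return hints;
}

const bothTools = buildAttachmentHints(['image_recognize', 'image_to_image', 'web_search']);
const noImageTools = buildAttachmentHints(['web_search']);
```

Keying the hints on the enabled tools avoids telling the model to call a tool that was filtered out for this agent.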
- [ ] Step 3: Update the call site in agent.ts
At lines 96-100 of agent.ts, pass toolNames to buildLLMMessages:
const messages: LLMMessage[] = buildLLMMessages(
agentConfig.systemPrompt,
history,
{ content: userMessage, metadata: params.userMessageMetadata },
params.toolNames || tools.map(tool => tool.name),
);
- [ ] Step 4: Verify compilation
Run: cd packages/backend && npx tsc --noEmit --pretty 2>&1 | head -20
Expected: no errors
- [ ] Step 5: Commit
git add packages/backend/src/modules/netaclaw/runtime/prompt_builder.ts packages/backend/src/modules/netaclaw/runtime/agent.ts
git commit -m "feat(netaclaw): dynamic attachment hints based on available tools"
Task 12: Frontend (renderer-registry and message-item image rendering)
Files:
- Modify: packages/frontend/src/modules/agent/tools/renderer-registry.ts
- Modify: packages/frontend/src/modules/agent/components/message-item.vue
- [ ] Step 1: Extend the rawResult type in renderer-registry
In the rawResult field of the ToolRenderSource interface in renderer-registry.ts (line 9), extend the type union and add the images field:
rawResult?: {
type: 'text' | 'json' | 'image' | 'images';
text?: string;
data?: unknown;
url?: string;
mimeType?: string;
width?: number;
height?: number;
bytes?: number;
originalWidth?: number;
originalHeight?: number;
originalBytes?: number;
resized?: boolean;
images?: {
url: string;
mimeType?: string;
width?: number;
height?: number;
seed?: number;
}[];
};
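With both image and images possible, the template needs a preview list for either shape. The sketch below shows one way to derive it in a single place; previewUrls is a hypothetical helper, not part of the plan, and RawResult is reduced to the fields it reads.

```typescript
// Reduced to the fields this helper reads; the full shape is in Step 1.
interface RawResult {
  type: 'text' | 'json' | 'image' | 'images';
  url?: string;
  images?: { url: string }[];
}

// Hypothetical helper: collect the URLs for a preview list from either image shape.
function previewUrls(raw: RawResult | undefined): string[] {
  if (!raw) return [];
  if (raw.type === 'image' && raw.url) return [raw.url];
  if (raw.type === 'images' && raw.images?.length) return raw.images.map(img => img.url);
  return [];
}

const single = previewUrls({ type: 'image', url: 'https://example.com/a.png' });
const multiple = previewUrls({
  type: 'images',
  images: [{ url: 'https://example.com/a.png' }, { url: 'https://example.com/b.png' }],
});
const none = previewUrls({ type: 'text' });
```

A helper like this could replace the inline images.map(...) call in the grid template, which would otherwise be re-evaluated for every grid item.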
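- [ ] Step 2: Update the message-item.vue template (upgrade single-image rendering and add multi-image rendering)
Find the existing single-image rendering block at lines 75-83 of message-item.vue: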
<div
v-if="tool.rawResult?.type === 'image' && tool.rawResult.url"
class="tool-execution__image"
>
<img :src="tool.rawResult.url" @click="openToolImage(tool.rawResult.url)" />
<div class="tool-execution__image-caption">
{{ formatToolImageCaption(tool.rawResult) }}
</div>
</div>
Replace it with:
<div
v-if="tool.rawResult?.type === 'image' && tool.rawResult.url"
class="tool-execution__image"
>
<el-image
:src="tool.rawResult.url"
fit="contain"
:preview-src-list="[tool.rawResult.url]"
preview-teleported
class="tool-execution__image-single"
/>
<div class="tool-execution__image-caption">
{{ formatToolImageCaption(tool.rawResult) }}
</div>
</div>
<div
v-else-if="tool.rawResult?.type === 'images' && tool.rawResult.images?.length"
class="tool-execution__image"
>
<div v-if="tool.rawResult.text" class="tool-execution__image-caption">
{{ tool.rawResult.text }}
</div>
<div class="tool-execution__image-grid">
<el-image
v-for="(img, idx) in tool.rawResult.images"
:key="idx"
:src="img.url"
fit="cover"
:preview-src-list="tool.rawResult.images.map(i => i.url)"
:initial-index="idx"
preview-teleported
class="tool-execution__image-grid-item"
/>
</div>
</div>
- Step 3: Add multi-image grid styles
In the <style> block of message-item.vue, next to the existing .tool-execution__image styles, add:
.tool-execution__image-single {
max-width: 360px;
border-radius: 8px;
}
.tool-execution__image-grid {
display: grid;
grid-template-columns: repeat(auto-fill, minmax(140px, 1fr));
gap: 8px;
margin-top: 8px;
}
.tool-execution__image-grid-item {
width: 100%;
aspect-ratio: 1;
border-radius: 8px;
cursor: pointer;
object-fit: cover;
}
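The `repeat(auto-fill, minmax(140px, 1fr))` rule fits as many 140px-minimum columns as the container allows, with 8px gaps between them. The arithmetic can be sketched as below. `gridColumns` and `gridRows` are illustrative helpers only, and the container width is assumed to be the content-box width of `.tool-execution__image-grid`.

```typescript
// Column count for `repeat(auto-fill, minmax(minTrack, 1fr))` with a fixed gap:
// the largest n such that n * minTrack + (n - 1) * gap <= containerWidth, and at least 1.
function gridColumns(containerWidth: number, minTrack = 140, gap = 8): number {
  return Math.max(1, Math.floor((containerWidth + gap) / (minTrack + gap)));
}

// Rows needed to show `count` images at that column count.
function gridRows(count: number, containerWidth: number): number {
  return Math.ceil(count / gridColumns(containerWidth));
}
```

For a 360px-wide container this gives 2 columns, so 3 generated images occupy 2 rows. At 600px it gives 4 columns, and the maximum of 9 images fits in 3 rows.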
- Step 4: Verify frontend compilation
Run: cd packages/frontend && npx vue-tsc --noEmit 2>&1 | head -20
Expected: no new errors
- Step 5: Commit
git add packages/frontend/src/modules/agent/tools/renderer-registry.ts packages/frontend/src/modules/agent/components/message-item.vue
git commit -m "feat(frontend): upgrade image rendering and add multi-image grid in message-item"
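Task 13: Frontend - image generation config section on the tool edit page
Files:
- Modify: packages/frontend/src/modules/agent/views/tools.vue
- Step 1: Add an image generation config section to the edit drawer
In the edit drawer of tools.vue, locate the model configuration area (around lines 475-500) and, right after the closing </template> of the <template v-if="editor.requiresModel === 1"> block, add: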
<template v-if="isImageTool">
<el-divider>{{ t('图片生成配置') }}</el-divider>
<el-alert type="info" :closable="false" style="margin-bottom: 16px">
默认值在 Agent 未指定时生效,Agent 可根据用户指令覆盖。硬上限不可突破。
</el-alert>
<el-row :gutter="16">
<el-col :span="8">
<el-form-item label="默认数量">
<el-input-number v-model="imageDefaults.n" :min="1" :max="9" :step="1" style="width: 100%" />
</el-form-item>
</el-col>
<el-col :span="8">
<el-form-item label="默认比例">
<el-select v-model="imageDefaults.aspectRatio" clearable placeholder="不限" style="width: 100%">
<el-option label="1:1" value="1:1" />
<el-option label="16:9" value="16:9" />
<el-option label="4:3" value="4:3" />
<el-option label="3:2" value="3:2" />
<el-option label="2:3" value="2:3" />
<el-option label="3:4" value="3:4" />
<el-option label="9:16" value="9:16" />
</el-select>
</el-form-item>
</el-col>
<el-col :span="8">
<el-form-item label="默认水印">
<el-switch v-model="imageDefaults.watermark" />
</el-form-item>
</el-col>
</el-row>
<el-row :gutter="16">
<el-col :span="8">
<el-form-item label="最大数量">
<el-input-number v-model="imageConstraints.maxN" :min="1" :max="9" :step="1" style="width: 100%" />
</el-form-item>
</el-col>
<el-col :span="8">
<el-form-item label="最大宽度">
<el-input-number v-model="imageConstraints.maxWidth" :min="512" :max="4096" :step="64" style="width: 100%" />
</el-form-item>
</el-col>
<el-col :span="8">
<el-form-item label="最大高度">
<el-input-number v-model="imageConstraints.maxHeight" :min="512" :max="4096" :step="64" style="width: 100%" />
</el-form-item>
</el-col>
</el-row>
</template>
- Step 2: Add reactive state and a computed property
In the <script> section, near the editor reactive object, add:
const isImageTool = computed(() =>
['text_to_image', 'image_to_image'].includes(editor.name)
);
const imageDefaults = reactive({
n: 1,
aspectRatio: '' as string,
watermark: false,
});
const imageConstraints = reactive({
maxN: 9,
maxWidth: 2048,
maxHeight: 2048,
});
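The form separates defaults, which apply when the Agent does not specify a value, from hard limits, which the Agent cannot exceed. One way those two layers could combine at call time is sketched below. `resolveImageRequest` and the `ImageRequest` shape are assumptions for illustration and are not the clampDimension helper in image_common.ts.

```typescript
interface ImageDefaults { n: number; aspectRatio: string; watermark: boolean }
interface ImageConstraints { maxN: number; maxWidth: number; maxHeight: number }
// Assumed shape of what the Agent may pass; every field is optional.
interface ImageRequest { n?: number; width?: number; height?: number; aspectRatio?: string; watermark?: boolean }

// Hypothetical resolver: Agent-supplied values win over defaults, then hard limits clamp the result.
function resolveImageRequest(req: ImageRequest, defaults: ImageDefaults, limits: ImageConstraints) {
  // Clamp to the range [1, max] after dropping any fractional part.
  const clamp = (value: number, max: number) => Math.min(Math.max(1, Math.floor(value)), max);
  return {
    n: clamp(req.n ?? defaults.n, limits.maxN),
    width: req.width === undefined ? undefined : clamp(req.width, limits.maxWidth),
    height: req.height === undefined ? undefined : clamp(req.height, limits.maxHeight),
    // An empty default ratio means "no preference", so it is passed on as undefined.
    aspectRatio: req.aspectRatio ?? (defaults.aspectRatio || undefined),
    watermark: req.watermark ?? defaults.watermark,
  };
}
```

With maxN set to 4, a request for n = 9 resolves to 4, while a request that omits n falls back to the configured default.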
- Step 3: Load the image config from extra when the edit drawer opens
In the logic that opens the edit drawer (watch(editorVisible, ...) or the openEditor function), add:
const extra = row.extra as Record<string, any> ?? {};
if (extra.imageDefaults) {
Object.assign(imageDefaults, { n: 1, aspectRatio: '', watermark: false, ...extra.imageDefaults });
} else {
Object.assign(imageDefaults, { n: 1, aspectRatio: '', watermark: false });
}
if (extra.imageConstraints) {
Object.assign(imageConstraints, { maxN: 9, maxWidth: 2048, maxHeight: 2048, ...extra.imageConstraints });
} else {
Object.assign(imageConstraints, { maxN: 9, maxWidth: 2048, maxHeight: 2048 });
}
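Because imageDefaults and imageConstraints are shared across rows, they must be reset to the baseline every time the drawer opens, or values from a previously edited tool would leak into the next one. The same layering can be sketched as a pure function. `loadImageConfig` and the two constant names are hypothetical.

```typescript
const DEFAULT_IMAGE_DEFAULTS = { n: 1, aspectRatio: '', watermark: false };
const DEFAULT_IMAGE_CONSTRAINTS = { maxN: 9, maxWidth: 2048, maxHeight: 2048 };

// Hypothetical loader: tolerates a missing or partial `extra` by layering stored values over the baseline.
function loadImageConfig(extra: unknown) {
  const source = (extra ?? {}) as {
    imageDefaults?: Partial<typeof DEFAULT_IMAGE_DEFAULTS>;
    imageConstraints?: Partial<typeof DEFAULT_IMAGE_CONSTRAINTS>;
  };
  return {
    // Spreading undefined is a no-op, so an absent section yields the plain baseline.
    imageDefaults: { ...DEFAULT_IMAGE_DEFAULTS, ...source.imageDefaults },
    imageConstraints: { ...DEFAULT_IMAGE_CONSTRAINTS, ...source.imageConstraints },
  };
}
```

The result can then be copied into the reactive objects with Object.assign, which matches what both branches of the snippet above do.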
- Step 4: Write the image config into extra on save
In the save logic (the handleSave or handleUpdate function), when building extra, add:
const extra: Record<string, unknown> = {
...(editor.governancePolicy?.allowInSubagent !== undefined ? { allowInSubagent: editor.governancePolicy.allowInSubagent } : {}),
...(editor.governancePolicy?.workerRoutingStrategy ? { workerRoutingStrategy: editor.governancePolicy.workerRoutingStrategy } : {}),
};
if (isImageTool.value) {
extra.imageDefaults = { ...imageDefaults };
extra.imageConstraints = { ...imageConstraints };
}
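The save path writes the image fields only for the two image tools and copies the form state rather than storing the reactive objects themselves. A pure-function sketch of that logic follows. `buildToolExtra` and the `GovernancePolicy` shape, including the string type of workerRoutingStrategy, are assumptions.

```typescript
// Assumed shape of the governance policy fields read by the save logic.
interface GovernancePolicy { allowInSubagent?: boolean; workerRoutingStrategy?: string }

// Hypothetical pure version of the save logic: image fields are written only for image tools.
function buildToolExtra(
  toolName: string,
  policy: GovernancePolicy | undefined,
  imageDefaults: object,
  imageConstraints: object,
): Record<string, unknown> {
  const extra: Record<string, unknown> = {};
  if (policy?.allowInSubagent !== undefined) extra.allowInSubagent = policy.allowInSubagent;
  if (policy?.workerRoutingStrategy) extra.workerRoutingStrategy = policy.workerRoutingStrategy;
  if (['text_to_image', 'image_to_image'].includes(toolName)) {
    // Shallow copies, so later edits to the form state do not mutate the saved payload.
    extra.imageDefaults = { ...imageDefaults };
    extra.imageConstraints = { ...imageConstraints };
  }
  return extra;
}
```

Leaving the image keys off every other tool keeps their extra payload unchanged by this task.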
- Step 5: Verify frontend compilation
Run: cd packages/frontend && npx vue-tsc --noEmit 2>&1 | head -20
Expected: no new errors
- Step 6: Commit
git add packages/frontend/src/modules/agent/views/tools.vue
git commit -m "feat(frontend): add image generation config section in tool editor"
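Task 14: End-to-end verification
- Step 1: Start the backend
Run: cd packages/backend && npm run dev
Expected: starts successfully with no errors
- Step 2: Verify that the tools are synced to the database
Open the frontend tool management page, click "同步工具目录" (Sync tool catalog), and confirm that text_to_image and image_to_image appear in the list with toolset vision, capability multimodal, and requiresModel 1.
- Step 3: Configure the model channel
On the tool edit page, configure a model channel for text_to_image (choose an existing Volcano Engine or MiniMax channel) and select the matching image generation model.
- Step 4: Configure image generation parameters
In the "图片生成配置" (image generation config) section of the tool edit page, set the defaults and hard limits, then save.
- Step 5: Create a test Agent
On the Agent edit page, create a test Agent with text_to_image, image_to_image, and image_recognize enabled in its toolset.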
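- Step 6: Test text-to-image
On the chat page, send the test Agent "生成一张白底电商主图,蓝牙耳机" (generate a white-background e-commerce hero image of Bluetooth earphones) and confirm:
  - The Agent calls the text_to_image tool
  - The tool-card renders the generated image
  - The image URL is a locally persisted URL (not a temporary URL)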
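The last check, that the image URL is locally persisted rather than a temporary provider URL, can be approximated with a small helper during manual testing. This is a sketch only. `isLocallyPersisted` is hypothetical, and it assumes that persisted images are served either from a root-relative path or from the application's own origin. The plan does not specify the storage URL layout.

```typescript
// Hypothetical check; the real storage URL layout is not specified in this plan.
function isLocallyPersisted(url: string, appOrigin: string): boolean {
  // Root-relative path such as "/uploads/a.png" (but not protocol-relative "//host/...").
  if (url.startsWith('/') && !url.startsWith('//')) return true;
  // Otherwise the URL must live under the application's own origin.
  const origin = appOrigin.replace(/\/+$/, '');
  return url === origin || url.startsWith(`${origin}/`);
}
```

Pasting the rendered image's src into such a check, with the backend origin as the second argument, gives a quick pass or fail without inspecting the session tree by hand.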
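- Step 7: Test image-to-image
Upload a product image as an attachment, send "基于这张图生成一张白底主图" (generate a white-background hero image based on this image), and confirm:
  - The attachment hint includes the image_to_image tool hint
  - The Agent calls the image_to_image tool with referenceImage set to the uploaded image's URL
  - The tool-card renders the generated image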
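- Step 8: Test multiple images
Send "生成 3 张不同角度的产品图" (generate 3 product images from different angles) and confirm:
  - The Agent calls text_to_image with n=3
  - The tool-card renders the 3 images in a grid layout
  - Clicking an image opens the full-size preview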
- Step 9: Final commit
Once everything is confirmed to work, commit any remaining fixes together.