[Design and Implementation of LangChain Language Model Components] Multi-Form Message Content: The Foundation of Multimodal AI Solutions

JaydenAI

2026-03-01 08:22:52

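As the base class for all messages, BaseMessage stores the raw content in its content field, which is either a string or a list whose items are strings or dicts. The raw content is converted into a list of ContentBlock values and exposed through the content_blocks property. The main body of a message can be plain text, a piece of multimedia content (such as an image, audio, or video), or a binary file, and each content form corresponds to its own ContentBlock type. The relationships among these types are shown in the UML class diagram below (the boxed portion).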

[Figure: UML class diagram of the ContentBlock types]

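ContentBlock is not a base class but a union of six types, all of which are plain TypedDicts. These types share several data members: a type field holding the type discriminator, an id field serving as a unique identifier, an index field indicating the current offset position, and an extras field for additional data.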

class BaseMessage(Serializable):
    content: str | list[str | dict]
    @property
    def content_blocks(self) -> list[types.ContentBlock]: ...

ContentBlock = (
    TextContentBlock
    | InvalidToolCall
    | ReasoningContentBlock
    | NonStandardContentBlock
    | DataContentBlock
    | ToolContentBlock
)
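To make the "string or list" shape of content concrete, here is a simplified sketch of how raw content could be normalized into typed blocks and then inspected through the type discriminator. The helpers `to_content_blocks` and `count_by_type` are hypothetical; this is not LangChain's actual content_blocks implementation, which also translates provider-specific formats. The sketch only wraps bare strings as "text" blocks and passes dicts through, assuming they already carry a "type" key.

```python
from __future__ import annotations

from typing import Any


def to_content_blocks(content: str | list[str | dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical, simplified normalization of BaseMessage.content.

    A bare string becomes a single "text" block; inside a list, strings are
    wrapped as "text" blocks and dicts (assumed to already carry a "type"
    key) are shallow-copied through unchanged.
    """
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    blocks: list[dict[str, Any]] = []
    for item in content:
        if isinstance(item, str):
            blocks.append({"type": "text", "text": item})
        else:
            blocks.append(dict(item))  # copy so callers cannot mutate the original
    return blocks


def count_by_type(blocks: list[dict[str, Any]]) -> dict[str, int]:
    """Count blocks per "type" discriminator."""
    counts: dict[str, int] = {}
    for block in blocks:
        counts[block["type"]] = counts.get(block["type"], 0) + 1
    return counts


content = [
    "Describe this image:",
    {"type": "image", "url": "https://example.com/cat.png", "mime_type": "image/png"},
]
blocks = to_content_blocks(content)
print(count_by_type(blocks))  # {'text': 1, 'image': 1}
```

Because every block is a plain dict with a "type" key, consumers can dispatch on that key alone without any class hierarchy.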

1. TextContentBlock

The payload of a TextContentBlock is a plain string. Its type discriminator is "text", and the text that forms the main content is stored in the text field.

class TextContentBlock(TypedDict):
    type: Literal["text"]
    id: NotRequired[str]
    text: str
    annotations: NotRequired[list[Annotation]]
    index: NotRequired[int | str]
    extras: NotRequired[dict[str, Any]]

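Its annotations field holds a list of Annotation values that represent metadata annotations. Annotation is a union of Citation and NonStandardAnnotation. In LangChain's multimodal and RAG stack, Citation is the most important annotation type in a TextContentBlock: it links the model's answer back to the original data source. When the model generates an answer from external documents (such as PDFs, web pages, or databases), it inserts citation markers into the text and provides the detailed metadata for each citation in the block's annotations field.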

Annotation = Citation | NonStandardAnnotation

class Citation(TypedDict):
    type: Literal["citation"]
    id: NotRequired[str]
    url: NotRequired[str]
    title: NotRequired[str]
    start_index: NotRequired[int]
    end_index: NotRequired[int]
    cited_text: NotRequired[str]
    extras: NotRequired[dict[str, Any]]

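Citation likewise has its own type discriminator, "citation", and its url, title, start_index, end_index and cited_text fields hold the source address, the title, the start and end positions, and the cited text respectively. Apart from this "standard" citation-based annotation, all other annotations are defined with the non-standard NonStandardAnnotation type. Its type discriminator is "non_standard", and the annotation content is stored as a dictionary in the value field.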

class NonStandardAnnotation(TypedDict):
    type: Literal["non_standard"]
    id: NotRequired[str]
    value: dict[str, Any]
    index: NotRequired[int | str]
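Because start_index and end_index locate a citation within the text, a consumer can, for example, append numbered footnote markers after each cited span. The sketch below assumes the indices are character offsets into the block's text field with end_index exclusive (Python slice semantics); `add_footnotes` is a hypothetical helper, and any annotation that is not a citation is simply ignored.

```python
from __future__ import annotations

from typing import Any


def add_footnotes(block: dict[str, Any]) -> str:
    """Append [n] markers after cited spans and list the sources at the end.

    Assumes start_index/end_index are character offsets into block["text"]
    with end_index exclusive. Annotations other than "citation" are ignored.
    """
    text: str = block["text"]
    citations = [
        a
        for a in block.get("annotations", [])
        if a.get("type") == "citation" and "end_index" in a
    ]
    numbered = list(enumerate(citations, start=1))
    # Insert markers from the end of the text backwards so that offsets of
    # citations earlier in the text stay valid; ties keep ascending [n] order.
    for n, cite in sorted(numbered, key=lambda pair: (pair[1]["end_index"], pair[0]), reverse=True):
        end = cite["end_index"]
        text = text[:end] + f"[{n}]" + text[end:]
    sources = [
        " ".join(part for part in (f"[{n}]", cite.get("title"), cite.get("url")) if part)
        for n, cite in numbered
    ]
    return "\n".join([text, *sources])


block = {
    "type": "text",
    "text": "Paris is the capital of France.",
    "annotations": [
        {
            "type": "citation",
            "title": "France",
            "url": "https://example.com/france",
            "start_index": 0,
            "end_index": 30,
        }
    ],
}
print(add_footnotes(block))
# Paris is the capital of France[1].
# [1] France https://example.com/france
```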

2. InvalidToolCall

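InvalidToolCall is a structured error type designed specifically for model hallucinations and parse failures. When the model analyzes the prompt and decides that it needs to call a tool, it tries to generate a corresponding ToolCall. If the generated arguments are not validly structured (for example, the argument string is not valid JSON), no exception is raised; instead, an InvalidToolCall is produced to describe this "failed to generate a ToolCall" situation.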

class InvalidToolCall(TypedDict):
    type: Literal["invalid_tool_call"]
    id: str | None
    name: str | None
    args: str | None
    error: str | None
    index: NotRequired[int | str]
    extras: NotRequired[dict[str, Any]]

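The type discriminator of InvalidToolCall is "invalid_tool_call", and its id, name, args and error fields hold the unique identifier, the name, the input arguments (as a raw string) and the error description of the attempted tool call.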
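The "describe the failure instead of raising" behavior can be sketched as follows. `parse_raw_tool_call` is a hypothetical helper, not a LangChain API; it assumes the model emitted its arguments as a JSON string and returns either a ToolCall-shaped dict (type "tool_call") or an InvalidToolCall-shaped dict that keeps the raw, unparsed argument string.

```python
from __future__ import annotations

import json
from typing import Any


def parse_raw_tool_call(
    call_id: str | None, name: str | None, raw_args: str | None
) -> dict[str, Any]:
    """Return a ToolCall-shaped dict, or an InvalidToolCall-shaped dict on failure.

    Nothing is raised to the caller: a malformed call is described by the
    "error" field of the returned invalid_tool_call dict instead.
    """
    try:
        if not name:
            raise ValueError("missing tool name")
        # Treat missing arguments as an empty JSON object (assumption).
        args = json.loads(raw_args or "{}")
        if not isinstance(args, dict):
            raise ValueError(f"arguments must be a JSON object, got {type(args).__name__}")
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        return {
            "type": "invalid_tool_call",
            "id": call_id,
            "name": name,
            "args": raw_args,  # keep the raw, unparsed string for logging or retries
            "error": str(exc),
        }
    return {"type": "tool_call", "id": call_id, "name": name, "args": args}


ok = parse_raw_tool_call("call_1", "get_weather", '{"city": "Paris"}')
bad = parse_raw_tool_call("call_2", "get_weather", '{"city": "Paris"')  # truncated JSON
print(ok["type"], bad["type"])  # tool_call invalid_tool_call
```

Keeping args as the original string is what lets a caller log the bad output or feed it back to the model for a retry.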

3. ReasoningContentBlock

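ReasoningContentBlock is a structured content block designed specifically for reasoning models. It marks the shift of large models from giving answers directly to explicitly expressing a "think first, then answer" process. Its type discriminator is "reasoning", and the reasoning itself is described by the text in the reasoning field.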

class ReasoningContentBlock(TypedDict):
    type: Literal["reasoning"]
    id: NotRequired[str]
    reasoning: NotRequired[str]
    index: NotRequired[int | str]
    extras: NotRequired[dict[str, Any]]
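A common use of this block type is to keep a model's reasoning apart from its final answer, for instance to log the reasoning but show only the answer. The helper `split_reasoning` below is a hypothetical sketch operating on plain dicts shaped like the TypedDicts above.

```python
from __future__ import annotations

from typing import Any


def split_reasoning(blocks: list[dict[str, Any]]) -> tuple[str, str]:
    """Split content blocks into (reasoning, answer).

    The reasoning field is NotRequired, so a reasoning block without it
    contributes nothing. Blocks of any other type are ignored.
    """
    reasoning_parts: list[str] = []
    answer_parts: list[str] = []
    for block in blocks:
        if block.get("type") == "reasoning" and block.get("reasoning"):
            reasoning_parts.append(block["reasoning"])
        elif block.get("type") == "text":
            answer_parts.append(block["text"])
    return "\n".join(reasoning_parts), "".join(answer_parts)


blocks = [
    {"type": "reasoning", "reasoning": "17 * 3 = 51, so the total is 51."},
    {"type": "text", "text": "The total is 51."},
]
thinking, answer = split_reasoning(blocks)
print(answer)  # The total is 51.
```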

4. NonStandardContentBlock

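NonStandardContentBlock is a typical "intermediate compatibility layer". It exists to resolve the conflict between the non-standard output produced by the fast-moving LLM industry and LangChain's core standard. Model vendors compete fiercely and frequently introduce new content forms (such as custom 3D rendering data, specialized math formula formats, or proprietary file reference structures). Until LangChain's core library defines a dedicated ContentBlock type for such content, it is represented uniformly as a NonStandardContentBlock. Its type discriminator is "non_standard", and the content it carries is stored in the dictionary held by the value field.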

class NonStandardContentBlock(TypedDict):
    type: Literal["non_standard"]
    id: NotRequired[str]
    value: dict[str, Any]
    index: NotRequired[int | str]
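One way to picture this fallback: any provider block whose type is not among the standard discriminators is wrapped, unchanged, into the value field of a "non_standard" block. The KNOWN_TYPES set and the `wrap_unknown` helper below are illustrative assumptions, not LangChain's actual conversion logic, and the "mesh_3d" block is a made-up provider format.

```python
from __future__ import annotations

from typing import Any

# Illustrative set of standard type discriminators (an assumption for this
# sketch, not LangChain's internal registry).
KNOWN_TYPES = frozenset({
    "text", "reasoning", "non_standard",
    "image", "video", "audio", "text-plain", "file",
    "tool_call", "tool_call_chunk", "invalid_tool_call",
    "server_tool_call", "server_tool_call_chunk", "server_tool_result",
})


def wrap_unknown(block: dict[str, Any]) -> dict[str, Any]:
    """Pass standard blocks through; wrap anything else as a non_standard block."""
    if block.get("type") in KNOWN_TYPES:
        return block
    return {"type": "non_standard", "value": block}


provider_block = {"type": "mesh_3d", "vertices": [[0, 0, 0], [1, 0, 0], [0, 1, 0]]}
print(wrap_unknown(provider_block)["type"])  # non_standard
print(wrap_unknown({"type": "text", "text": "hi"}))  # {'type': 'text', 'text': 'hi'}
```

Storing the original block whole in value means no provider data is lost, so a later version with a dedicated type could still convert it.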

5. DataContentBlock

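DataContentBlock, which represents data content, is likewise not a concrete type but a union of five concrete types corresponding to five content forms: image, video, audio, plain text and file. These types closely resemble the body of an HTTP request or response, and the MIME type in their mime_type field has exactly the same semantics as in HTTP.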

DataContentBlock = (
    ImageContentBlock
    | VideoContentBlock
    | AudioContentBlock
    | PlainTextContentBlock
    | FileContentBlock
)

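The type discriminators of these five data content blocks are "image", "video", "audio", "text-plain" and "file". Besides mime_type, they share a file_id field identifying the file, a url field giving the target address, and a base64 field holding Base64-encoded content. In addition to the text field holding the text content, PlainTextContentBlock also has title and context fields for a title and surrounding context.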

class ImageContentBlock(TypedDict):
    type: Literal["image"]
    id: NotRequired[str]
    file_id: NotRequired[str]
    mime_type: NotRequired[str]
    index: NotRequired[int | str]
    url: NotRequired[str]
    base64: NotRequired[str]
    extras: NotRequired[dict[str, Any]]

class VideoContentBlock(TypedDict):
    type: Literal["video"]
    id: NotRequired[str]
    file_id: NotRequired[str]
    mime_type: NotRequired[str]
    index: NotRequired[int | str]
    url: NotRequired[str]
    base64: NotRequired[str]
    extras: NotRequired[dict[str, Any]]

class AudioContentBlock(TypedDict):
    type: Literal["audio"]
    id: NotRequired[str]
    file_id: NotRequired[str]
    mime_type: NotRequired[str]
    index: NotRequired[int | str]
    url: NotRequired[str]
    base64: NotRequired[str]
    extras: NotRequired[dict[str, Any]]

class PlainTextContentBlock(TypedDict):
    type: Literal["text-plain"]
    id: NotRequired[str]
    file_id: NotRequired[str]
    mime_type: Literal["text/plain"]
    index: NotRequired[int | str]
    url: NotRequired[str]
    base64: NotRequired[str]
    text: NotRequired[str]
    title: NotRequired[str]
    context: NotRequired[str]
    extras: NotRequired[dict[str, Any]]

class FileContentBlock(TypedDict):
    type: Literal["file"]
    id: NotRequired[str]
    file_id: NotRequired[str]
    mime_type: NotRequired[str]
    index: NotRequired[int | str]
    url: NotRequired[str]
    base64: NotRequired[str]
    extras: NotRequired[dict[str, Any]]
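Since mime_type carries the same meaning as an HTTP Content-Type, a data block can be assembled from raw bytes with the standard library alone, and turned into an RFC 2397 data: URL for APIs that accept one. `make_data_block` and `to_data_url` are hypothetical helpers; only the field names and type discriminators come from the definitions above, and the mapping from MIME type to block type is an assumption of this sketch.

```python
from __future__ import annotations

import base64
import mimetypes
from typing import Any


def make_data_block(data: bytes, filename: str) -> dict[str, Any]:
    """Build a base64-carrying data block from raw bytes.

    The block type is derived from the MIME type guessed from the file name:
    image/*, video/* and audio/* map to "image", "video" and "audio",
    text/plain maps to "text-plain", and everything else becomes "file".
    """
    mime_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    top_level = mime_type.split("/", 1)[0]
    if mime_type == "text/plain":
        block_type = "text-plain"
    elif top_level in {"image", "video", "audio"}:
        block_type = top_level
    else:
        block_type = "file"
    return {
        "type": block_type,
        "mime_type": mime_type,
        "base64": base64.b64encode(data).decode("ascii"),
    }


def to_data_url(block: dict[str, Any]) -> str:
    """Render a block that carries base64 content as an RFC 2397 data: URL."""
    return f"data:{block['mime_type']};base64,{block['base64']}"


png = make_data_block(b"\x89PNG", "cat.png")
print(png["type"], to_data_url(png))  # image data:image/png;base64,iVBORw==
```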

6. ToolContentBlock

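ToolContentBlock is likewise not a concrete type but a union of five tool-call-related types, including the ToolCall and ToolCallChunk introduced earlier. These are products of the language model: structured descriptions of the model's tool calls, returned to the Agent through AIMessage and AIMessageChunk respectively, after which the Agent performs the calls.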

ToolContentBlock = (
    ToolCall | ToolCallChunk | ServerToolCall | ServerToolCallChunk | ServerToolResult
)

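To use an HTTP analogy, this approach resembles a client-side redirect: the model only describes the call, and the Agent carries it out. Is there an equivalent of a server-side redirect? There is. After the server hosting the model receives the prompt sent by the Agent, it can perform tool calls itself when needed. ServerToolCall and ServerToolCallChunk describe such tool calls performed on the server side. Their members are defined much like those of ToolCall and ToolCallChunk, and their type discriminators are "server_tool_call" and "server_tool_call_chunk" respectively.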

class ServerToolCall(TypedDict):
    type: Literal["server_tool_call"]
    id: str
    name: str
    args: dict[str, Any]
    index: NotRequired[int | str]
    extras: NotRequired[dict[str, Any]]

class ServerToolCallChunk(TypedDict):
    type: Literal["server_tool_call_chunk"]
    name: NotRequired[str]
    args: NotRequired[str]
    id: NotRequired[str]
    index: NotRequired[int | str]
    extras: NotRequired[dict[str, Any]]

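ServerToolCall and ServerToolCallChunk are usually associated with MCP or remote tool services and describe requests sent to a remote tool server. The server hosting the model can send an RPC instruction to a separately running "tool server" to execute a specified tool remotely. For example, when LangGraph's ToolNode is connected to a hosted MCP server (such as a database query service), the system translates the "service invocation intent" generated by the model into instructions sent to that server.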
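The chunk variant matters when streaming: as the ServerToolCallChunk definition shows, its args is a string fragment, so fragments sharing the same index have to be concatenated before the arguments can be parsed. `merge_server_tool_call_chunks` below is a simplified, hypothetical sketch of that idea rather than LangChain's own chunk-merging code; it assumes every chunk carries an index and that the concatenated args form a JSON object. The tool name "web_search" is made up for the example.

```python
from __future__ import annotations

import json
from typing import Any


def merge_server_tool_call_chunks(chunks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge server_tool_call_chunk fragments into complete server_tool_call dicts.

    Fragments are grouped by index; the first non-empty id/name wins and the
    args string fragments are concatenated in arrival order before being
    parsed as JSON (json.loads raises if the stream ended mid-object).
    """
    merged: dict[int | str, dict[str, Any]] = {}
    for chunk in chunks:
        acc = merged.setdefault(chunk["index"], {"id": None, "name": None, "args": ""})
        acc["id"] = acc["id"] or chunk.get("id")
        acc["name"] = acc["name"] or chunk.get("name")
        acc["args"] += chunk.get("args") or ""
    return [
        {
            "type": "server_tool_call",
            "id": acc["id"],
            "name": acc["name"],
            "args": json.loads(acc["args"] or "{}"),
            "index": index,
        }
        for index, acc in merged.items()
    ]


chunks = [
    {"type": "server_tool_call_chunk", "index": 0, "id": "srv_1", "name": "web_search", "args": '{"query": '},
    {"type": "server_tool_call_chunk", "index": 0, "args": '"LangChain"}'},
]
calls = merge_server_tool_call_chunks(chunks)
print(calls[0]["args"])  # {'query': 'LangChain'}
```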

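The result of a server-driven tool call can be represented by a ServerToolResult, whose type discriminator is "server_tool_result". Its tool_call_id, status and output fields give the identifier of the tool call, its status, and its output.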

class ServerToolResult(TypedDict):
    type: Literal["server_tool_result"]
    id: NotRequired[str]
    tool_call_id: str
    status: Literal["success", "error"]
    output: NotRequired[Any]
    index: NotRequired[int | str]
    extras: NotRequired[dict[str, Any]]
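Since a ServerToolResult points back to its call through tool_call_id, a consumer can pair calls with results, for example to report which server-side tools succeeded. The `pair_server_tool_results` helper below is a hypothetical sketch over plain dicts; calls without a result yet are paired with None.

```python
from __future__ import annotations

from typing import Any


def pair_server_tool_results(
    blocks: list[dict[str, Any]],
) -> list[tuple[dict[str, Any], dict[str, Any] | None]]:
    """Pair each server_tool_call with its server_tool_result via tool_call_id.

    Calls whose result has not arrived (yet) are paired with None; blocks of
    any other type are ignored.
    """
    results = {
        b["tool_call_id"]: b for b in blocks if b.get("type") == "server_tool_result"
    }
    return [
        (b, results.get(b["id"]))
        for b in blocks
        if b.get("type") == "server_tool_call"
    ]


blocks = [
    {"type": "server_tool_call", "id": "srv_1", "name": "web_search", "args": {"query": "LangChain"}},
    {"type": "server_tool_result", "tool_call_id": "srv_1", "status": "success", "output": "..."},
    {"type": "text", "text": "According to the search results..."},
]
for call, result in pair_server_tool_results(blocks):
    status = result["status"] if result else "pending"
    print(call["name"], status)  # web_search success
```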