LangChain4j from Beginner to Mastery, Part 7: AI Services

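So far, we have been covering low-level components such as ChatModel, ChatMessage, and ChatMemory. Working at this level is very flexible and gives you total freedom, but it also forces you to write a lot of boilerplate code. Since LLM-powered applications usually require not just a single component but several components working together (e.g., prompt templates, chat memory, LLMs, output parsers, and RAG components such as embedding models and stores), and often involve multiple interactions, orchestrating them all becomes even more cumbersome.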


分享牛 · Published 2026-01-27 10:00:00


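We want you to focus on business logic, not on low-level implementation details. To that end, LangChain4j currently offers two high-level concepts: AI Services and Chains.

Chains (legacy)

The concept of Chains originates from Python's LangChain (before the introduction of LCEL). The idea is to have a dedicated Chain for each common use case, such as a chatbot or RAG. A Chain combines multiple low-level components and orchestrates the interactions between them.

Their main problem is that they are too rigid when you need to customize something. LangChain4j implements only two Chains (ConversationalChain and ConversationalRetrievalChain), and there are currently no plans to add more.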

AI Services

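We propose another solution called AI Services, tailored for Java. The idea is to hide the complexity of interacting with LLMs and other components behind a simple API.

This approach is very similar to Spring Data JPA or Retrofit: you declaratively define an interface with the desired API, and LangChain4j provides an object (a proxy) that implements this interface.
You can think of an AI Service as a component of the service layer in your application. It provides AI services, hence the name.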

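AI Services handle the most common operations:

- Formatting inputs for the LLM
- Parsing outputs from the LLM

They also support more advanced features:

- Chat memory
- Tools
- RAG

AI Services can be used to build stateful chatbots that support back-and-forth interactions, as well as to automate processes in which each call to the LLM is isolated.
Let's start with the simplest possible AI Service. After that, we will explore more complex examples.

The simplest AI Service

First, we define an interface with a single method, chat, which takes a String as input and returns a String.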

interface Assistant {

    String chat(String userMessage);
}

Then, we create the low-level components that will be used under the hood of our AI Service. In this case, we only need the ChatModel.

ChatModel model = OpenAiChatModel.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .modelName(GPT_4_O_MINI)
    .build();

Finally, we can use the AiServices class to create an instance of our AI Service:

Assistant assistant = AiServices.create(Assistant.class, model);

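:::note
In Quarkus and Spring Boot applications, autoconfiguration takes care of creating the Assistant bean.
This means you do not need to call AiServices.create(...); you can simply inject/autowire the Assistant wherever it is needed.
:::
Now we can use the Assistant: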

String answer = assistant.chat("Hello");
System.out.println(answer); // Hello, how can I help you?

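How does it work?

You provide the Class of your interface to AiServices along with the low-level components, and AiServices creates a proxy object that implements this interface.
Currently it uses reflection, but alternatives are being considered as well.
This proxy object handles all conversions of inputs and outputs. In this example the input is a single String, but the ChatModel takes ChatMessage as input. So the AI Service automatically converts the String into a UserMessage and invokes the ChatModel. Since the return type of the chat method is String, once the ChatModel returns an AiMessage, it is converted into a String before being returned from the chat method.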
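The proxy idea described above can be illustrated with the JDK's own java.lang.reflect.Proxy. The following is only a minimal sketch and not LangChain4j's actual implementation: the ProxySketch class and its create method are hypothetical names, and a plain Function<String, String> stands in for the ChatModel.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.function.Function;

// Simplified illustration of a reflection-based service proxy (hypothetical, JDK only).
class ProxySketch {

    interface Assistant {
        String chat(String userMessage);
    }

    // Builds an object implementing serviceInterface; every call is routed to the "model" function.
    static <T> T create(Class<T> serviceInterface, Function<String, String> model) {
        InvocationHandler handler = (proxy, method, args) -> {
            // toString/hashCode/equals come from Object and are not AI calls.
            if (method.getDeclaringClass() == Object.class) {
                switch (method.getName()) {
                    case "toString": return "Proxy for " + serviceInterface.getSimpleName();
                    case "hashCode": return System.identityHashCode(proxy);
                    default: return proxy == args[0]; // equals
                }
            }
            String userMessage = (String) args[0]; // input side: the String plays the role of the user message
            return model.apply(userMessage);       // output side: the model's reply is returned as a String
        };
        Object instance = Proxy.newProxyInstance(
                serviceInterface.getClassLoader(), new Class<?>[] {serviceInterface}, handler);
        return serviceInterface.cast(instance);
    }

    public static void main(String[] args) {
        Assistant assistant = create(Assistant.class, message -> "You said: " + message);
        System.out.println(assistant.chat("Hello")); // prints "You said: Hello"
    }
}
```

The real library does considerably more inside the handler (annotations, memory, tools, output parsing); the sketch only shows where such logic would sit.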

SystemMessage

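Now, let's look at a more complicated example. We will force the LLM to reply using slang 😉 This is usually achieved by providing instructions in the SystemMessage.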

interface Friend {

    @SystemMessage("You are a good friend of mine. Answer using slang.")
    String chat(String userMessage);
}

Friend friend = AiServices.create(Friend.class, model);

String answer = friend.chat("Hello"); // Hey! What's up?

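In this example, we added the @SystemMessage annotation with the system prompt template we want to use. Behind the scenes, it is converted into a SystemMessage and sent to the LLM along with the UserMessage.

@SystemMessage can also load a prompt template from resources: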
@SystemMessage(fromResource = "my-prompt-template.txt")

System message provider

System messages can also be defined dynamically with a system message provider:

Friend friend = AiServices.builder(Friend.class)
    .chatModel(model)
    .systemMessageProvider(chatMemoryId -> "You are a good friend of mine. Answer using slang.")
    .build();

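As you can see, you can provide different system messages depending on the chat memory ID (a user or a conversation).

UserMessage

Now, suppose the model we use does not support system messages, or we simply want to use a UserMessage for this purpose.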

interface Friend {

    @UserMessage("You are a good friend of mine. Answer using slang. {{it}}")
    String chat(String userMessage);
}

Friend friend = AiServices.create(Friend.class, model);

String answer = friend.chat("Hello"); // Hey! What's shakin'?

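We replaced the @SystemMessage annotation with @UserMessage and specified a prompt template containing the variable {{it}}, which refers to the method's only argument. Note that in the current version, it is a built-in variable name.

It is also possible to annotate String userMessage with @V and assign a custom name to the prompt template variable: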

interface Friend {

    @UserMessage("You are a good friend of mine. Answer using slang. {{message}}")
    String chat(@V("message") String userMessage);
}

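:::note
Note that @V is not needed when using LangChain4j with Quarkus or Spring Boot. The annotation is required only when the -parameters option is not enabled during Java compilation.

:::

@UserMessage can also load a prompt template from resources: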
@UserMessage(fromResource = "my-prompt-template.txt")
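To make the {{it}} and {{message}} placeholders more concrete, here is a minimal JDK-only sketch of placeholder substitution. It is an illustration under assumptions, not the library's template engine: TemplateSketch and render are hypothetical names, and failing fast on an unknown variable is a design choice of this sketch.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified prompt-template renderer (JDK only).
class TemplateSketch {

    // Matches {{name}}, allowing optional whitespace inside the braces.
    private static final Pattern VARIABLE = Pattern.compile("\\{\\{\\s*(\\w+)\\s*\\}\\}");

    // Replaces every {{name}} with its value; a variable without a value is reported as an error.
    static String render(String template, Map<String, ?> variables) {
        Matcher matcher = VARIABLE.matcher(template);
        StringBuilder result = new StringBuilder();
        while (matcher.find()) {
            String name = matcher.group(1);
            if (!variables.containsKey(name)) {
                throw new IllegalArgumentException("No value for template variable: " + name);
            }
            // quoteReplacement keeps '$' and '\' in the value from being treated as regex syntax.
            matcher.appendReplacement(result, Matcher.quoteReplacement(String.valueOf(variables.get(name))));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        String prompt = render("You are a good friend of mine. Answer using slang. {{it}}", Map.of("it", "Hello"));
        System.out.println(prompt); // prints "You are a good friend of mine. Answer using slang. Hello"
    }
}
```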

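Programmatic chat request rewriting

In some cases it can be useful to modify the ChatRequest before it is sent to the LLM. For example, you may need to append extra context to the user message or change the system message based on some external condition. This is done by configuring the AI Service with a UnaryOperator<ChatRequest> that applies the transformation to the ChatRequest: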

Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(model)
    .chatRequestTransformer(transformingFunction)  // Configures the transformation function to be applied to the ChatRequest
    .build();

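If you need access to the ChatMemory to implement the required ChatRequest transformation, chatRequestTransformer can also be configured with a BiFunction<ChatRequest, Object, ChatRequest>, where the second argument passed to the function is the memory ID.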
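The shape of such a transformation can be sketched with plain JDK types. In the sketch below, Request is a hypothetical, heavily simplified stand-in for ChatRequest (just an ordered list of message texts), and appendContext is an illustrative helper, not a LangChain4j API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical illustration of a request transformer expressed as a UnaryOperator (JDK only).
class TransformerSketch {

    // Simplified stand-in for ChatRequest: message texts in order, the last one being the user message.
    record Request(List<String> messages) {}

    // Returns a transformer that appends extra context to the last message and leaves the rest untouched.
    static UnaryOperator<Request> appendContext(String context) {
        return request -> {
            if (request.messages().isEmpty()) {
                return request; // nothing to enrich
            }
            List<String> messages = new ArrayList<>(request.messages());
            int last = messages.size() - 1;
            messages.set(last, messages.get(last) + "\n\nContext: " + context);
            return new Request(List.copyOf(messages)); // a new request; the original is not mutated
        };
    }

    public static void main(String[] args) {
        Request original = new Request(List.of("You are a helpful assistant.", "What should I wear?"));
        Request transformed = appendContext("It is raining in Berlin.").apply(original);
        System.out.println(transformed.messages().get(1));
    }
}
```

Returning a new object instead of mutating the incoming one keeps the transformer free of side effects, which makes it easier to compose several transformers.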

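Chat request parameters

Another degree of freedom is the ability to configure parameters (e.g., temperature, tool choice, maximum tokens) on a per-call basis. For example, you may want some requests to be more "creative" (higher temperature) and others more deterministic (lower temperature).
To do so, create an AI Service method that also accepts an argument of type ChatRequestParameters (or any vendor-specific type, such as OpenAiChatRequestParameters). This tells LangChain4j to accept and merge these parameters on each call.

Note that the toolSpecifications and responseFormat specified in ChatRequestParameters override the ones generated by the AI Service.
Define your interface with a second argument: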

interface AssistantWithChatParams {

    String chat(@UserMessage String userMessage, ChatRequestParameters params);
}

Build the AI Service:

AssistantWithChatParams assistant = AiServices.builder(AssistantWithChatParams.class)
    .chatModel(openAiChatModel)  // or whichever model
    .build();

Call it with any per-call parameters:

ChatRequestParameters customParams = ChatRequestParameters.builder()
    .temperature(0.85)
    .build();

String answer = assistant.chat("Hi there!", customParams);

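The ChatRequestParameters passed as an argument to the AI Service method are also propagated to the chatRequestTransformer discussed above, so they can be accessed and modified there if necessary.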
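One way to picture "merging" per-call parameters is sketched below. This is an assumption-laden illustration: Params is a hypothetical stand-in for ChatRequestParameters, and the rule that a per-call value wins over a default whenever it is set is this sketch's assumption, not a statement about the library's exact merge semantics.

```java
// Hypothetical illustration of merging default and per-call parameters (JDK only).
class ParamsSketch {

    // Simplified stand-in for ChatRequestParameters; a null field means "not set".
    record Params(Double temperature, Integer maxOutputTokens) {

        // Values set on perCall take precedence; unset ones fall back to this (default) instance.
        Params overrideWith(Params perCall) {
            return new Params(
                    perCall.temperature() != null ? perCall.temperature() : temperature,
                    perCall.maxOutputTokens() != null ? perCall.maxOutputTokens() : maxOutputTokens);
        }
    }

    public static void main(String[] args) {
        Params defaults = new Params(0.2, 512);
        Params perCall = new Params(0.85, null); // only the temperature is overridden for this call
        System.out.println(defaults.overrideWith(perCall));
    }
}
```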

Examples of valid AI Service methods

Below are some examples of valid AI Service methods.

`UserMessage`

String chat(String userMessage);

String chat(@UserMessage String userMessage);

String chat(@UserMessage String userMessage, ChatRequestParameters parameters);

String chat(@UserMessage String userMessage, @V("country") String country); // userMessage contains "{{country}}" template variable

String chat(@UserMessage String userMessage, @UserMessage Content content); // content can be one of: TextContent, ImageContent, AudioContent, VideoContent, PdfFileContent

String chat(@UserMessage String userMessage, @UserMessage ImageContent image); // second argument can be one of: TextContent, ImageContent, AudioContent, VideoContent, PdfFileContent

String chat(@UserMessage String userMessage, @UserMessage List<Content> contents);

String chat(@UserMessage String userMessage, @UserMessage List<ImageContent> images);

@UserMessage("What is the capital of Germany?")
String chat();

@UserMessage("What is the capital of {{it}}?")
String chat(String country);

@UserMessage("What is the capital of {{country}}?")
String chat(@V("country") String country);

@UserMessage("What is the {{something}} of {{country}}?")
String chat(@V("something") String something, @V("country") String country);

@UserMessage("What is the capital of {{country}}?")
String chat(String country); // this works only in Quarkus and Spring Boot applications

`SystemMessage` and `UserMessage`

@SystemMessage("Given a name of a country, answer with a name of its capital")
String chat(String userMessage);

@SystemMessage("Given a name of a country, answer with a name of its capital")
String chat(@UserMessage String userMessage);

@SystemMessage("Given a name of a country, {{answerInstructions}}")
String chat(@V("answerInstructions") String answerInstructions, @UserMessage String userMessage);

@SystemMessage("Given a name of a country, answer with a name of its capital")
String chat(@UserMessage String userMessage, @V("country") String country); // userMessage contains "{{country}}" template variable

@SystemMessage("Given a name of a country, {{answerInstructions}}")
String chat(@V("answerInstructions") String answerInstructions, @UserMessage String userMessage, @V("country") String country); // userMessage contains "{{country}}" template variable

@SystemMessage("Given a name of a country, answer with a name of its capital")
@UserMessage("Germany")
String chat();

@SystemMessage("Given a name of a country, {{answerInstructions}}")
@UserMessage("Germany")
String chat(@V("answerInstructions") String answerInstructions);

@SystemMessage("Given a name of a country, answer with a name of its capital")
@UserMessage("{{it}}")
String chat(String country);

@SystemMessage("Given a name of a country, answer with a name of its capital")
@UserMessage("{{country}}")
String chat(@V("country") String country);

@SystemMessage("Given a name of a country, {{answerInstructions}}")
@UserMessage("{{country}}")
String chat(@V("answerInstructions") String answerInstructions, @V("country") String country);

Multimodality

In addition to text content, AI Service methods can accept one or more Content or List<Content> arguments:

String chat(@UserMessage String userMessage, @UserMessage Content content);

String chat(@UserMessage String userMessage, @UserMessage ImageContent image);

String chat(@UserMessage String userMessage, @UserMessage ImageContent image, @UserMessage AudioContent audio);

String chat(@UserMessage String userMessage, @UserMessage List<Content> contents);

String chat(@UserMessage String userMessage, @UserMessage List<ImageContent> images);

String chat(Content content);

String chat(AudioContent content);

String chat(List<Content> contents);

String chat(List<AudioContent> contents);

String chat(@UserMessage Content content1, @UserMessage Content content2);

String chat(@UserMessage AudioContent audio, @UserMessage ImageContent image);

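The AI Service puts all contents into the final UserMessage in the order in which the parameters are declared.

Return types

An AI Service method can return one of the following types:

- String: the LLM-generated output is returned as is, without any processing or parsing
- Any type supported by structured outputs: the AI Service parses the LLM-generated output into the desired type before returning it

Any type can additionally be wrapped in Result<T> to obtain extra metadata about the AI Service invocation:

- TokenUsage: the total number of tokens used during the invocation. If the AI Service made multiple calls to the LLM (for example, because tools were executed), the token usage of all calls is summed.
- Sources: the Contents retrieved during RAG retrieval
- All tools executed during the invocation (both requests and results)
- The FinishReason of the final chat response
- All intermediate ChatResponses
- The final ChatResponse

For example: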

interface Assistant {
    
    @UserMessage("Generate an outline for the article on the following topic: {{it}}")
    Result<List<String>> generateOutlineFor(String topic);
}

Result<List<String>> result = assistant.generateOutlineFor("Java");

List<String> outline = result.content();
TokenUsage tokenUsage = result.tokenUsage();
List<Content> sources = result.sources();
List<ToolExecution> toolExecutions = result.toolExecutions();
FinishReason finishReason = result.finishReason();
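The summing of token usage across several LLM calls mentioned above can be pictured with a small JDK-only sketch. Usage is a hypothetical stand-in for TokenUsage, and the numbers are made up for illustration.

```java
import java.util.List;

// Hypothetical illustration of aggregating token usage over several LLM calls (JDK only).
class TokenUsageSketch {

    // Simplified stand-in for TokenUsage.
    record Usage(int inputTokens, int outputTokens) {

        int total() {
            return inputTokens + outputTokens;
        }

        Usage plus(Usage other) {
            return new Usage(inputTokens + other.inputTokens, outputTokens + other.outputTokens);
        }
    }

    // Sums the usage of every call made during one service invocation.
    static Usage aggregate(List<Usage> perCallUsages) {
        return perCallUsages.stream().reduce(new Usage(0, 0), Usage::plus);
    }

    public static void main(String[] args) {
        // First call: the model asks for a tool. Second call: the model answers using the tool result.
        Usage sum = aggregate(List.of(new Usage(100, 20), new Usage(150, 30)));
        System.out.println(sum.total()); // prints 300
    }
}
```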

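Structured outputs

If you want to receive structured output from the LLM (for example, a complex Java object rather than unstructured text in a String), change the return type of your AI Service method from String to something else.

A few examples: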

`boolean` as return type

interface SentimentAnalyzer {

    @UserMessage("Does {{it}} have a positive sentiment?")
    boolean isPositive(String text);

}

SentimentAnalyzer sentimentAnalyzer = AiServices.create(SentimentAnalyzer.class, model);

boolean positive = sentimentAnalyzer.isPositive("It's wonderful!");
// true

Enum

enum Priority {
    CRITICAL, HIGH, LOW
}

interface PriorityAnalyzer {
    
    @UserMessage("Analyze the priority of the following issue: {{it}}")
    Priority analyzePriority(String issueDescription);
}

PriorityAnalyzer priorityAnalyzer = AiServices.create(PriorityAnalyzer.class, model);

Priority priority = priorityAnalyzer.analyzePriority("The main payment gateway is down, and customers cannot process transactions.");
// CRITICAL
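Conceptually, returning an enum means the textual reply of the LLM has to be mapped onto one of the enum constants. The sketch below shows one plausible way to do that with the JDK alone; EnumParsingSketch and parse are hypothetical names, and the normalization rules (trimming, case-insensitive matching) are assumptions of this sketch rather than the library's documented behavior.

```java
import java.util.Locale;

// Hypothetical illustration of parsing raw LLM text into an enum constant (JDK only).
class EnumParsingSketch {

    enum Priority {
        CRITICAL, HIGH, LOW
    }

    // Maps text such as " critical\n" onto the matching constant; anything else is rejected.
    static <E extends Enum<E>> E parse(Class<E> enumType, String llmOutput) {
        String normalized = llmOutput.trim().toUpperCase(Locale.ROOT);
        for (E constant : enumType.getEnumConstants()) {
            if (constant.name().equals(normalized)) {
                return constant;
            }
        }
        throw new IllegalArgumentException(
                "Cannot parse '" + llmOutput + "' as " + enumType.getSimpleName());
    }

    public static void main(String[] args) {
        System.out.println(parse(Priority.class, " critical\n")); // prints CRITICAL
    }
}
```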

POJO

class Person {

    @Description("first name of a person") // you can add an optional description to help an LLM have a better understanding
    String firstName;
    String lastName;
    LocalDate birthDate;
    Address address;
}

@Description("an address") // you can add an optional description to help an LLM have a better understanding
class Address {
    String street;
    Integer streetNumber;
    String city;
}

interface PersonExtractor {

    @UserMessage("Extract information about a person from {{it}}")
    Person extractPersonFrom(String text);
}

PersonExtractor personExtractor = AiServices.create(PersonExtractor.class, model);

String text = """
            In 1968, amidst the fading echoes of Independence Day,
            a child named John arrived under the calm evening sky.
            This newborn, bearing the surname Doe, marked the start of a new journey.
            He was welcomed into the world at 345 Whispering Pines Avenue
            a quaint street nestled in the heart of Springfield
            an abode that echoed with the gentle hum of suburban dreams and aspirations.
            """;

Person person = personExtractor.extractPersonFrom(text);

System.out.println(person); // Person { firstName = "John", lastName = "Doe", birthDate = 1968-07-04, address = Address { ... } }

JSON

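When extracting custom POJOs (which are actually JSON that is then parsed into the POJO), it is recommended to enable "JSON mode" in the model configuration. This way, the LLM is forced to respond with valid JSON.

:::note
Note that JSON mode and tools/function calling are similar features, but they have different APIs and serve different purposes.
JSON mode is useful when you always need the LLM to respond in a structured format (valid JSON). In addition, state/memory is usually not required, so each interaction with the LLM is independent of the others.

For example, you might want to extract information from a text, such as a list of the people mentioned in it, or convert a free-form product review into a structured form with fields such as String productName, Sentiment sentiment, List<String> claimedProblems, and so on.
Tools/functions, on the other hand, are useful when the LLM should be able to perform some actions (e.g., query a database, search the web, cancel a user's booking). In this case, the LLM is given a list of tools together with their expected JSON schemas, and it decides autonomously whether to call any of them to satisfy the user's request.
Previously, function calling was often used for structured data extraction, but now there is the JSON mode feature, which is better suited to this purpose.

:::

Here is how to enable JSON mode:

OpenAI:

For newer models that support Structured Outputs (e.g., gpt-4o-mini, gpt-4o-2024-08-06):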

OpenAiChatModel.builder()
    ...
    .supportedCapabilities(RESPONSE_FORMAT_JSON_SCHEMA)
    .strictJsonSchema(true)
    .build();

For older models (e.g., gpt-3.5-turbo, gpt-4):

OpenAiChatModel.builder()
    ...
    .responseFormat("json_object")
    .build();

Azure OpenAI:

AzureOpenAiChatModel.builder()
    ...
    .responseFormat(new ChatCompletionsJsonResponseFormat())
    .build();

Vertex AI Gemini:

VertexAiGeminiChatModel.builder()
    ...
    .responseMimeType("application/json")
    .build();

Or by specifying an explicit schema from a Java class:

VertexAiGeminiChatModel.builder()
    ...
    .responseSchema(SchemaHelper.fromClass(Person.class))
    .build();

Or from a JSON schema:

VertexAiGeminiChatModel.builder()
    ...
    .responseSchema(Schema.builder()...build())
    .build();

Google AI Gemini:

GoogleAiGeminiChatModel.builder()
    ...
    .responseFormat(ResponseFormat.JSON)
    .build();

Or by specifying an explicit schema from a Java class:

GoogleAiGeminiChatModel.builder()
    ...
    .responseFormat(ResponseFormat.builder()
        .type(JSON)
        .jsonSchema(JsonSchemas.jsonSchemaFrom(Person.class).get())
        .build())
    .build();

Or from a JSON schema:

GoogleAiGeminiChatModel.builder()
    ...
    .responseFormat(ResponseFormat.builder()
        .type(JSON)
        .jsonSchema(JsonSchema.builder()...build())
        .build())
    .build();

Mistral AI:

MistralAiChatModel.builder()
    ...
    .responseFormat(MistralAiResponseFormatType.JSON_OBJECT)
    .build();

Ollama:

OllamaChatModel.builder()
    ...
    .responseFormat(JSON)
    .build();

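For other model providers: if the underlying model does not support JSON mode, prompt engineering is your best bet. Also, try lowering the temperature for more deterministic output.

More examples

Streaming

An AI Service can stream the response token by token when its method uses the TokenStream return type.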


interface Assistant {

    TokenStream chat(String message);
}

StreamingChatModel model = OpenAiStreamingChatModel.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .modelName(GPT_4_O_MINI)
    .build();

Assistant assistant = AiServices.create(Assistant.class, model);

TokenStream tokenStream = assistant.chat("Tell me a joke");

CompletableFuture<ChatResponse> futureResponse = new CompletableFuture<>();

tokenStream
    .onPartialResponse((String partialResponse) -> System.out.println(partialResponse))
    .onPartialThinking((PartialThinking partialThinking) -> System.out.println(partialThinking))
    .onRetrieved((List<Content> contents) -> System.out.println(contents))
    .onIntermediateResponse((ChatResponse intermediateResponse) -> System.out.println(intermediateResponse))
     // This will be invoked every time a new partial tool call (usually containing a single token of the tool's arguments) is available.
    .onPartialToolCall((PartialToolCall partialToolCall) -> System.out.println(partialToolCall))
     // This will be invoked right before a tool is executed. BeforeToolExecution contains ToolExecutionRequest (e.g. tool name, tool arguments, etc.)
    .beforeToolExecution((BeforeToolExecution beforeToolExecution) -> System.out.println(beforeToolExecution))
     // This will be invoked right after a tool is executed. ToolExecution contains ToolExecutionRequest and tool execution result.
    .onToolExecuted((ToolExecution toolExecution) -> System.out.println(toolExecution))
    .onCompleteResponse((ChatResponse response) -> futureResponse.complete(response))
    .onError((Throwable error) -> futureResponse.completeExceptionally(error))
    .start();

futureResponse.join(); // Blocks the main thread until the streaming process (running in another thread) is complete
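The pattern used above, where callbacks run on another thread and a CompletableFuture hands the final result back to the caller, can be reproduced with the JDK alone. In this sketch the stream method is a hypothetical stand-in for a token stream that emits a fixed list of tokens; it is not how LangChain4j produces tokens.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical illustration of bridging streaming callbacks to a blocking result (JDK only).
class StreamingSketch {

    // Emits the tokens on a background thread, then reports the complete text or an error.
    static void stream(List<String> tokens,
                       Consumer<String> onPartialResponse,
                       Consumer<String> onCompleteResponse,
                       Consumer<Throwable> onError) {
        Thread worker = new Thread(() -> {
            try {
                StringBuilder full = new StringBuilder();
                for (String token : tokens) {
                    full.append(token);
                    onPartialResponse.accept(token);
                }
                onCompleteResponse.accept(full.toString());
            } catch (Throwable t) {
                onError.accept(t);
            }
        });
        worker.start();
    }

    // Blocks the calling thread until the background stream has completed or failed.
    static String collect(List<String> tokens) {
        CompletableFuture<String> future = new CompletableFuture<>();
        stream(tokens, token -> { }, future::complete, future::completeExceptionally);
        return future.join();
    }

    public static void main(String[] args) {
        System.out.println(collect(List.of("Why ", "did ", "the ", "chicken...")));
    }
}
```

If a callback throws, the exception reaches onError and join() rethrows it wrapped in a CompletionException, so the caller never waits forever.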

Streaming cancellation

If you want to cancel the streaming, you can do so from one of the following callbacks:

- onPartialResponseWithContext(BiConsumer<PartialResponse, PartialResponseContext>)
- onPartialThinkingWithContext(BiConsumer<PartialThinking, PartialThinkingContext>)

For example:

tokenStream
    .onPartialResponseWithContext((PartialResponse partialResponse, PartialResponseContext context) -> {
        process(partialResponse);
        if (shouldCancel()) {
            context.streamingHandle().cancel();
        }
    })
    .onCompleteResponse((ChatResponse response) -> futureResponse.complete(response))
    .onError((Throwable error) -> futureResponse.completeExceptionally(error))
    .start();

When StreamingHandle.cancel() is called, LangChain4j closes the connection and stops the streaming. Once StreamingHandle.cancel() has been called, the TokenStream receives no further callbacks.
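The "no further callbacks after cancel()" contract can be illustrated with a small JDK-only sketch. Handle is a hypothetical stand-in for StreamingHandle, and the synchronous loop replaces the real network stream purely for clarity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BiConsumer;

// Hypothetical illustration of cooperative stream cancellation (JDK only).
class CancellationSketch {

    // Simplified stand-in for StreamingHandle.
    static final class Handle {
        private final AtomicBoolean cancelled = new AtomicBoolean();

        void cancel() {
            cancelled.set(true);
        }

        boolean isCancelled() {
            return cancelled.get();
        }
    }

    // Delivers tokens until they run out or the callback cancels; returns the tokens actually delivered.
    static List<String> stream(List<String> tokens, BiConsumer<String, Handle> onPartialResponse) {
        Handle handle = new Handle();
        List<String> delivered = new ArrayList<>();
        for (String token : tokens) {
            if (handle.isCancelled()) {
                break; // once cancelled, no further callbacks are made
            }
            delivered.add(token);
            onPartialResponse.accept(token, handle);
        }
        return delivered;
    }

    public static void main(String[] args) {
        List<String> delivered = stream(List.of("a", "b", "c", "d"), (token, handle) -> {
            if (token.equals("b")) {
                handle.cancel();
            }
        });
        System.out.println(delivered); // prints [a, b]
    }
}
```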

Flux

You can also use Flux<String> instead of TokenStream. To do so, add the langchain4j-reactor module as a dependency:

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-reactor</artifactId>
    <version>1.10.0-beta18</version>
</dependency>

interface Assistant {

  Flux<String> chat(String message);
}


Chat memory

An AI Service can use chat memory to "remember" previous interactions:

Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(model)
    .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
    .build();

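In this scenario, the same ChatMemory instance is used for all invocations of the AI Service. However, this approach does not work if you have multiple users, because each user needs their own ChatMemory instance to maintain their individual conversation. The solution is to use a ChatMemoryProvider: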


interface Assistant  {
    String chat(@MemoryId int memoryId, @UserMessage String message);
}

Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(model)
    .chatMemoryProvider(memoryId -> MessageWindowChatMemory.withMaxMessages(10))
    .build();

String answerToKlaus = assistant.chat(1, "Hello, my name is Klaus");
String answerToFrancine = assistant.chat(2, "Hello, my name is Francine");

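In this scenario, the ChatMemoryProvider provides two distinct ChatMemory instances, one per memory ID.
When using ChatMemory this way, it is also important to evict the memory of conversations that are no longer needed in order to avoid memory leaks. To make the chat memories used internally by an AI Service accessible, simply have the interface that defines it extend ChatMemoryAccess: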


interface Assistant extends ChatMemoryAccess {
    String chat(@MemoryId int memoryId, @UserMessage String message);
}

This makes it possible to both access the ChatMemory instance of a single conversation and to get rid of it when the conversation is terminated.

String answerToKlaus = assistant.chat(1, "Hello, my name is Klaus");
String answerToFrancine = assistant.chat(2, "Hello, my name is Francine");

List<ChatMessage> messagesWithKlaus = assistant.getChatMemory(1).messages();
boolean chatMemoryWithFrancineEvicted = assistant.evictChatMemory(2);

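:::note
Note that if an AI Service method has no parameter annotated with @MemoryId, the memoryId value in ChatMemoryProvider defaults to the string "default".
Note that the AI Service should not be called concurrently for the same @MemoryId, as this can lead to a corrupted ChatMemory. Currently, AI Services do not implement any mechanism to prevent concurrent calls for the same @MemoryId.

:::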
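The per-ID memory, the sliding message window, and eviction discussed in this section can be pictured with a JDK-only sketch. MemorySketch is a hypothetical class that stores plain strings instead of ChatMessage objects; like the real thing described in the note above, it offers no protection against concurrent use of the same memory ID.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of one message window per memory ID, with eviction (JDK only).
class MemorySketch {

    private final int maxMessages;
    private final Map<Object, Deque<String>> memories = new ConcurrentHashMap<>();

    MemorySketch(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    // Lazily creates the window for this ID (the "provider" step), then drops the oldest messages if it overflows.
    void add(Object memoryId, String message) {
        Deque<String> window = memories.computeIfAbsent(memoryId, id -> new ArrayDeque<>());
        window.addLast(message);
        while (window.size() > maxMessages) {
            window.removeFirst();
        }
    }

    List<String> messages(Object memoryId) {
        Deque<String> window = memories.get(memoryId);
        return window == null ? List.of() : List.copyOf(window);
    }

    // Removes the whole conversation; returns true only if a memory existed for this ID.
    boolean evict(Object memoryId) {
        return memories.remove(memoryId) != null;
    }

    public static void main(String[] args) {
        MemorySketch memory = new MemorySketch(10);
        memory.add(1, "Hello, my name is Klaus");
        memory.add(2, "Hello, my name is Francine");
        System.out.println(memory.messages(1)); // prints [Hello, my name is Klaus]
        System.out.println(memory.evict(2));    // prints true
    }
}
```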

Tools (Function Calling)

An AI Service can be configured with tools that the LLM can use:


class Tools {
    
    @Tool
    int add(int a, int b) {
        return a + b;
    }

    @Tool
    int multiply(int a, int b) {
        return a * b;
    }
}

Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(model)
    .tools(new Tools())
    .build();

String answer = assistant.chat("What is 1+2 and 3*4?");

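In this scenario, the LLM requests the execution of the add(1, 2) and multiply(3, 4) methods before providing the final answer. LangChain4j executes these methods automatically.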
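The "automatic execution" step can be pictured as looking up a method by the name the LLM requested and invoking it reflectively. The following JDK-only sketch shows that idea under simplifying assumptions: ToolDispatchSketch and execute are hypothetical names, arguments arrive already converted to Java values, and lookup is by method name and argument count only.

```java
import java.lang.reflect.Method;

// Hypothetical illustration of dispatching a tool request via reflection (JDK only).
class ToolDispatchSketch {

    static class Tools {
        public int add(int a, int b) {
            return a + b;
        }

        public int multiply(int a, int b) {
            return a * b;
        }
    }

    // Finds a method with the requested name and arity on the tools object and invokes it.
    static Object execute(Object tools, String toolName, Object... arguments) {
        for (Method method : tools.getClass().getDeclaredMethods()) {
            if (method.getName().equals(toolName) && method.getParameterCount() == arguments.length) {
                try {
                    method.setAccessible(true);
                    return method.invoke(tools, arguments);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Tool execution failed: " + toolName, e);
                }
            }
        }
        throw new IllegalArgumentException("Unknown tool: " + toolName);
    }

    public static void main(String[] args) {
        Tools tools = new Tools();
        System.out.println(execute(tools, "add", 1, 2));      // prints 3
        System.out.println(execute(tools, "multiply", 3, 4)); // prints 12
    }
}
```

In a real exchange the results (3 and 12) would be sent back to the LLM, which then composes the final answer.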
