Building a Simple RAG with Spring AI + Google

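By the time I got to this small demo, I had fully realized how unstable Spring AI still is. Even on a Milestone release, the APIs and entity classes keep changing from one version to the next. It also made the limits of model-assisted problem solving obvious: "Teacher G" still helped me a lot along the way, but with something this new you need solid programming fundamentals to pin down the direction of the fix, and only then does the model's help get you to a working result. I hit many pitfalls, and not all of them are explained in the documentation. It is fine for practice, but I would not consider it stable enough to start a real project on.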

m0_69818080 · Published 2025-11-25 18:35:06

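Preface

Environment:
Operating system: macOS
Docker: installed in advance
Spring AI version: 1.1.0-M3, because it was the only version for which all my dependencies were available from the Aliyun Maven mirror in China.
Vector DB: Chroma, simple and well suited to a practice project
Embedding model: text-embedding-004 by Google. This made things harder for myself; if you have an API key from another provider, I suggest not using Google's.
Docker image registry mirror: docker.1ms.run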

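Steps

Put simply, RAG makes the LLM look for its answer only in content retrieved from a bounded vector database. The flow for a project like this is mainly:

Store text in the vector database (ingestion): read resources -> split into chunks -> call the embedding model -> store in the vector database (one-off or as a scheduled job)
Search the vector database (search): user input -> embed the input with the model -> similarity search in the vector database -> send the results to the LLM as context -> the LLM returns the answer to the user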
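The two flows above can be sketched without any framework. The following is a minimal in-memory illustration, not what Spring AI or Chroma do internally: the class `ToyRagPipeline`, the word-count "embedding", and the one-chunk-per-line split are all hypothetical stand-ins chosen so that the example runs on a plain JDK (17+).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToyRagPipeline {

    // One stored chunk together with its (toy) vector.
    record StoredChunk(String text, Map<String, Integer> vector) {}

    private final List<StoredChunk> store = new ArrayList<>();

    // Hypothetical stand-in for an embedding model: a sparse word-count vector.
    static Map<String, Integer> embed(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Cosine similarity between two sparse vectors; 0 if either one is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0 || normB == 0) ? 0 : dot / (normA * normB);
    }

    static double norm(Map<String, Integer> vector) {
        double sum = 0;
        for (int count : vector.values()) {
            sum += count * count;
        }
        return Math.sqrt(sum);
    }

    // Ingestion: split the resource into chunks (one per line), embed each, store it.
    void ingest(String resource) {
        for (String chunk : resource.split("\n")) {
            if (!chunk.isBlank()) {
                store.add(new StoredChunk(chunk.strip(), embed(chunk)));
            }
        }
    }

    // Search: embed the query, rank stored chunks by similarity, keep the top K.
    List<String> search(String query, int topK) {
        Map<String, Integer> queryVector = embed(query);
        return store.stream()
                .sorted(Comparator.comparingDouble(
                        (StoredChunk c) -> cosine(queryVector, c.vector())).reversed())
                .limit(topK)
                .map(StoredChunk::text)
                .toList();
    }

    public static void main(String[] args) {
        ToyRagPipeline rag = new ToyRagPipeline();
        rag.ingest("Chroma is a vector database\nSpring AI wraps embedding models\nDocker runs containers");
        // The returned chunks would be passed to the LLM as context.
        System.out.println(rag.search("which vector database", 1));
    }
}
```

In the real project the embedding step is a remote model call and the store is Chroma, but the shape of the two flows stays the same.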

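Preparing the Prompt

Having an AI write one is enough. Note that this is not a system prompt. It is a prompt template that gets filled in with the user input and the vector search results, which is different from the kind of system prompt that defines the execution logic of a ReAct agent.

You are an intelligent assistant. Answer the user's [Question] based on the [Context] provided below.
If you do not know the answer, say so directly; do not make one up.

[Context]:
{documents}

[Question]:
{input}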
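To make the "template, not system prompt" point concrete, here is a minimal plain-Java sketch of what filling the two placeholders amounts to. The class `PromptFillSketch` and its `fill` helper are hypothetical and only illustrate the idea; in the actual project this job is done by Spring AI's PromptTemplate, as in the controller further down.

```java
import java.util.Map;

public class PromptFillSketch {

    // English rendering of the template, with the same two placeholders.
    static final String TEMPLATE = """
            You are an intelligent assistant. Answer the user's [Question] based on the [Context] provided below.
            If you do not know the answer, say so directly; do not make one up.

            [Context]:
            {documents}

            [Question]:
            {input}
            """;

    // Replaces each {name} placeholder with its value.
    // Naive on purpose: a value that itself contains "{input}" could be replaced again.
    static String fill(String template, Map<String, String> variables) {
        String result = template;
        for (Map.Entry<String, String> variable : variables.entrySet()) {
            result = result.replace("{" + variable.getKey() + "}", variable.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fill(TEMPLATE, Map.of(
                "documents", "Chroma listens on port 8000 in this setup.",
                "input", "Which port does Chroma use?")));
    }
}
```

The filled text is rebuilt on every request, because both the retrieved documents and the question change each time.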

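Chunking

This step splits the text resources into chunks. If the chunks are too small, the contextual meaning gets lost; if they are too large, the embedding compresses too much information into a single vector and similarity search becomes less precise. I asked the AI whether chunking is usually done by a model or by hand-written code.
The general recommendation is hand-written code. The strategy pattern comes to mind immediately: each chunking logic gets its own implementation. The code is controllable and fairly stable, and it also saves money, since tokens are expensive.

I won't paste the code here. Depending on how your text resources are organized, you can split on line breaks, on heading font size, and so on.

Seen from this angle, the documents that you store as the context knowledge base also need a reasonably clear structure.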
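Since the original chunking code is not shown, the following is only a sketch of how the strategy pattern could look here, not the author's implementation. The names `ChunkingStrategy`, `ParagraphChunking`, and `FixedSizeChunking` are hypothetical, and the two strategies (split on blank lines, fixed-size windows with overlap) are just two common choices.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkingSketch {

    // Each chunking logic is one implementation of this interface.
    interface ChunkingStrategy {
        List<String> chunk(String text);
    }

    // Splits on blank lines: one chunk per paragraph.
    static class ParagraphChunking implements ChunkingStrategy {
        @Override
        public List<String> chunk(String text) {
            List<String> chunks = new ArrayList<>();
            for (String paragraph : text.split("\\R\\s*\\R")) {
                if (!paragraph.isBlank()) {
                    chunks.add(paragraph.strip());
                }
            }
            return chunks;
        }
    }

    // Fixed-size character windows; consecutive windows share `overlap` characters
    // so that a sentence cut at a boundary still appears whole in one of them.
    static class FixedSizeChunking implements ChunkingStrategy {
        private final int size;
        private final int overlap;

        FixedSizeChunking(int size, int overlap) {
            if (size <= 0 || overlap < 0 || overlap >= size) {
                throw new IllegalArgumentException("need size > 0 and 0 <= overlap < size");
            }
            this.size = size;
            this.overlap = overlap;
        }

        @Override
        public List<String> chunk(String text) {
            List<String> chunks = new ArrayList<>();
            for (int start = 0; start < text.length(); start += size - overlap) {
                chunks.add(text.substring(start, Math.min(text.length(), start + size)));
                if (start + size >= text.length()) {
                    break; // the last window already reached the end of the text
                }
            }
            return chunks;
        }
    }

    public static void main(String[] args) {
        ChunkingStrategy strategy = new ParagraphChunking();
        System.out.println(strategy.chunk("First paragraph.\n\nSecond paragraph."));
        System.out.println(new FixedSizeChunking(4, 1).chunk("abcdefghij"));
    }
}
```

The ingestion code only depends on the interface, so swapping the strategy for a differently organized document set does not touch the embedding or storage steps.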

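Embedding

Google's embedding model is asymmetric: the embedding used for storage differs from the one used for querying. The documentation shows a TASK_TYPE field that distinguishes between embedding a query and embedding a passage for storage.
This choice of model adds complexity to the implementation. With the symmetric models that the other providers I looked at offer, querying and storing use the same model, so that model can be the single EmbeddingModel bean that Spring injects wherever it is needed, which makes a lot of things automatic.
That clearly does not work with Google's model, so a different bean has to be injected for each task.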

@Configuration
public class GeminiEmbeddingConfig {

    @Value("${spring.ai.google.genai.embedding.text.options.model}")
    private String modelName;

    // Embeds user questions. @Primary makes this the default EmbeddingModel
    // wherever no @Qualifier is given.
    @Bean
    @Primary
    EmbeddingModel geminiQueryEmbeddingModel(GoogleGenAiEmbeddingConnectionDetails connectionDetails) {
        GoogleGenAiTextEmbeddingOptions options = GoogleGenAiTextEmbeddingOptions.builder()
                .model(modelName)
                .taskType(GoogleGenAiTextEmbeddingOptions.TaskType.RETRIEVAL_QUERY)
                .build();
        return new GoogleGenAiTextEmbeddingModel(connectionDetails, options);
    }

    // Embeds document chunks for storage; must be requested by name with @Qualifier.
    @Bean
    EmbeddingModel geminiDocumentEmbeddingModel(GoogleGenAiEmbeddingConnectionDetails connectionDetails) {
        GoogleGenAiTextEmbeddingOptions options = GoogleGenAiTextEmbeddingOptions.builder()
                .model(modelName)
                .taskType(GoogleGenAiTextEmbeddingOptions.TaskType.RETRIEVAL_DOCUMENT)
                .build();
        return new GoogleGenAiTextEmbeddingModel(connectionDetails, options);
    }
}

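VectorStore

Chroma Local Setup

Run the Chroma server directly with Docker to get a local Chroma instance.
Pulling the image straight from Docker Hub from within China will probably fail with something like "connection reset". It is the same problem as failing to pull from Maven Central: you need a workaround. I usually just look for a mirror that works at the moment; this time I used the following:

docker pull docker.1ms.run/chromadb/chroma

Remember to start this server before starting the project. Running the container locally is enough; I simply clicked Run in Docker Desktop.
Connect to Chroma at: http://localhost:8000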

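Injecting the VectorStore with Spring

With a symmetric model there is only one EmbeddingModel bean, and Spring can inject it into the VectorStore bean automatically. As a result, storing paragraph vectors does not even require an explicit call to the embeddingModel.embed API; vectorStore.add wraps the embedding call internally.

// Save to Chroma (Spring AI calls the embedding model to produce the vector automatically)
vectorStore.add(List.of(document));

Google's model is asymmetric, so my approach is to give the VectorStore one specific model bean, either the query one or the storage one. For the other task, I call the lower-level ChromaApi directly and bypass the VectorStore abstraction.

Once the context has been stored, reads will outnumber writes, so I gave the VectorStore the query model.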

    @Bean
    public VectorStore vectorStore(ChromaApi chromaApi,
                                   @Qualifier("geminiQueryEmbeddingModel") EmbeddingModel queryModel) {
        // The "query model" is injected here
        return ChromaVectorStore.builder(chromaApi, queryModel)
                .collectionName(COLLECTION_NAME)
                .initializeSchema(INITIALIZE_SCHEMA)
                .build();
    }

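Storing Data in the VectorStore

For storage, I call geminiDocumentEmbeddingModel by hand to compute the vectors and write them with the lower-level ChromaApi.
I based this implementation on the source code I found by following the call chain VectorStore.add -> AbstractObservationVectorStore.doAdd -> ChromaVectorStore.doAdd.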

@Component
public class VectorIngestService {

    private static final String TENANT_NAME = "default_tenant";
    private static final String DATABASE_NAME = "default_database";

    @Value("${spring.ai.vectorstore.chroma.collection-name}")
    private String collectionName;

    private final EmbeddingModel geminiDocumentEmbeddingModel;
    private final ChromaApi chromaApi;

    // The query model is @Primary, so without @Qualifier Spring would inject it here
    // instead of the document model.
    public VectorIngestService(@Qualifier("geminiDocumentEmbeddingModel") EmbeddingModel geminiDocumentEmbeddingModel,
                               ChromaApi chromaApi) {
        this.geminiDocumentEmbeddingModel = geminiDocumentEmbeddingModel;
        this.chromaApi = chromaApi;
    }

    public void ingest(List<Document> documents) {
        // Look up the target collection; it must already exist
        ChromaApi.Collection collection = chromaApi.getCollection(TENANT_NAME, DATABASE_NAME, collectionName);
        if (collection == null) {
            throw new IllegalStateException("Collection not found: " + collectionName);
        }
        String collectionId = collection.id();

        // Chroma expects parallel lists: the i-th id, vector, metadata and text belong together
        List<String> ids = new ArrayList<>();
        List<float[]> embeddings = new ArrayList<>();
        List<Map<String, Object>> metadatas = new ArrayList<>();
        List<String> contents = new ArrayList<>();

        for (Document document : documents) {
            ids.add(document.getId());
            metadatas.add(document.getMetadata());
            contents.add(document.getText());
            // One embedding call per document, using the RETRIEVAL_DOCUMENT model
            embeddings.add(geminiDocumentEmbeddingModel.embed(document));
        }

        ChromaApi.AddEmbeddingsRequest request =
                new ChromaApi.AddEmbeddingsRequest(ids, embeddings, metadatas, contents);
        chromaApi.upsertEmbeddings(TENANT_NAME, DATABASE_NAME, collectionId, request);
    }
}

Querying the VectorStore

@Component
public class VectorSearchService {

    private static final int TOP_K = 3;
    private final VectorStore vectorStore;

    public VectorSearchService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public List<Document> search(String query) {
        SearchRequest request = SearchRequest.builder().query(query).topK(TOP_K).build();
        return vectorStore.similaritySearch(request);
    }
}

ChatClient

The RAG client is fairly simple: no system prompt and no tool registration, just inject the model. I defined a factory method for it:

    @Bean
    public ChatClient ragChatClient(ChatModel chatModel) {
        return ChatClient.builder(chatModel).build();
    }

RagController

The controller only needs the search service injected; interacting with the user does not involve the storage flow.

@RestController
@RequestMapping("/api/rag")
@CrossOrigin(origins = "*")
public class RagController {

    private final ChatClient ragChatClient;
    private final VectorSearchService vectorSearchService;

    // Prompt template (not a system prompt) with {documents} and {input} placeholders
    @Value("classpath:/prompt/rag_prompt_template.txt")
    private Resource ragPromptResource;

    public RagController(ChatClient ragChatClient, VectorSearchService vectorSearchService) {
        this.ragChatClient = ragChatClient;
        this.vectorSearchService = vectorSearchService;
    }

    public record RagQueryRequest(String question) {
    }

    @PostMapping("/ask")
    public String runRag(@RequestBody RagQueryRequest request) {
        // 1. Retrieve the chunks most similar to the question
        List<Document> docs = vectorSearchService.search(request.question());

        // 2. Join them into one context string, labelling each chunk with its "source" metadata
        String context = docs.stream()
                .map(doc -> "--- Source: " + doc.getMetadata().get("source") + " ---\n" + doc.getFormattedContent())
                .collect(Collectors.joining("\n\n"));

        // 3. Fill the template with the retrieved context and the user's question
        PromptTemplate promptTemplate = new PromptTemplate(ragPromptResource);
        Message promptMessage = promptTemplate.createMessage(
                Map.of("documents", context, "input", request.question())
        );

        // 4. Send the filled prompt to the LLM and return its answer
        return ragChatClient.prompt()
                .messages(promptMessage)
                .call()
                .content();
    }
}
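To try the endpoint, a request can be sent from any HTTP client. Below is a minimal sketch using the JDK's built-in java.net.http client. The class `RagAskClient` is hypothetical, and the base URL http://localhost:8080 is an assumption (Spring Boot's default port; the article does not say which port the application uses). The JSON body has the single "question" field of the RagQueryRequest record.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RagAskClient {

    // Builds the POST request for /api/rag/ask without sending it.
    static HttpRequest buildRequest(String baseUrl, String question) {
        // Minimal JSON escaping (backslash and double quote only); control
        // characters such as newlines are not handled in this sketch.
        String escaped = question.replace("\\", "\\\\").replace("\"", "\\\"");
        String json = "{\"question\": \"" + escaped + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/rag/ask"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumed base URL; adjust to wherever the application is running.
        HttpRequest request = buildRequest("http://localhost:8080", "What is Chroma?");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

The response body is the plain string returned by runRag, so no JSON parsing is needed on the client side.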

I ran a few HTTP requests and the practice flow works end to end. The newer technique these days seems to be GraphRAG, which can preserve the relationships between vectors, so I won't dig any deeper into this RAG project.

References:

Embedding Models: https://docs.spring.io/spring-ai/reference/api/embeddings.html
Google Embedding Implementation: https://docs.spring.io/spring-ai/reference/api/embeddings/google-genai-embeddings-text.html#_using_gemini_developer_api_api_key
Vector DB Chroma: https://docs.spring.io/spring-ai/reference/api/vectordbs/chroma.html