Chapter 1: Unlocking RAG (Study Notes)



干嘛那么认真冲动呢 · Published 2025-11-12 16:19:48

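1. What is RAG?

Full name: Retrieval-Augmented Generation
Essence: before the LLM generates an answer, relevant information is retrieved from an external knowledge base and combined with the question to produce a more accurate, more reliable answer.
Core logic: retrieve → augment → generate

💡 In one sentence: RAG = retrieval + generation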

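2. How RAG works

Two-stage architecture:

Retrieval stage: find the information in an external knowledge base that is relevant to the question.
Generation stage: pass the retrieved information to the LLM as context and generate the answer.

Three key components:

Component	Role	Example
Indexing	Split documents into chunks and convert them into vectors	An embedding model (e.g. text-embedding-ada-002)
Retrieval	Recall the document chunks most relevant to the question	A vector index or database (e.g. FAISS, Milvus)
Generation	Produce a natural-language answer from the retrieved results	An LLM (e.g. GPT, Claude)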
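
The two stages can be sketched in plain Python. This is a toy illustration under loud assumptions, not the LangChain pipeline used later in these notes: word overlap stands in for embedding similarity, and `fake_llm` is a hypothetical placeholder for a real model call.

```python
def tokenize(text: str) -> set[str]:
    """Lowercase and split on whitespace; a crude stand-in for an embedding."""
    return set(text.lower().split())


def retrieve(question: str, knowledge_base: list[str], k: int = 1) -> list[str]:
    """Retrieval stage: rank passages by word overlap with the question."""
    q = tokenize(question)
    ranked = sorted(knowledge_base, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Augmentation: place the retrieved passages in front of the question."""
    context = "\n\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {question}"


def fake_llm(prompt: str) -> str:
    """Hypothetical placeholder for the generation stage (no real model is called)."""
    return f"[answer based on a prompt of {len(prompt)} characters]"


knowledge_base = [
    "FAISS is a library for vector similarity search.",
    "Milvus is an open-source vector database.",
    "Paris is the capital of France.",
]
question = "What is the capital of France?"
passages = retrieve(question, knowledge_base, k=1)
prompt = build_prompt(question, passages)
answer = fake_llm(prompt)
```

Swapping `tokenize` for a real embedding model and `fake_llm` for a real LLM call leaves the retrieve → augment → generate shape unchanged.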

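3. Why use RAG?

It addresses four limitations of LLMs:

LLM problem	How RAG addresses it
Outdated knowledge	Retrieves up-to-date information at query time
Hallucination	Grounds generation in retrieved facts, which reduces fabrication
Weak domain expertise	Brings in a domain-specific knowledge base
Data privacy risk	Keeps the knowledge base in a local deployment

Four advantages of RAG:

Accuracy: answers are backed by sources, which reduces errors
Freshness: the knowledge base can be updated at any time
Low cost: avoids frequent fine-tuning of a large model
Extensibility: supports multiple data sources and a modular design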

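4. Where RAG applies

Risk level	Examples	Suitability of RAG
Low	Translation, grammar checking	✅ High
Medium	Contract drafting, legal consultation	⚠️ Requires human review
High	Evidence analysis, visa decisions	🛡️ Requires strict quality control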

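5. Getting started with RAG

Recommended basic tools:

Development frameworks: LangChain, LlamaIndex
Vector databases: Milvus, FAISS, Pinecone

Building a minimal RAG system (four steps):

Data preparation: split documents into chunks (by paragraph or by fixed length)
Index construction: convert the chunks into vectors with an embedding model
Retrieval optimization: combine keyword search with semantic search
Generation integration: design the prompt and call the LLM to generate the answer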
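
Step 3 mentions combining keyword and semantic search without saying how. One common way to merge two ranked result lists is reciprocal rank fusion (RRF); the sketch below assumes that technique, and the document IDs and both input rankings are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is the constant commonly used with RRF.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


keyword_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical keyword-search ranking
semantic_hits = ["doc_b", "doc_d", "doc_a"]  # hypothetical vector-search ranking
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

Documents that rank well in both lists (`doc_b`, `doc_a`) rise to the top, while documents found by only one method are kept but ranked lower.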

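Self-check questions

What does RAG stand for?
- Retrieval-Augmented Generation
What are the three stages of the core RAG workflow?
- Indexing, retrieval, generation
Why does RAG reduce LLM "hallucination"?
- The answer is generated from retrieved content, so the model has less room to make things up
Give a medium-risk scenario where RAG is a good fit.
- Legal consultation
Which RAG development frameworks or vector databases do you know?
- RAG development frameworks
  - LangChain, LlamaIndex
- Vector databases
  - Milvus, FAISS, Pinecone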

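6. The four build steps in detail (LangChain implementation)

🎯 Step 1: Data Preparation

# Imports assume the split LangChain packages (langchain-community, langchain-text-splitters)
from langchain_community.document_loaders import UnstructuredMarkdownLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Path to the source document (assumed to be the same file the LlamaIndex example loads)
markdown_path = "../../data/C1/markdown/easy-rl-chapter1.md"

# Load the local Markdown file
loader = UnstructuredMarkdownLoader(markdown_path)
docs = loader.load()

# Split the text into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(docs)

Key points:

Chunking strategy: RecursiveCharacterTextSplitter splits recursively, trying paragraph breaks first, then line breaks, then spaces, and finally individual characters
Library defaults: chunk_size=4000, chunk_overlap=200 (the code above overrides chunk_size with 1000)
Purpose of overlap: keeps information from being lost at chunk boundaries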
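
To see what overlap does at a chunk boundary, here is a minimal fixed-length splitter in plain Python. It is a simplified sketch, not how RecursiveCharacterTextSplitter works internally (it ignores separators entirely); `split_with_overlap` is a hypothetical helper written for this illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(text, chunk_size=10, chunk_overlap=3)
# chunks == ["abcdefghij", "hijklmnopq", "opqrstuvwx", "vwxyz"]
```

Each chunk begins with the last three characters of the previous one, so a phrase that straddles a boundary still appears whole in at least one chunk.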

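🎯 Step 2: Index Construction

from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore

# Chinese embedding model
embeddings = HuggingFaceEmbeddings(
    model_name="BAAI/bge-small-zh-v1.5",
    model_kwargs={'device': 'cpu'},
    encode_kwargs={'normalize_embeddings': True}
)

# Build the vector store
vectorstore = InMemoryVectorStore(embeddings)
vectorstore.add_documents(chunks)

Key points:

Embedding model: converts text into numeric vectors
Vector similarity: measures how semantically related two pieces of text are
In-memory storage: InMemoryVectorStore suits experiments; production needs persistent storage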
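
Vector similarity is usually measured with cosine similarity. The sketch below uses made-up two-dimensional vectors (real embeddings have hundreds of dimensions) and shows why normalizing embeddings is convenient: for unit-length vectors, a plain dot product equals the cosine similarity.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: 1 = same direction, -1 = opposite."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]    # hypothetical embedding of text A
b = [4.0, 3.0]    # points in a similar direction to a
c = [-3.0, -4.0]  # points in the opposite direction to a

sim_ab = cosine_similarity(a, b)             # 24 / 25 = 0.96
sim_ac = cosine_similarity(a, c)             # -25 / 25 = -1.0
dot_unit = dot(normalize(a), normalize(b))   # same value as sim_ab
```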

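🎯 Step 3: Query and Retrieval

# 1. User query ("What examples does the text give?")
question = "文中举了哪些例子？"

# 2. Semantic search: look up the most relevant chunks in the vector store
retrieved_docs = vectorstore.similarity_search(question, k=3)

# 3. Prepare the context
docs_content = "\n\n".join(doc.page_content for doc in retrieved_docs)

Key points:

Choosing k: the search returns the k most relevant document chunks
Context assembly: a blank line (two newline characters) cleanly separates the chunks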
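
Top-k selection and context assembly can be sketched without a vector store. The similarity scores and chunk texts below are made up for illustration; in the real pipeline they come from comparing the query embedding with each chunk embedding.

```python
import heapq


def top_k(scored_chunks: list[tuple[float, str]], k: int) -> list[str]:
    """Return the text of the k highest-scoring chunks, best first."""
    best = heapq.nlargest(k, scored_chunks, key=lambda pair: pair[0])
    return [text for _, text in best]


# (similarity score, chunk text) pairs; all values are hypothetical
scored_chunks = [
    (0.82, "chunk about exploration"),
    (0.35, "chunk about installation"),
    (0.91, "chunk about rewards"),
    (0.60, "chunk about Gym"),
]

retrieved = top_k(scored_chunks, k=3)
# Join the chunks with a blank line so the LLM can tell them apart
docs_content = "\n\n".join(retrieved)
```

With k=3 the lowest-scoring chunk is dropped; raising k brings it back into the context at the cost of more noise.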

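🎯 Step 4: Generation Integration

from langchain_core.prompts import ChatPromptTemplate
from langchain_deepseek import ChatDeepSeek

# 1. Prompt template ("Answer the question based on the context provided below...")
#    The full template must contain {context} and {question} placeholders
prompt = ChatPromptTemplate.from_template("""请根据下面提供的上下文信息来回答问题...""")

# 2. Configure the LLM
llm = ChatDeepSeek(model="deepseek-chat", temperature=0.7)

# 3. Generate the answer
answer = llm.invoke(prompt.format(question=question, context=docs_content))

Key points:

Prompt engineering: explicitly tell the LLM to answer from the context
Temperature: temperature=0.7 balances creativity against consistency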
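
The prompt template above is abbreviated in these notes. As a rough sketch of what a complete one might look like, the following uses plain `str.format`; the English wording is an assumption for illustration, not the template from the original script.

```python
# Hypothetical template wording; only the {context} and {question}
# placeholders are required by the format call used in the pipeline.
PROMPT_TEMPLATE = """Answer the question using only the context below.
If the context does not contain the answer, say that you cannot find it.

Context:
{context}

Question: {question}

Answer:"""


def build_prompt(context: str, question: str) -> str:
    """Fill the retrieved context and the user question into the template."""
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt_text = build_prompt(
    context="RAG retrieves documents before generating an answer.",
    question="What does RAG do first?",
)
```

Telling the model what to do when the context lacks the answer is one way to make the "answer from the context" requirement explicit.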

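Framework comparison:

Feature	LangChain	LlamaIndex
Flexibility	High, allows fine-grained control	Medium, more heavily encapsulated
Learning curve	Steeper	Gentler
Amount of code	More	Very little
Best suited for	Complex business workflows	Rapid prototyping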

Hands-on

python 01_langchain_example.py

(all-in-rag) @WangDF2022 ➜ /workspaces/all-in-rag/code/C1 (main) $ python 01_langchain_example.py
content='根据提供的上下文，文中举了以下例子：\n\n1. **选择餐馆**：利用是指去最喜欢的已知餐馆；探索是指尝试新的餐馆。\n2. **做广告**：利用是指直接采取最优广告策略；探索是指尝试新的广告策略。\n3. **挖油**：利用是指在已知地点挖油；探索是指在新地点挖油。\n4. **玩游戏（《街头霸王》）**：利用是指总是采取同一策略（如蹲在角落一直出脚）；探索是指尝试新招式（如放出“大招”）。\n5. **象棋选手**：奖励为赢棋（正奖励）或输棋（负奖励）。\n6. **股票管理**：奖励由股票收益或损失决定。\n7. **玩雅达利游戏（如Pong）**：奖励为游戏分数的增减。\n8. **小车上山（MountainCar-v0）**：作为Gym库交互示例，涉及观测空间和动作空间的说明。\n\n这些例子用于说明强化学习中的探索与利用、奖励机制及序列决策等概念。' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 218, 'prompt_tokens': 1368, 'total_tokens': 1586, 'completion_tokens_details': None, 'prompt_tokens_details': {'audio_tokens': None, 'cached_tokens': 0}, 'prompt_cache_hit_tokens': 0, 'prompt_cache_miss_tokens': 1368}, 'model_name': 'deepseek-chat', 'system_fingerprint': 'fp_ffc7281d48_prod0820_fp8_kvcache', 'id': 'dc6ffba9-7ee5-4499-b6ae-35ab8c507a22', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None} id='run--b4aef134-b302-47f6-a2c5-844864ba5ad4-0' usage_metadata={'input_tokens': 1368, 'output_tokens': 218, 'total_tokens': 1586, 'input_token_details': {'cache_read': 0}, 'output_token_details': {}}

python 02_llamaIndex_example.py

(all-in-rag) @WangDF2022 ➜ /workspaces/all-in-rag/code/C1 (main) $ python 02_llamaIndex_example.py 
/home/codespace/miniconda3/envs/all-in-rag/lib/python3.12/site-packages/pydantic/_internal/_generate_schema.py:2249: UnsupportedFieldAttributeWarning: The 'validate_default' attribute with value True was provided to the `Field()` function, which has no effect in the context it was used. 'validate_default' is field-specific metadata, and can only be attached to a model field using `Annotated` metadata or by assignment. This may have happened because an `Annotated` type alias using the `type` statement was used, or if the `Field()` function was attached to a single member of a union type.
  warnings.warn(
modules.json: 100%|██████████████████████████████████████████████████████████████████████████████████████| 349/349 [00:00<00:00, 3.54MB/s]
config_sentence_transformers.json: 100%|██████████████████████████████████████████████████████████████████| 124/124 [00:00<00:00, 960kB/s]
README.md: 27.7kB [00:00, 87.1MB/s]
sentence_bert_config.json: 100%|████████████████████████████████████████████████████████████████████████| 52.0/52.0 [00:00<00:00, 641kB/s]
config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████| 776/776 [00:00<00:00, 7.55MB/s]
model.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████| 95.8M/95.8M [00:01<00:00, 55.5MB/s]
tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████| 367/367 [00:00<00:00, 2.52MB/s]
vocab.txt: 110kB [00:00, 87.8MB/s]
tokenizer.json: 439kB [00:00, 130MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████| 125/125 [00:00<00:00, 958kB/s]
config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████| 190/190 [00:00<00:00, 1.38MB/s]
{'response_synthesizer:text_qa_template': SelectorPromptTemplate(metadata={'prompt_type': <PromptType.QUESTION_ANSWER: 'text_qa'>}, template_vars=['context_str', 'query_str'], kwargs={}, output_parser=None, template_var_mappings={}, function_mappings={}, default_template=PromptTemplate(metadata={'prompt_type': <PromptType.QUESTION_ANSWER: 'text_qa'>}, template_vars=['context_str', 'query_str'], kwargs={}, output_parser=None, template_var_mappings=None, function_mappings=None, template='Context information is below.\n---------------------\n{context_str}\n---------------------\nGiven the context information and not prior knowledge, answer the query.\nQuery: {query_str}\nAnswer: '), conditionals=[(<function is_chat_model at 0x73257fc71a80>, ChatPromptTemplate(metadata={'prompt_type': <PromptType.CUSTOM: 'custom'>}, template_vars=['context_str', 'query_str'], kwargs={}, output_parser=None, template_var_mappings=None, function_mappings=None, message_templates=[ChatMessage(role=<MessageRole.SYSTEM: 'system'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text="You are an expert Q&A system that is trusted around the world.\nAlways answer the query using the provided context information, and not prior knowledge.\nSome rules to follow:\n1. Never directly reference the given context in your answer.\n2. Avoid statements like 'Based on the context, ...' or 'The context information ...' 
or anything along those lines.")]), ChatMessage(role=<MessageRole.USER: 'user'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text='Context information is below.\n---------------------\n{context_str}\n---------------------\nGiven the context information and not prior knowledge, answer the query.\nQuery: {query_str}\nAnswer: ')])]))]), 'response_synthesizer:refine_template': SelectorPromptTemplate(metadata={'prompt_type': <PromptType.REFINE: 'refine'>}, template_vars=['query_str', 'existing_answer', 'context_msg'], kwargs={}, output_parser=None, template_var_mappings={}, function_mappings={}, default_template=PromptTemplate(metadata={'prompt_type': <PromptType.REFINE: 'refine'>}, template_vars=['query_str', 'existing_answer', 'context_msg'], kwargs={}, output_parser=None, template_var_mappings=None, function_mappings=None, template="The original query is as follows: {query_str}\nWe have provided an existing answer: {existing_answer}\nWe have the opportunity to refine the existing answer (only if needed) with some more context below.\n------------\n{context_msg}\n------------\nGiven the new context, refine the original answer to better answer the query. If the context isn't useful, return the original answer.\nRefined Answer: "), conditionals=[(<function is_chat_model at 0x73257fc71a80>, ChatPromptTemplate(metadata={'prompt_type': <PromptType.CUSTOM: 'custom'>}, template_vars=['context_msg', 'query_str', 'existing_answer'], kwargs={}, output_parser=None, template_var_mappings=None, function_mappings=None, message_templates=[ChatMessage(role=<MessageRole.USER: 'user'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text="You are an expert Q&A system that strictly operates in two modes when refining existing answers:\n1. **Rewrite** an original answer using the new context.\n2. 
**Repeat** the original answer if the new context isn't useful.\nNever reference the original answer or context directly in your answer.\nWhen in doubt, just repeat the original answer.\nNew Context: {context_msg}\nQuery: {query_str}\nOriginal Answer: {existing_answer}\nNew Answer: ")])]))])}
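The text lists the following examples:
1. Choosing a restaurant: exploitation is going to a restaurant you already know is good; exploration is trying a new restaurant.
2. Advertising: exploitation is using the best-known advertising strategy; exploration is trying a new advertising strategy.
3. Drilling for oil: exploitation is drilling where oil is known to exist; exploration is drilling in a new location.
4. Playing a game: exploitation is repeating a particular game strategy (such as crouching in the corner and kicking in Street Fighter); exploration is trying new moves.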

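Exercises

The final output of the LangChain code carries a lot of metadata. Look up the relevant documentation and filter it out so that only the answer text in content is printed.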

# Use the content attribute that the AIMessage object provides: answer.content
answer = llm.invoke(prompt.format(question=question, context=docs_content))
# print(answer)
print(answer.content)

-------------------------------
(all-in-rag) @WangDF2022 ➜ /workspaces/all-in-rag/code/C1 (main) $ python 01_langchain_example.py
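According to the context, the text gives the following examples:

1. **Choosing a restaurant**:
   - Exploitation: go straight to your favorite restaurant, because you know the food is good.
   - Exploration: search for a new restaurant on your phone and try it; you may be disappointed and waste money.

2. **Advertising**:
   - Exploitation: directly adopt the best advertising strategy.
   - Exploration: try a new advertising strategy to see whether it performs better.

3. **Drilling for oil**:
   - Exploitation: drill in a known location, where finding oil is guaranteed.
   - Exploration: drill in a new location; it may fail, or it may uncover a large oil field.

4. **Playing a game** (using Street Fighter as the example):
   - Exploitation: always use one strategy (such as crouching in the corner and kicking), which may work but fails against certain opponents.
   - Exploration: try new moves (such as unleashing a "special move"), which may finish the opponent in one hit.

5. **Other reward examples**:
   - Chess player: a positive reward for winning, a negative reward for losing.
   - Stock management: the reward is determined by the gains or losses on the stocks.
   - Atari games: the reward is the increase or decrease in the game score.

6. **Interacting with the Gym library**:
   - Uses the MountainCar-v0 task to explain concepts such as the observation space and the action space.

These examples are used to explain exploration versus exploitation, reward mechanisms, and environment interaction in reinforcement learning.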

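Change the chunk_size and chunk_overlap parameters of RecursiveCharacterTextSplitter() in the LangChain code and observe how the output changes.

chunk_size: sets the size of each text chunk. If chunks are too large, they may include too much irrelevant information and hurt retrieval precision; if they are too small, context may be lost.
chunk_overlap: sets how much adjacent chunks overlap. A suitable overlap preserves the continuity of context and keeps important information from being cut apart.
Also try adjusting the retrieval top_k (for example, from 3 to 5) to retrieve more document chunks.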

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500,chunk_overlap=300)
----------------------------------------------------------------
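According to the context, the text gives the following reinforcement learning examples:

1. **The walking agent developed by DeepMind**: the agent learns to walk, raises its arms to keep its balance so it can move forward faster, and becomes more robust when disturbances are added to the environment.
2. **Robotic arm grasping**: several robotic arms are trained with reinforcement learning to learn a single grasping algorithm that works for objects of different shapes, avoiding the time-consuming traditional approach of modeling each object separately.
3. **OpenAI's robotic arm that turns a Rubik's Cube**: after training with reinforcement learning in a virtual environment, the arm can manipulate the cube dexterously, and it was later improved to the point of being able to play with the cube.
4. **The agent that puts on clothes**: a reinforcement learning agent is trained to perform the fine-grained task of getting dressed and to withstand environmental disturbances, although it can still fail.

These examples all illustrate applications of reinforcement learning in different settings.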

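📊 Comparison of results

Parameter setting	Kind of examples retrieved	Coverage	Specificity
`chunk_size=1000, chunk_overlap=200`	Basic-concept examples (exploration vs. exploitation, reward mechanisms)	Narrower, focused on theoretical concepts	Very specific, with detailed categories
`chunk_size=1500, chunk_overlap=300`	Real-world application examples (robot control, agents)	Broader, covering practical application scenarios	More high-level, with less detail

Chunking strategy has to be tuned to the specific task and document structure; there is no one-size-fits-all set of parameters.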

Add comments to the LlamaIndex code.

import os
# Hugging Face mirror endpoint (uncomment if huggingface.co is unreachable from your network, e.g. in mainland China)
# os.environ['HF_ENDPOINT']='https://hf-mirror.com'

from dotenv import load_dotenv
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.deepseek import DeepSeek
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Load environment variables
load_dotenv()

# Configure LlamaIndex global settings
# Use the DeepSeek deepseek-chat model as the LLM
Settings.llm = DeepSeek(model="deepseek-chat", api_key=os.getenv("DEEPSEEK_API_KEY"))
# Use the small BGE model tuned for Chinese as the embedding model
Settings.embed_model = HuggingFaceEmbedding("BAAI/bge-small-zh-v1.5")

# Load the document data
# SimpleDirectoryReader reads the specified Markdown file
docs = SimpleDirectoryReader(input_files=["../../data/C1/markdown/easy-rl-chapter1.md"]).load_data()

# Create the vector store index
# Builds a vector index from the loaded documents; chunking and embedding happen automatically
index = VectorStoreIndex.from_documents(docs)

# Create the query engine
# Turns the index into a queryable engine for question answering
query_engine = index.as_query_engine()

# Print the prompt templates the query engine uses
# Shows the default prompt configuration
print(query_engine.get_prompts())

# Run the query and print the result
# Asks the query engine "What examples does the text give?" and prints the answer
print(query_engine.query("文中举了哪些例子?"))

Problems encountered

openai.APIStatusError: Error code: 402 - {'error': {'message': 'Insufficient Balance', 'type': 'unknown_error', 'param': None, 'code': 'invalid_request_error'}}

Cause: the DeepSeek API account had no remaining balance.

(all-in-rag) @WangDF2022 ➜ /workspaces/all-in-rag/code/C1 (main) $ python 01_langchain_example.py
content='抱歉，我无法根据提供的上下文找到相关信息来回答此问题。' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 5550, 'total_tokens': 5564, 'completion_tokens_details': None, 'prompt_tokens_details': {'audio_tokens': None, 'cached_tokens': 0}, 'prompt_cache_hit_tokens': 0, 'prompt_cache_miss_tokens': 5550}, 'model_name': 'deepseek-chat', 'system_fingerprint': 'fp_ffc7281d48_prod0820_fp8_kvcache', 'id': 'ff53adb2-2d09-4db2-b272-e96b89240c8a', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None} id='run--d89b328d-29ba-418f-9523-7837de8c01f6-0' usage_metadata={'input_tokens': 5550, 'output_tokens': 14, 'total_tokens': 5564, 'input_token_details': {'cache_read': 0}, 'output_token_details': {}}

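# Chunking, original (library defaults)
text_splitter = RecursiveCharacterTextSplitter()
chunks = text_splitter.split_documents(docs)

# Chunking, revised
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)  # smaller chunks keep the retrieved context short
chunks = text_splitter.split_documents(docs)

With the default splitter settings, the model replied that it could not find the relevant information in the provided context (see the content field of the output above). Diagnosis: the retrieved chunks were too long, so the prompt ran up against the maximum token limit.

Input tokens + max_tokens ≤ the model's maximum context length

The input tokens are the retrieved context plus the prompt.

max_tokens limits the length of the model's output.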
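
The budget rule can be checked with a few lines of arithmetic. The 5550 input tokens come from the run shown earlier; the context length of 8192 is a made-up figure for illustration, not the actual limit of deepseek-chat.

```python
def fits_context_window(input_tokens: int, max_tokens: int, context_length: int) -> bool:
    """True when input tokens plus the reserved output budget fit in the window."""
    return input_tokens + max_tokens <= context_length


def max_output_budget(input_tokens: int, context_length: int) -> int:
    """Largest max_tokens that still fits, or 0 if the input alone is too long."""
    return max(context_length - input_tokens, 0)


CONTEXT_LENGTH = 8192  # hypothetical limit, for illustration only
input_tokens = 5550    # prompt_tokens reported in the run with default chunking

ok_small = fits_context_window(input_tokens, max_tokens=2000, context_length=CONTEXT_LENGTH)
ok_large = fits_context_window(input_tokens, max_tokens=4096, context_length=CONTEXT_LENGTH)
budget = max_output_budget(input_tokens, CONTEXT_LENGTH)
```

Under these assumed numbers, a 2000-token output budget fits and a 4096-token one does not; shrinking the chunks lowers `input_tokens` and frees up room for the answer.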

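Common problems

Chunks too large → redundant information, imprecise retrieval
- Fix: adjust chunk_size (500-1500 suggested)
Chunks too small → broken context, hard for the model to understand
- Fix: increase chunk_overlap (100-200 suggested)
Unsuitable retrieval count → too small a k misses information, too large a k adds noise
- Fix: adjust k to the complexity of the question (typically 3-5)
Vague prompt → the LLM ignores the retrieved content
- Fix: strengthen the "answer based on the context" requirement in the prompt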

References

🔗all-in-rag
