[Worth Bookmarking] From RAG to CAG: a complete implementation of cache-augmented generation that gives AI a "memory"
This article walks through the evolution from traditional RAG to CAG, using a caching layer to eliminate the retrieval step that RAG repeats on every query. It provides a complete code implementation showing how CAG improves response time and cuts cost, covers smart caching strategies and cache-update mechanisms, and closes with a production-grade implementation and best practices for building efficient knowledge-augmented AI systems.
Have you run into this situation? An AI customer-service bot answers thousands of questions a day, and at least a third of them are repeats: "How is annual leave calculated?", "How do I claim travel expenses?", "What is the housing fund rate?"... The answers are all written down in company policy documents that change maybe once every few months.
Here's the problem: every time someone asks, the AI goes back and searches the document store all over again.
It's like knowing exactly where your house keys are, yet still turning the whole room upside down every time you leave. Isn't that a waste of time?
In this article I'll walk you through the evolution from traditional RAG to CAG with real code. Every step is runnable, so you can see exactly how the technique works.

1. RAG is great, but it has "amnesia"
Before we get to CAG, let's talk about today's most popular technique: RAG.
RAG stands for "retrieval-augmented generation". It sounds academic, but the idea is straightforward: before answering a question, the AI first searches a knowledge base for relevant material, then generates an answer grounded in what it found.
This approach does solve the AI "making things up" problem, but it has a built-in flaw: no memory.
1.1 What is RAG?
The RAG (retrieval-augmented generation) workflow is simple:
- The user asks a question
- The system retrieves relevant documents from the knowledge base
- The retrieved results are passed to the AI along with the question
- The AI generates an answer grounded in the retrieved content
Sounds great, but here's the catch: it retrieves every single time.
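The four steps above can be sketched as a single function. This is a schematic only; `retrieve` and `llm` are stand-ins for a real vector search and a real LLM call:

```python
def rag_answer(question: str, retrieve, llm) -> str:
    """Minimal RAG loop: retrieve first, then generate grounded on the results."""
    docs = retrieve(question)                       # step 2: search the knowledge base
    context = "\n\n".join(docs)                     # step 3: bundle results with the question
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)                              # step 4: generate a grounded answer

# Stub dependencies just to show the call shape
answer = rag_answer(
    "How is annual leave calculated?",
    retrieve=lambda q: ["Annual leave policy: 5 days after 1 year of service."],
    llm=lambda prompt: f"[answer grounded in {len(prompt)} chars of prompt]",
)
```

The point to notice is that `retrieve` sits inside the hot path: it runs for every call, even when the question is identical to the last hundred calls.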
1.2 A complete traditional RAG implementation
Let's first build a standard RAG system, using a corporate HR knowledge base as the example:
import time
from datetime import datetime
from typing import Dict, List

import numpy as np

# A mock vector database
class SimpleVectorDB:
    """A minimal in-memory vector database."""

    def __init__(self):
        self.documents = []
        self.embeddings = []
        self.metadata = []

    def add_document(self, text: str, metadata: Dict = None):
        """Add a document to the database."""
        # A toy word-frequency-style embedding stands in for a real model
        embedding = self._text_to_vector(text)
        self.documents.append(text)
        self.embeddings.append(embedding)
        self.metadata.append(metadata or {})

    def _text_to_vector(self, text: str) -> np.ndarray:
        """Convert text to a vector (toy version).

        A real system would use an OpenAI/HuggingFace embedding model;
        here we simply encode the first 100 characters by code point.
        """
        vector = np.zeros(100)
        for i, char in enumerate(text[:100]):
            vector[i] = ord(char) / 1000
        return vector

    def search(self, query: str, top_k: int = 3) -> List[Dict]:
        """Retrieve the most relevant documents."""
        query_vector = self._text_to_vector(query)
        # Cosine similarity against every stored document
        similarities = []
        for i, doc_vector in enumerate(self.embeddings):
            similarity = np.dot(query_vector, doc_vector) / (
                np.linalg.norm(query_vector) * np.linalg.norm(doc_vector) + 1e-10
            )
            similarities.append({
                'index': i,
                'score': similarity,
                'text': self.documents[i],
                'metadata': self.metadata[i]
            })
        # Return the top_k results
        similarities.sort(key=lambda x: x['score'], reverse=True)
        return similarities[:top_k]

class TraditionalRAG:
    """A traditional RAG system."""

    def __init__(self):
        self.vector_db = SimpleVectorDB()
        self.search_count = 0    # number of retrievals performed
        self.search_times = []   # per-retrieval latency

    def add_knowledge(self, text: str, metadata: Dict = None):
        """Add knowledge to the system."""
        self.vector_db.add_document(text, metadata)

    def query(self, question: str) -> Dict:
        """Handle a query."""
        start_time = time.time()
        # Retrieval happens on every single query
        search_results = self.vector_db.search(question, top_k=2)
        search_time = time.time() - start_time
        self.search_count += 1
        self.search_times.append(search_time)
        # Assemble the context
        context = "\n\n".join([r['text'] for r in search_results])
        # Mock LLM generation (a real system would call the GPT/Claude API)
        answer = self._generate_answer(question, context)
        return {
            'question': question,
            'answer': answer,
            'context': context,
            'search_time': search_time,
            'total_searches': self.search_count
        }

    def _generate_answer(self, question: str, context: str) -> str:
        """Mock LLM answer generation."""
        # A real system would call the OpenAI API or another LLM
        return f"Based on the knowledge base: {context[:100]}... Answer: [mock answer]"

    def get_statistics(self) -> Dict:
        """Return performance statistics."""
        return {
            'total_searches': self.search_count,
            'avg_search_time': np.mean(self.search_times) if self.search_times else 0,
            'total_time': sum(self.search_times)
        }
# Usage example
def demo_traditional_rag():
    """Demonstrate the problem with traditional RAG."""
    print("=" * 60)
    print("Traditional RAG demo")
    print("=" * 60)
    # Create the RAG system
    rag = TraditionalRAG()
    # Add corporate knowledge (all of it stable policy documents)
    knowledge_base = [
        {
            "text": "Annual leave policy: employees get 5 days after 1 year of service, 10 days after 3 years, 15 days after 5 years. Leave must be used within the year and cannot be carried over.",
            "metadata": {"category": "HR policy", "update_date": "2024-01-01"}
        },
        {
            "text": "Travel expense policy: 200 RMB per diem for domestic trips; hotel costs reimbursed up to 500 RMB/day. Receipts and a travel request form are required.",
            "metadata": {"category": "Finance", "update_date": "2024-01-01"}
        },
        {
            "text": "Housing fund contributions: company and employee each pay 12% of last year's average monthly salary, adjusted every July.",
            "metadata": {"category": "Compensation", "update_date": "2024-01-01"}
        },
        {
            "text": "Sick leave policy: a doctor's note is required. Sick pay is 80% of base salary, capped at 30 days per year.",
            "metadata": {"category": "HR policy", "update_date": "2024-01-01"}
        }
    ]
    for kb in knowledge_base:
        rag.add_knowledge(kb['text'], kb['metadata'])
    print(f"\nLoaded {len(knowledge_base)} knowledge entries\n")
    # Simulate repeated queries (this is the crux of the problem)
    repeated_questions = [
        "How is annual leave calculated?",
        "What is the annual leave policy?",
        "How many days of annual leave do I get?",
        "How do I claim travel expenses?",
        "What is the per diem for business trips?",
        "Can annual leave be carried over?",    # annual leave again
        "What is the housing fund rate?",
        "Annual leave policy details",          # annual leave yet again
    ]
    print("Processing queries...\n")
    for i, question in enumerate(repeated_questions, 1):
        result = rag.query(question)
        print(f"Query {i}: {question}")
        print(f"  Retrieval time: {result['search_time']*1000:.2f}ms")
        print(f"  Cumulative retrievals: {result['total_searches']}")
        print()
    # Show statistics
    stats = rag.get_statistics()
    print("=" * 60)
    print("Performance statistics")
    print("=" * 60)
    print(f"Total retrievals: {stats['total_searches']}")
    print(f"Average retrieval time: {stats['avg_search_time']*1000:.2f}ms")
    print(f"Total time: {stats['total_time']*1000:.2f}ms")
    print()
    print("⚠️ Problem analysis:")
    print("  - The annual-leave question was asked 4 times, and retrieved 4 times")
    print("  - These policy documents don't change for months, yet we keep hitting the database")
    print("  - Cost and latency grow linearly with query volume")
    print()

# Run the demo
demo_traditional_rag()
1.3 The problem exposed: cost and latency
Run the code above and you'll see:
- The annual-leave question was asked 4 times, and the system retrieved 4 times
- Every retrieval hits the vector database
- Cumulative retrievals grow linearly with query volume
The impact in a real production environment:
- Cost: vector database query fees (e.g. Pinecone bills per query)
- Latency: network round trips plus similarity computation, typically 50-200ms
- Resources: database connections and CPU load
As the example makes clear, ask the same question a hundred times and the AI will dutifully retrieve a hundred times. Hitting the database, matching documents, extracting passages... the whole pipeline burns both time and money.
For knowledge that almost never changes — company policies, product manuals, legal texts — re-retrieving on every query is overkill.
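To make that impact concrete, here is a rough back-of-envelope estimate. The prices and volumes are illustrative assumptions for this sketch, not any vendor's actual rates:

```python
# Illustrative assumptions: 10,000 queries/day, 35% of them repeats of
# stable-knowledge questions, $0.0004 per vector-DB query, 100ms per retrieval.
queries_per_day = 10_000
repeat_fraction = 0.35
price_per_query = 0.0004   # USD, hypothetical
latency_s = 0.100

wasted_queries_per_month = queries_per_day * repeat_fraction * 30
wasted_cost = wasted_queries_per_month * price_per_query
wasted_hours = wasted_queries_per_month * latency_s / 3600

print(f"Redundant retrievals/month: {wasted_queries_per_month:,.0f}")
print(f"Avoidable spend/month:      ${wasted_cost:,.2f}")
print(f"Avoidable latency/month:    {wasted_hours:.1f} hours")
```

Even at these modest (assumed) rates, that is about 105,000 redundant retrievals a month; every one of them could in principle be served from a cache.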
2. CAG: giving AI a "memory stick"
To solve this problem, there's a newer approach called cache-augmented generation (CAG).
In short, you give the AI a "memory": stable, unchanging knowledge is stored where the model can reach it directly. The next time a related question comes in, there's no rummaging through external storage; the answer comes straight "from memory".
It's like keeping your everyday tools on the workbench instead of walking to the warehouse each time.
The effects are immediate:
- Faster: no repeated database round trips, so response time drops dramatically
- Cheaper: fewer retrievals mean less server load and a smaller bill
- More consistent: fixed knowledge gets a stable phrasing, instead of answer A today and answer B tomorrow
2.1 The core idea of CAG
CAG (cache-augmented generation) does something very simple:
- Identify which knowledge is "static" (unchanged for long periods)
- Cache that knowledge directly in memory
- On each query, check the cache first; on a hit, skip retrieval entirely
So should every piece of knowledge go into the cache?
Of course not. Stuff everything in and you'll quickly blow the AI's "headroom".
2.2 A complete CAG system implementation
import hashlib
from typing import Optional

class KnowledgeCache:
    """Knowledge cache manager."""

    def __init__(self, max_size: int = 100):
        self.cache = {}          # cached entries
        self.max_size = max_size
        self.hit_count = 0       # cache hits
        self.miss_count = 0      # cache misses
        self.access_log = []     # access log

    def _normalize(self, query: str) -> str:
        """Lowercase, trim whitespace, and drop trailing punctuation so that
        "How is annual leave calculated?" matches a cached entry stored
        without the question mark."""
        return query.lower().strip().rstrip('?!.')

    def _generate_key(self, query: str) -> str:
        """Generate a cache key for a query.

        Simplified to a text hash here; a real system would use
        embedding-based semantic matching instead.
        """
        return hashlib.md5(self._normalize(query).encode()).hexdigest()[:16]

    def get(self, query: str, similarity_threshold: float = 0.85) -> Optional[Dict]:
        """Fetch an answer from the cache.

        Simplified version: token-overlap matching. A real system
        would use semantic similarity over embeddings.
        """
        query_key = self._normalize(query)
        # Look for a similar cached entry
        for cached_query, cached_data in self.cache.items():
            if self._is_similar(query_key, cached_query):
                self.hit_count += 1
                self.access_log.append({
                    'query': query,
                    'result': 'HIT',
                    'timestamp': datetime.now().isoformat()
                })
                return cached_data
        self.miss_count += 1
        self.access_log.append({
            'query': query,
            'result': 'MISS',
            'timestamp': datetime.now().isoformat()
        })
        return None

    def _is_similar(self, query1: str, query2: str) -> bool:
        """Decide whether two queries are similar.

        Simplified version: Jaccard overlap of word tokens. A real
        system would compare embedding vectors.
        """
        keywords1 = set(query1.split())
        keywords2 = set(query2.split())
        if not keywords1 or not keywords2:
            return False
        intersection = keywords1 & keywords2
        union = keywords1 | keywords2
        similarity = len(intersection) / len(union)
        return similarity > 0.5

    def set(self, query: str, context: str, answer: str, metadata: Dict = None):
        """Store an entry in the cache."""
        query_key = self._normalize(query)
        # Enforce the capacity limit
        if len(self.cache) >= self.max_size:
            # Simple FIFO eviction: drop the oldest entry
            oldest_key = next(iter(self.cache))
            del self.cache[oldest_key]
        self.cache[query_key] = {
            'query': query,
            'context': context,
            'answer': answer,
            'metadata': metadata or {},
            'cached_at': datetime.now().isoformat()
        }

    def get_statistics(self) -> Dict:
        """Return cache statistics."""
        total_access = self.hit_count + self.miss_count
        hit_rate = self.hit_count / total_access if total_access > 0 else 0
        return {
            'hit_count': self.hit_count,
            'miss_count': self.miss_count,
            'total_access': total_access,
            'hit_rate': hit_rate,
            'cache_size': len(self.cache)
        }
class CAGSystem:
    """CAG (cache-augmented generation) system."""

    def __init__(self, cache_size: int = 100):
        self.vector_db = SimpleVectorDB()
        self.cache = KnowledgeCache(max_size=cache_size)
        self.search_count = 0
        self.search_times = []
        self.cache_hit_times = []

    def add_knowledge(self, text: str, metadata: Dict = None, cacheable: bool = False):
        """Add knowledge."""
        self.vector_db.add_document(text, metadata)
        # If flagged cacheable, pre-warm the cache with common questions
        if cacheable and metadata and 'common_questions' in metadata:
            for question in metadata['common_questions']:
                # Pre-cache the answer
                answer = f"From cache: {text[:100]}..."
                self.cache.set(question, text, answer, metadata)

    def query(self, question: str) -> Dict:
        """Handle a query (cache-aware)."""
        start_time = time.time()
        # Check the cache first
        cached_result = self.cache.get(question)
        if cached_result:
            # Cache hit!
            cache_time = time.time() - start_time
            self.cache_hit_times.append(cache_time)
            return {
                'question': question,
                'answer': cached_result['answer'],
                'context': cached_result['context'],
                'source': 'CACHE',
                'response_time': cache_time,
                'total_searches': self.search_count
            }
        # Cache miss: fall back to retrieval
        search_results = self.vector_db.search(question, top_k=2)
        search_time = time.time() - start_time
        self.search_count += 1
        self.search_times.append(search_time)
        # Assemble the context
        context = "\n\n".join([r['text'] for r in search_results])
        answer = self._generate_answer(question, context)
        # Store in the cache (if the knowledge is static)
        if search_results and self._is_cacheable(search_results[0]):
            self.cache.set(question, context, answer,
                           search_results[0].get('metadata', {}))
        return {
            'question': question,
            'answer': answer,
            'context': context,
            'source': 'RETRIEVAL',
            'response_time': search_time,
            'total_searches': self.search_count
        }

    def _is_cacheable(self, search_result: Dict) -> bool:
        """Decide whether a retrieval result should be cached."""
        metadata = search_result.get('metadata', {})
        # Intended rule: treat documents unchanged for 30+ days as static.
        # Simplified here: any document carrying an update_date counts.
        update_date = metadata.get('update_date')
        if update_date:
            return True
        return False

    def _generate_answer(self, question: str, context: str) -> str:
        """Mock LLM answer generation."""
        return f"Based on the knowledge base: {context[:100]}... Answer: [mock answer]"

    def get_statistics(self) -> Dict:
        """Return full statistics."""
        cache_stats = self.cache.get_statistics()
        return {
            'retrieval': {
                'total_searches': self.search_count,
                'avg_search_time': np.mean(self.search_times) if self.search_times else 0,
                'total_time': sum(self.search_times)
            },
            'cache': {
                'hit_count': cache_stats['hit_count'],
                'miss_count': cache_stats['miss_count'],
                'hit_rate': cache_stats['hit_rate'],
                'avg_hit_time': np.mean(self.cache_hit_times) if self.cache_hit_times else 0,
                'cache_size': cache_stats['cache_size']
            },
            'overall': {
                'total_queries': cache_stats['total_access'],
                'searches_saved': cache_stats['hit_count'],
                'cost_reduction': f"{cache_stats['hit_rate']*100:.1f}%"
            }
        }
# Usage example
def demo_cag_system():
    """Demonstrate the advantages of the CAG system."""
    print("=" * 60)
    print("CAG demo (with cache optimization)")
    print("=" * 60)
    # Create the CAG system
    cag = CAGSystem(cache_size=50)
    # Add knowledge (static entries flagged as cacheable)
    knowledge_base = [
        {
            "text": "Annual leave policy: employees get 5 days after 1 year of service, 10 days after 3 years, 15 days after 5 years. Leave must be used within the year and cannot be carried over.",
            "metadata": {
                "category": "HR policy",
                "update_date": "2024-01-01",
                "common_questions": [
                    "How is annual leave calculated",
                    "What is the annual leave policy",
                    "How many days of annual leave do I get",
                    "Can annual leave be carried over"
                ]
            },
            "cacheable": True
        },
        {
            "text": "Travel expense policy: 200 RMB per diem for domestic trips; hotel costs reimbursed up to 500 RMB/day. Receipts and a travel request form are required.",
            "metadata": {
                "category": "Finance",
                "update_date": "2024-01-01",
                "common_questions": [
                    "How do I claim travel expenses",
                    "What is the per diem for business trips",
                    "Travel hotel reimbursement"
                ]
            },
            "cacheable": True
        },
        {
            "text": "Housing fund contributions: company and employee each pay 12% of last year's average monthly salary, adjusted every July.",
            "metadata": {
                "category": "Compensation",
                "update_date": "2024-01-01",
                "common_questions": [
                    "What is the housing fund rate",
                    "How is the housing fund paid"
                ]
            },
            "cacheable": True
        }
    ]
    for kb in knowledge_base:
        cag.add_knowledge(kb['text'], kb['metadata'], kb['cacheable'])
    print(f"\nLoaded {len(knowledge_base)} knowledge entries (common questions pre-cached)\n")
    # Simulate repeated queries
    test_questions = [
        "How is annual leave calculated?",           # pre-warmed: should hit the cache
        "What is the annual leave policy?",          # pre-warmed
        "How many days of annual leave do I get?",   # pre-warmed
        "How do I claim travel expenses?",           # pre-warmed
        "What is the per diem for business trips?",  # pre-warmed
        "Can annual leave be carried over?",         # pre-warmed
        "What is the housing fund rate?",            # pre-warmed
        "Annual leave policy details",               # paraphrase: may fall below the overlap threshold and trigger one retrieval
    ]
    print("Processing queries...\n")
    for i, question in enumerate(test_questions, 1):
        result = cag.query(question)
        # Show the result
        source_icon = "⚡ [cache]" if result['source'] == 'CACHE' else "🔍 [retrieval]"
        print(f"Query {i}: {question}")
        print(f"  Source: {source_icon}")
        print(f"  Response time: {result['response_time']*1000:.2f}ms")
        print(f"  Cumulative retrievals: {result['total_searches']}")
        print()
    # Show detailed statistics
    stats = cag.get_statistics()
    print("=" * 60)
    print("Performance comparison")
    print("=" * 60)
    print("\n[Retrieval stats]")
    print(f"  Actual retrievals: {stats['retrieval']['total_searches']}")
    print(f"  Average retrieval time: {stats['retrieval']['avg_search_time']*1000:.2f}ms")
    print("\n[Cache stats]")
    print(f"  Cache hits: {stats['cache']['hit_count']}")
    print(f"  Cache misses: {stats['cache']['miss_count']}")
    print(f"  Cache hit rate: {stats['cache']['hit_rate']*100:.1f}%")
    print(f"  Average cache response: {stats['cache']['avg_hit_time']*1000:.2f}ms")
    print(f"  Current cache size: {stats['cache']['cache_size']}")
    print("\n[Overall gains]")
    print(f"  Total queries: {stats['overall']['total_queries']}")
    print(f"  Retrievals saved: {stats['overall']['searches_saved']}")
    print(f"  Cost reduction: {stats['overall']['cost_reduction']}")
    print("\n✅ Summary of advantages:")
    print("  - Repeated questions are answered straight from the cache, no retrieval")
    print("  - Response time drops from 50-200ms to <5ms")
    print("  - Far fewer database hits, significantly lower cost")
    print()

# Run the CAG demo
demo_cag_system()
2.3 CAG vs. RAG performance
Let's compare the two systems head to head:
def compare_rag_vs_cag():
    """Directly compare RAG and CAG performance."""
    print("=" * 60)
    print("RAG vs CAG benchmark")
    print("=" * 60)
    # Prepare the test data
    knowledge = {
        "text": "Annual leave policy: employees get 5 days after 1 year of service, 10 days after 3 years, 15 days after 5 years.",
        "metadata": {
            "category": "HR policy",
            "common_questions": ["How is annual leave calculated", "Annual leave policy", "How many days of annual leave"]
        }
    }
    # Repeat the same query 100 times
    questions = ["How is annual leave calculated?"] * 100
    # Test 1: traditional RAG
    print("\n[Test 1: traditional RAG]")
    rag = TraditionalRAG()
    rag.add_knowledge(knowledge['text'], knowledge['metadata'])
    rag_start = time.time()
    for q in questions:
        rag.query(q)
    rag_total_time = time.time() - rag_start
    rag_stats = rag.get_statistics()
    print(f"Total time: {rag_total_time*1000:.2f}ms")
    print(f"Retrievals: {rag_stats['total_searches']}")
    print(f"Average latency: {rag_stats['avg_search_time']*1000:.2f}ms")
    # Test 2: CAG
    print("\n[Test 2: CAG]")
    cag = CAGSystem()
    cag.add_knowledge(knowledge['text'], knowledge['metadata'], cacheable=True)
    cag_start = time.time()
    for q in questions:
        cag.query(q)
    cag_total_time = time.time() - cag_start
    cag_stats = cag.get_statistics()
    print(f"Total time: {cag_total_time*1000:.2f}ms")
    print(f"Retrievals: {cag_stats['retrieval']['total_searches']}")
    print(f"Cache hit rate: {cag_stats['cache']['hit_rate']*100:.1f}%")
    print(f"Average latency: {cag_stats['cache']['avg_hit_time']*1000:.2f}ms")
    # Compute the speedup
    print("\n" + "=" * 60)
    print("Performance gain")
    print("=" * 60)
    speedup = rag_total_time / cag_total_time
    search_reduction = (rag_stats['total_searches'] - cag_stats['retrieval']['total_searches']) / rag_stats['total_searches']
    print(f"Speedup: {speedup:.1f}x")
    print(f"Retrievals avoided: {search_reduction*100:.1f}%")
    print(f"Cost savings: ~{search_reduction*100:.1f}%")

# Run the benchmark
compare_rag_vs_cag()
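The toy cache above matches queries by word overlap. Production CAG systems typically match by embedding similarity instead, so that paraphrases of a question hit the same entry. Here is a minimal sketch of that idea; `embed` is a deterministic hashed bag-of-words stand-in for a real embedding model, and the threshold value is arbitrary:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding: hashed bag-of-words, L2-normalized. Replace with a real model."""
    vec = np.zeros(256)
    for token in text.lower().split():
        vec[hash(token) % 256] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-10)

class SemanticCache:
    """Cache keyed by embedding similarity instead of exact text."""
    def __init__(self, threshold: float = 0.6):
        self.entries = []          # list of (embedding, answer) pairs
        self.threshold = threshold

    def set(self, query: str, answer: str):
        self.entries.append((embed(query), answer))

    def get(self, query: str):
        q = embed(query)
        for vec, answer in self.entries:
            if float(np.dot(q, vec)) >= self.threshold:  # cosine (unit vectors)
                return answer
        return None

cache = SemanticCache(threshold=0.6)
cache.set("how is annual leave calculated", "5/10/15 days by tenure")
print(cache.get("how is annual leave calculated?"))  # near-identical phrasing still hits
```

The linear scan here is O(n) per lookup; real systems put the cached embeddings in an ANN index (FAISS, HNSW, or the vector database itself) so lookups stay fast as the cache grows.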
3. A hybrid RAG+CAG architecture
3.1 Why hybrid?
It works like your brain: the multiplication table and your home address are memorized, but what to have for lunch or tomorrow's weather still requires looking things up.
This "memory plus external brain" dual-engine design is what knowledge-driven AI is converging on.
class HybridRAGCAG:
    """Hybrid RAG+CAG system."""

    def __init__(self, cache_size: int = 100):
        # Cache for static knowledge
        self.static_cache = KnowledgeCache(max_size=cache_size)
        # Vector store for dynamic knowledge
        self.dynamic_db = SimpleVectorDB()
        # Vector store for static knowledge (fallback on cache misses)
        self.static_db = SimpleVectorDB()
        # Statistics
        self.stats = {
            'static_cache_hits': 0,
            'static_db_queries': 0,
            'dynamic_db_queries': 0,
            'total_queries': 0
        }

    def add_static_knowledge(self, text: str, metadata: Dict = None,
                             common_questions: List[str] = None):
        """Add static knowledge (long-lived, rarely changes)."""
        # Add to the static store
        self.static_db.add_document(text, metadata)
        # Pre-warm the cache with common questions
        if common_questions:
            for question in common_questions:
                answer = f"[static knowledge] {text}"
                self.static_cache.set(question, text, answer, metadata)

    def add_dynamic_knowledge(self, text: str, metadata: Dict = None):
        """Add dynamic knowledge (frequently updated)."""
        # Goes into the dynamic store only; never cached
        self.dynamic_db.add_document(text, metadata)

    def query(self, question: str, require_realtime: bool = False) -> Dict:
        """Smart query: automatically choose cache vs. retrieval.

        Args:
            question: the user's question
            require_realtime: force fresh data, bypassing the cache
        """
        self.stats['total_queries'] += 1
        start_time = time.time()
        # Unless realtime data is required, try the static cache first
        if not require_realtime:
            cached_result = self.static_cache.get(question)
            if cached_result:
                self.stats['static_cache_hits'] += 1
                return {
                    'question': question,
                    'answer': cached_result['answer'],
                    'source': 'STATIC_CACHE',
                    'response_time': time.time() - start_time,
                    'confidence': 'high'
                }
        # Classify the question: does it need dynamic or static data?
        question_type = self._classify_question(question)
        if question_type == 'dynamic' or require_realtime:
            # Query the dynamic store
            results = self.dynamic_db.search(question, top_k=2)
            self.stats['dynamic_db_queries'] += 1
            source = 'DYNAMIC_RETRIEVAL'
        else:
            # Query the static store
            results = self.static_db.search(question, top_k=2)
            self.stats['static_db_queries'] += 1
            source = 'STATIC_RETRIEVAL'
            # Cache static results so the next query is served directly.
            # (Dynamic results are deliberately never cached: they would go stale.)
            if results:
                context = results[0]['text']
                answer = f"[static knowledge] {context}"
                self.static_cache.set(question, context, answer,
                                      results[0].get('metadata', {}))
        # Generate the answer
        context = "\n".join([r['text'] for r in results]) if results else ""
        answer = self._generate_answer(question, context, source)
        return {
            'question': question,
            'answer': answer,
            'source': source,
            'response_time': time.time() - start_time,
            'confidence': 'high' if results else 'low'
        }

    def _classify_question(self, question: str) -> str:
        """Decide whether a question needs dynamic or static data."""
        # Simplified version: keyword matching
        dynamic_keywords = ['today', 'latest', 'now', 'current', 'real-time',
                            'yesterday', 'recent', 'this week']
        static_keywords = ['policy', 'rule', 'regulation', 'standard',
                           'process', 'how']
        question_lower = question.lower()
        # A dynamic keyword wins first
        for keyword in dynamic_keywords:
            if keyword in question_lower:
                return 'dynamic'
        # Then check for static keywords
        for keyword in static_keywords:
            if keyword in question_lower:
                return 'static'
        # Default to static
        return 'static'

    def _generate_answer(self, question: str, context: str, source: str) -> str:
        """Generate the answer."""
        if not context:
            return "Sorry, no relevant information was found."
        return f"Based on {source}: {context[:150]}..."

    def get_statistics(self) -> Dict:
        """Return detailed statistics."""
        cache_stats = self.static_cache.get_statistics()
        total_queries = self.stats['total_queries']
        return {
            'total_queries': total_queries,
            'static_cache_hits': self.stats['static_cache_hits'],
            'static_db_queries': self.stats['static_db_queries'],
            'dynamic_db_queries': self.stats['dynamic_db_queries'],
            'cache_hit_rate': cache_stats['hit_rate'],
            'db_access_rate': (self.stats['static_db_queries'] + self.stats['dynamic_db_queries']) / total_queries if total_queries > 0 else 0
        }
def demo_hybrid_system():
    """Demonstrate the hybrid system."""
    print("=" * 60)
    print("Hybrid RAG+CAG demo")
    print("=" * 60)
    # Create the hybrid system
    hybrid = HybridRAGCAG(cache_size=50)
    # Add static knowledge (policy documents)
    print("\n[Loading static knowledge]")
    static_knowledge = [
        {
            "text": "Annual leave policy: employees get 5 days after 1 year of service, 10 days after 3 years, 15 days after 5 years. Leave must be used within the year and cannot be carried over.",
            "common_questions": ["How is annual leave calculated", "Annual leave policy", "How many days of annual leave", "Can annual leave be carried over"]
        },
        {
            "text": "Reimbursement process: submit the request form → manager approval → finance review → payment. Takes about 3-5 business days.",
            "common_questions": ["How do I get reimbursed", "Reimbursement process", "How long does reimbursement take"]
        }
    ]
    for kb in static_knowledge:
        hybrid.add_static_knowledge(
            kb['text'],
            {'type': 'static', 'category': 'policy'},
            kb['common_questions']
        )
    print(f"Loaded {len(static_knowledge)} static entries (cache pre-warmed)")
    # Add dynamic knowledge (realtime data)
    print("\n[Loading dynamic knowledge]")
    dynamic_knowledge = [
        {
            "text": "Today's cafeteria menu: lunch is braised pork, steamed fish and mapo tofu. Dinner is kung pao chicken, pickled fish and stir-fried vegetables.",
            "metadata": {'type': 'dynamic', 'date': '2025-11-05'}
        },
        {
            "text": "This week's meetings: all-hands Wednesday 3pm, department stand-up Friday 10am. Please prepare materials in advance.",
            "metadata": {'type': 'dynamic', 'date': '2025-11-05'}
        }
    ]
    for kb in dynamic_knowledge:
        hybrid.add_dynamic_knowledge(kb['text'], kb['metadata'])
    print(f"Loaded {len(dynamic_knowledge)} dynamic entries")
    # Exercise different query types
    print("\n" + "=" * 60)
    print("Running test queries")
    print("=" * 60)
    test_cases = [
        {"q": "How is annual leave calculated?", "type": "static (should hit the cache)"},
        {"q": "What is the annual leave policy?", "type": "static (cache or static store)"},
        {"q": "What is the reimbursement process?", "type": "static (cache or static store)"},
        {"q": "What's on the cafeteria menu today?", "type": "dynamic (should query the dynamic store)"},
        {"q": "What meetings are scheduled this week?", "type": "dynamic (should query the dynamic store)"},
        {"q": "Can annual leave be carried over?", "type": "static (should hit the cache)"},
        {"q": "What dishes does the cafeteria have today?", "type": "dynamic (should query the dynamic store)"},
    ]
    for i, test in enumerate(test_cases, 1):
        result = hybrid.query(test['q'])
        # Pick an icon per data source
        if result['source'] == 'STATIC_CACHE':
            icon = "⚡"
            label = "cache"
        elif result['source'] == 'STATIC_RETRIEVAL':
            icon = "📚"
            label = "static store"
        else:
            icon = "🔄"
            label = "dynamic store"
        print(f"\nQuery {i}: {test['q']}")
        print(f"  Type: {test['type']}")
        print(f"  Source: {icon} {label}")
        print(f"  Response time: {result['response_time']*1000:.2f}ms")
    # Statistics
    print("\n" + "=" * 60)
    print("System statistics")
    print("=" * 60)
    stats = hybrid.get_statistics()
    print(f"\nTotal queries: {stats['total_queries']}")
    print(f"  ├─ static cache hits: {stats['static_cache_hits']} ({stats['cache_hit_rate']*100:.1f}%)")
    print(f"  ├─ static store queries: {stats['static_db_queries']}")
    print(f"  └─ dynamic store queries: {stats['dynamic_db_queries']}")
    print(f"\nDatabase access rate: {stats['db_access_rate']*100:.1f}%")
    print(f"Cost savings: ~{(1-stats['db_access_rate'])*100:.1f}%")
    print("\n✅ Hybrid architecture advantages:")
    print("  - Static knowledge is served from cache, extremely fast")
    print("  - Dynamic knowledge goes through retrieval, staying fresh")
    print("  - Question type is classified automatically for smart routing")
    print("  - Balances speed, cost and accuracy")

# Run the hybrid demo
demo_hybrid_system()
4. How do you decide what to cache?
4.1 Why selective caching?
Not all knowledge deserves a cache slot. Cache indiscriminately and you hit two problems:
- Memory blow-up: too many entries eat a lot of memory
- Low hit rate: rarely-used entries waste the space
So you need a smarter caching policy.
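Plain LRU (least recently used) is the usual baseline that the frequency-aware policy in 4.2 refines. A minimal sketch with `collections.OrderedDict`, for comparison:

```python
from collections import OrderedDict

class LRUCache:
    """Plain least-recently-used cache: evict whatever was touched longest ago."""
    def __init__(self, max_size: int = 100):
        self.max_size = max_size
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)   # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.max_size:
            self.data.popitem(last=False)   # evict the least recently used

cache = LRUCache(max_size=2)
cache.set("a", 1); cache.set("b", 2)
cache.get("a")            # "a" is now the most recently used
cache.set("c", 3)         # evicts "b", the least recently used
```

LRU's weakness for this use case is that a burst of one-off cold questions can flush out genuinely hot entries, which is exactly what the candidate-pool design below guards against.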
4.2 Smart caching based on access frequency
class SmartCache:
    """Smart cache (LFU + LRU hybrid)."""

    def __init__(self, max_size: int = 100, min_access_count: int = 3):
        self.cache = {}
        self.access_count = {}     # access counters
        self.last_access = {}      # last access timestamps
        self.max_size = max_size
        self.min_access_count = min_access_count  # accesses required before caching
        # Candidate pool: entries not yet accessed often enough live here
        self.candidate_pool = {}
        self.candidate_access = {}

    def should_cache(self, key: str) -> bool:
        """Decide whether a key qualifies for the cache."""
        # Already in the candidate pool?
        if key in self.candidate_access:
            self.candidate_access[key] += 1
            # Promote once the access count reaches the threshold
            if self.candidate_access[key] >= self.min_access_count:
                return True
        else:
            # First access: join the candidate pool
            self.candidate_access[key] = 1
        return False

    def set(self, key: str, value: Dict):
        """Store a value (only hot entries reach the real cache)."""
        if not self.should_cache(key):
            # Park it in the candidate pool for now
            self.candidate_pool[key] = value
            return False
        # Threshold reached: promote to the real cache
        if len(self.cache) >= self.max_size:
            # Eviction policy: LFU, with LRU as the tiebreaker
            self._evict()
        self.cache[key] = value
        self.access_count[key] = self.candidate_access.get(key, 1)
        self.last_access[key] = time.time()
        # Remove from the candidate pool
        if key in self.candidate_pool:
            del self.candidate_pool[key]
        return True

    def get(self, key: str) -> Optional[Dict]:
        """Fetch a value."""
        if key in self.cache:
            # Update access statistics
            self.access_count[key] += 1
            self.last_access[key] = time.time()
            return self.cache[key]
        # Check the candidate pool
        if key in self.candidate_pool:
            self.candidate_access[key] += 1
            # Promote if it has now been accessed often enough
            if self.candidate_access[key] >= self.min_access_count:
                self.set(key, self.candidate_pool[key])
                return self.cache[key]
            return self.candidate_pool[key]
        return None

    def _evict(self):
        """Evict an entry (LFU with LRU tiebreak)."""
        if not self.cache:
            return
        # Find the entries with the fewest accesses
        min_count = min(self.access_count.values())
        candidates = [k for k, v in self.access_count.items() if v == min_count]
        # Tie: evict the one least recently accessed
        if len(candidates) > 1:
            evict_key = min(candidates, key=lambda k: self.last_access[k])
        else:
            evict_key = candidates[0]
        # Remove it
        del self.cache[evict_key]
        del self.access_count[evict_key]
        del self.last_access[evict_key]

    def get_statistics(self) -> Dict:
        """Return statistics."""
        return {
            'cache_size': len(self.cache),
            'candidate_size': len(self.candidate_pool),
            'total_size': len(self.cache) + len(self.candidate_pool),
            'avg_access_count': np.mean(list(self.access_count.values())) if self.access_count else 0,
            'hot_items': sorted(
                [(k, v) for k, v in self.access_count.items()],
                key=lambda x: x[1],
                reverse=True
            )[:5]  # top 5 hottest entries
        }
class SmartCachingSystem:
    """A complete system with smart caching."""

    def __init__(self, cache_size: int = 50, min_access: int = 3):
        self.vector_db = SimpleVectorDB()
        self.smart_cache = SmartCache(max_size=cache_size, min_access_count=min_access)
        self.stats = {
            'total_queries': 0,
            'cache_hits': 0,
            'db_queries': 0,
            'promoted_to_cache': 0  # promotions from candidate pool to real cache
        }

    def add_knowledge(self, text: str, metadata: Dict = None):
        """Add knowledge."""
        self.vector_db.add_document(text, metadata)

    def query(self, question: str) -> Dict:
        """Handle a query."""
        self.stats['total_queries'] += 1
        start_time = time.time()
        # Check the cache
        cached = self.smart_cache.get(question)
        if cached and question in self.smart_cache.cache:  # hit in the real cache
            self.stats['cache_hits'] += 1
            return {
                'question': question,
                'answer': cached['answer'],
                'source': 'CACHE',
                'response_time': time.time() - start_time
            }
        # Retrieve
        results = self.vector_db.search(question, top_k=2)
        self.stats['db_queries'] += 1
        context = "\n".join([r['text'] for r in results]) if results else ""
        answer = f"From retrieval: {context[:100]}..."
        # Try to cache it (the smart policy decides)
        cached_result = self.smart_cache.set(question, {
            'answer': answer,
            'context': context,
            'metadata': results[0].get('metadata', {}) if results else {}
        })
        if cached_result:
            self.stats['promoted_to_cache'] += 1
        return {
            'question': question,
            'answer': answer,
            'source': 'RETRIEVAL',
            'response_time': time.time() - start_time,
            'will_cache': cached_result
        }

    def get_statistics(self) -> Dict:
        """Return statistics."""
        cache_stats = self.smart_cache.get_statistics()
        return {
            'queries': self.stats,
            'cache': cache_stats,
            'cache_hit_rate': self.stats['cache_hits'] / self.stats['total_queries'] if self.stats['total_queries'] > 0 else 0
        }
def demo_smart_caching():
    """Demonstrate smart caching."""
    print("=" * 60)
    print("Smart cache demo")
    print("=" * 60)
    # Create the system
    system = SmartCachingSystem(cache_size=10, min_access=3)
    # Add knowledge
    knowledge = [
        "Annual leave policy: 5 days after 1 year, 10 after 3 years, 15 after 5 years",
        "Reimbursement process: submit request → approval → finance review → payment",
        "Housing fund rate: 12% each from company and employee",
        "Overtime policy: 1.5x weekdays, 2x weekends, 3x public holidays",
        "Social insurance: pension 8%, medical 2%, unemployment 0.5%"
    ]
    for kb in knowledge:
        system.add_knowledge(kb)
    print(f"\nLoaded {len(knowledge)} knowledge entries\n")
    # Simulate a realistic query distribution (the 80/20 rule)
    print("Simulating a realistic workload (80% of queries hit 20% of the questions)\n")
    # Hot questions (queried frequently)
    hot_questions = [
        "How is annual leave calculated",
        "How do I get reimbursed",
        "Housing fund rate"
    ]
    # Cold questions (queried occasionally)
    cold_questions = [
        "How is overtime calculated",
        "Social insurance rates",
        "Sick leave policy",
        "Late arrival deductions",
        "Resignation process"
    ]
    # Generate the query sequence (80/20 split)
    query_sequence = []
    for _ in range(50):
        if np.random.random() < 0.8:  # 80% chance: a hot question
            query_sequence.append(np.random.choice(hot_questions))
        else:                          # 20% chance: a cold question
            query_sequence.append(np.random.choice(cold_questions))
    # Run the queries
    print("Processing 50 queries...\n")
    cache_hits_timeline = []
    for i, question in enumerate(query_sequence, 1):
        result = system.query(question)
        if i <= 10 or i % 10 == 0:  # only print a sample
            source_icon = "⚡" if result['source'] == 'CACHE' else "🔍"
            cached_tag = " [promoted to cache]" if result.get('will_cache') else ""
            print(f"Query {i:2d}: {question:35s} {source_icon} {result['source']}{cached_tag}")
        # Track the hit rate over time
        stats = system.get_statistics()
        cache_hits_timeline.append(stats['cache_hit_rate'])
    # Final statistics
    print("\n" + "=" * 60)
    print("Final statistics")
    print("=" * 60)
    final_stats = system.get_statistics()
    print(f"\n[Query stats]")
    print(f"  Total queries: {final_stats['queries']['total_queries']}")
    print(f"  Cache hits: {final_stats['queries']['cache_hits']}")
    print(f"  Database queries: {final_stats['queries']['db_queries']}")
    print(f"  Promotions to cache: {final_stats['queries']['promoted_to_cache']}")
    print(f"\n[Cache stats]")
    print(f"  Real cache: {final_stats['cache']['cache_size']} entries")
    print(f"  Candidate pool: {final_stats['cache']['candidate_size']} entries")
    print(f"  Total stored: {final_stats['cache']['total_size']} entries")
    print(f"  Cache hit rate: {final_stats['cache_hit_rate']*100:.1f}%")
    print(f"  Average access count: {final_stats['cache']['avg_access_count']:.1f}")
    print(f"\n[Top 5 hot questions]")
    for i, (question, count) in enumerate(final_stats['cache']['hot_items'], 1):
        print(f"  {i}. {question} - {count} accesses")
    print("\n✅ Smart caching in a nutshell:")
    print("  - Only questions accessed repeatedly (≥3 times) get cached")
    print("  - Cold questions never occupy precious cache slots")
    print("  - Rarely-used entries are evicted automatically")
    print("  - Matches the access distribution of real workloads")
    # Show the hit-rate trend
    print(f"\n[Hit-rate trend] first 20 queries:")
    for i in range(0, min(20, len(cache_hits_timeline)), 5):
        rate = cache_hits_timeline[i]
        bar = "█" * int(rate * 50)
        print(f"  Query {i+1:2d}: {bar} {rate*100:.1f}%")

# Run the smart-cache demo
demo_smart_caching()
5. Cache update strategies
5.1 What about knowledge that changes?
Even "static" knowledge gets updated now and then:
- Company policy adjustments
- Product information changes
- Amendments to laws and regulations
That's where cache invalidation comes in.
5.2 A complete cache-update implementation
from datetime import datetime, timedelta

class CacheWithTTL:
    """A cache whose entries expire."""

    def __init__(self, max_size: int = 100, default_ttl: int = 86400):
        """
        Args:
            max_size: maximum number of entries
            default_ttl: default time-to-live in seconds (24h by default)
        """
        self.cache = {}
        self.max_size = max_size
        self.default_ttl = default_ttl
        self.stats = {
            'hits': 0,
            'misses': 0,
            'expires': 0,
            'invalidations': 0
        }

    def set(self, key: str, value: Dict, ttl: Optional[int] = None):
        """Store an entry.

        Args:
            key: cache key
            value: cached value
            ttl: time-to-live in seconds; None means use the default
        """
        if len(self.cache) >= self.max_size:
            self._evict_oldest()
        expire_at = time.time() + (ttl if ttl is not None else self.default_ttl)
        self.cache[key] = {
            'value': value,
            'expire_at': expire_at,
            'created_at': time.time(),
            'version': value.get('metadata', {}).get('version', 1)
        }

    def get(self, key: str) -> Optional[Dict]:
        """Fetch an entry."""
        if key not in self.cache:
            self.stats['misses'] += 1
            return None
        item = self.cache[key]
        # Expired?
        if time.time() > item['expire_at']:
            self.stats['expires'] += 1
            del self.cache[key]
            return None
        self.stats['hits'] += 1
        return item['value']

    def invalidate(self, key: str):
        """Actively invalidate one entry."""
        if key in self.cache:
            del self.cache[key]
            self.stats['invalidations'] += 1

    def invalidate_by_pattern(self, pattern: str):
        """Invalidate every entry whose key, or whose cached doc_id,
        contains the pattern. Checking the stored doc_id matters because
        cache keys are often just the question text."""
        keys_to_delete = [
            k for k, item in self.cache.items()
            if pattern in k
            or pattern in str(item['value'].get('metadata', {}).get('doc_id', ''))
        ]
        for key in keys_to_delete:
            self.invalidate(key)

    def update_version(self, key: str, new_version: int):
        """Bump the version (forces a re-cache on the next query)."""
        if key in self.cache:
            current_version = self.cache[key]['version']
            if new_version > current_version:
                # Newer version exists: invalidate the stale entry
                self.invalidate(key)

    def _evict_oldest(self):
        """Evict the oldest entry."""
        if not self.cache:
            return
        oldest_key = min(self.cache.keys(),
                         key=lambda k: self.cache[k]['created_at'])
        del self.cache[oldest_key]

    def get_statistics(self) -> Dict:
        """Return statistics."""
        total = self.stats['hits'] + self.stats['misses']
        return {
            **self.stats,
            'hit_rate': self.stats['hits'] / total if total > 0 else 0,
            'cache_size': len(self.cache)
        }
class VersionedKnowledgeBase:
    """A knowledge base with version control."""

    def __init__(self):
        self.documents = {}  # doc_id -> {content, version, metadata}
        self.cache = CacheWithTTL(max_size=50, default_ttl=3600)  # 1h TTL
        self.vector_db = SimpleVectorDB()

    def add_or_update_document(self, doc_id: str, content: str,
                               metadata: Dict = None, version: int = 1):
        """Add or update a document."""
        # Is this an update?
        is_update = doc_id in self.documents
        old_version = self.documents[doc_id]['version'] if is_update else 0
        # Save the document
        self.documents[doc_id] = {
            'content': content,
            'version': version,
            'metadata': metadata or {},
            'updated_at': datetime.now().isoformat()
        }
        # Update the vector store. Note: SimpleVectorDB only appends, so the
        # old version stays searchable too; a real vector store would upsert
        # by doc_id and replace it.
        self.vector_db.add_document(content, {
            'doc_id': doc_id,
            'version': version,
            **(metadata or {})
        })
        # On update, invalidate any related cache entries
        if is_update and version > old_version:
            print(f"📝 Document {doc_id} updated: v{old_version} -> v{version}")
            self.cache.invalidate_by_pattern(doc_id)
            return True
        return False

    def query(self, question: str, doc_id: Optional[str] = None) -> Dict:
        """Query (version-aware)."""
        # Build the cache key
        cache_key = f"{doc_id}:{question}" if doc_id else question
        # Check the cache
        cached = self.cache.get(cache_key)
        if cached:
            return {
                'question': question,
                'answer': cached['answer'],
                'source': 'CACHE',
                'version': cached.get('metadata', {}).get('version', 'unknown')
            }
        # Retrieve
        results = self.vector_db.search(question, top_k=2)
        if not results:
            return {'question': question, 'answer': 'No relevant information found.', 'source': 'NONE'}
        # Generate the answer
        context = results[0]['text']
        result_doc_id = results[0]['metadata'].get('doc_id', 'unknown')
        result_version = results[0]['metadata'].get('version', 1)
        answer = f"[v{result_version}] {context}"
        # Cache the result
        self.cache.set(cache_key, {
            'answer': answer,
            'context': context,
            'metadata': {
                'doc_id': result_doc_id,
                'version': result_version
            }
        }, ttl=3600)  # expires in 1 hour
        return {
            'question': question,
            'answer': answer,
            'source': 'RETRIEVAL',
            'version': result_version,
            'doc_id': result_doc_id
        }
def demo_cache_update():
    """Demonstrate cache updates and versioning."""
    print("=" * 60)
    print("Cache update & version control demo")
    print("=" * 60)
    kb = VersionedKnowledgeBase()
    # Scenario 1: initial knowledge
    print("\n[Scenario 1: initial load]")
    kb.add_or_update_document(
        doc_id="policy_annual_leave",
        content="Annual leave policy v1: 5 days after 1 year, 10 after 3 years, 15 after 5 years",
        metadata={'category': 'HR'},
        version=1
    )
    print("✅ Added: annual leave policy v1")
    # First query
    print("\nQuery 1: how is annual leave calculated?")
    result1 = kb.query("how is annual leave calculated")
    print(f"  Source: {result1['source']}")
    print(f"  Version: {result1['version']}")
    print(f"  Answer: {result1['answer'][:50]}...")
    # Second query (should hit the cache)
    print("\nQuery 2: how is annual leave calculated?")
    result2 = kb.query("how is annual leave calculated")
    print(f"  Source: {result2['source']} ⚡")
    print(f"  Version: {result2['version']}")
    # Scenario 2: the policy changes
    print("\n" + "=" * 60)
    print("[Scenario 2: policy update]")
    print("The company revises its annual leave policy...")
    kb.add_or_update_document(
        doc_id="policy_annual_leave",
        content="Annual leave policy v2: 7 days after 1 year, 12 after 3 years, 20 after 5 years. New: 25 days after 10 years",
        metadata={'category': 'HR'},
        version=2
    )
    # Query again (cache invalidated; should return the new version)
    print("\nQuery 3: how is annual leave calculated?")
    result3 = kb.query("how is annual leave calculated")
    print(f"  Source: {result3['source']}")
    print(f"  Version: {result3['version']}")
    print(f"  Answer: {result3['answer'][:60]}...")
    # Scenario 3: the cache rebuilds itself
    print("\nQuery 4: how is annual leave calculated?")
    result4 = kb.query("how is annual leave calculated")
    print(f"  Source: {result4['source']} ⚡ (new version cached)")
    print(f"  Version: {result4['version']}")
    # Statistics
    print("\n" + "=" * 60)
    print("Cache statistics")
    print("=" * 60)
    stats = kb.cache.get_statistics()
    print(f"Cache hits: {stats['hits']}")
    print(f"Cache misses: {stats['misses']}")
    print(f"Invalidations: {stats['invalidations']}")
    print(f"Hit rate: {stats['hit_rate']*100:.1f}%")
    print("\n✅ Update mechanism summary:")
    print("  - Updating a document automatically invalidates related cache entries")
    print("  - Version numbers keep the data consistent")
    print("  - TTL handles automatic expiry")
    print("  - The next query fetches and re-caches the latest version")

# Run the cache-update demo
demo_cache_update()
5.3 Three cache invalidation strategies compared
def compare_invalidation_strategies():
    """Compare different cache-invalidation strategies"""
    print("=" * 60)
    print("Three Cache-Invalidation Strategies Compared")
    print("=" * 60)
    print("\n[Strategy 1: fixed TTL (Time To Live)]")
    print("How it works: every entry expires after a fixed interval")
    print("Pros: simple to implement, cleans itself up")
    print("Cons: may serve stale data until the TTL elapses")
    print("Best for: scenarios that tolerate brief staleness")
    print("\nExample:")
    print("""
    cache.set('question', answer, ttl=3600)  # expires after 1 hour
    """)
    print("\n[Strategy 2: version numbers]")
    print("How it works: every document update bumps a version number")
    print("Pros: precise control, never serves outdated data")
    print("Cons: you must maintain the versioning system")
    print("Best for: scenarios with strict consistency requirements")
    print("\nExample:")
    print("""
    # When a document is updated
    doc.version += 1
    cache.invalidate_by_version(doc.id, doc.version)
    """)
    print("\n[Strategy 3: push-based invalidation]")
    print("How it works: content updates actively notify caches to invalidate")
    print("Pros: best real-time behavior")
    print("Cons: requires an extra notification mechanism")
    print("Best for: distributed systems and multi-node deployments")
    print("\nExample:")
    print("""
    # Publish an update event
    event_bus.publish('document_updated', doc_id='policy_123')

    # A listener invalidates the cache
    @event_bus.subscribe('document_updated')
    def on_document_updated(doc_id):
        cache.invalidate_by_pattern(doc_id)
    """)
    # A concrete comparison
    print("\n" + "=" * 60)
    print("Concrete Scenario Test")
    print("=" * 60)
    # Scenario: the document updates once per hour; one query per minute
    ttl_configs = [
        {'name': 'TTL=10 min', 'ttl': 600, 'update_interval': 3600},
        {'name': 'TTL=30 min', 'ttl': 1800, 'update_interval': 3600},
        {'name': 'TTL=60 min', 'ttl': 3600, 'update_interval': 3600},
    ]
    print("\nAssumption: the document updates hourly; one query per minute")
    print("\nEffect of different TTL settings (worst case, per hour):\n")
    for config in ttl_configs:
        ttl = config['ttl']
        update_interval = config['update_interval']
        # Worst case: after an update, a stale cached answer can keep being
        # served for up to `ttl` seconds, i.e. ttl/60 of the 60 hourly queries
        stale_responses = min(ttl, update_interval) / 60
        freshness_rate = (60 - stale_responses) / 60 * 100
        print(f"{config['name']}:")
        print(f"  Potentially stale responses: ~{int(stale_responses)}")
        print(f"  Data freshness: {freshness_rate:.1f}%")
        print()
    print("💡 Recommendations:")
    print("  - Policy documents: TTL = 24 hours + version control")
    print("  - Product information: TTL = 1 hour + version control")
    print("  - Real-time data: no caching, or TTL < 5 minutes")

# Run the comparison
compare_invalidation_strategies()
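Strategy 3 above is shown only as pseudocode against a hypothetical `event_bus`. A self-contained, in-process version might look like this (the bus here is a plain dict of callbacks, standing in for a real message broker such as Redis pub/sub):

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

class InProcessEventBus:
    """Tiny synchronous pub/sub bus for cache-invalidation events."""

    def __init__(self):
        self._subscribers: DefaultDict[str, List[Callable]] = defaultdict(list)

    def subscribe(self, event: str, handler: Callable) -> None:
        self._subscribers[event].append(handler)

    def publish(self, event: str, **payload) -> None:
        for handler in self._subscribers[event]:
            handler(**payload)

# A cache keyed by "doc_id::question" so invalidation can match by document
cache: Dict[str, str] = {
    'policy_123::annual-leave': 'cached answer',
    'policy_999::travel': 'cached answer',
}
bus = InProcessEventBus()

def on_document_updated(doc_id: str) -> None:
    # Drop every cache entry whose key references the updated document
    for key in [k for k in cache if k.startswith(doc_id + '::')]:
        del cache[key]

bus.subscribe('document_updated', on_document_updated)
bus.publish('document_updated', doc_id='policy_123')
```

In a multi-node deployment the same pattern holds, except `publish` goes over the network and each node runs its own subscriber that invalidates its local cache.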
6. Production-Grade Implementation and Best Practices
6.1 A Complete Production-Grade CAG System
import logging
from typing import Callable
from dataclasses import dataclass
from enum import Enum

class CacheStrategy(Enum):
    """Caching strategy"""
    ALWAYS = "always"  # always cache
    SMART = "smart"    # decide intelligently
    NEVER = "never"    # never cache

@dataclass
class CacheConfig:
    """Cache configuration"""
    max_size: int = 100
    default_ttl: int = 3600
    min_access_count: int = 3
    strategy: CacheStrategy = CacheStrategy.SMART
    enable_metrics: bool = True

class ProductionCAGSystem:
    """Production-grade CAG system"""

    def __init__(self, config: CacheConfig = None):
        self.config = config or CacheConfig()
        # Core components
        self.static_cache = CacheWithTTL(
            max_size=self.config.max_size,
            default_ttl=self.config.default_ttl
        )
        self.smart_cache = SmartCache(
            max_size=self.config.max_size,
            min_access_count=self.config.min_access_count
        )
        self.vector_db = SimpleVectorDB()
        # Monitoring metrics
        self.metrics = {
            'total_queries': 0,
            'cache_hits': 0,
            'db_queries': 0,
            'avg_response_time': [],
            'errors': 0
        }
        # Logging
        self.logger = self._setup_logger()

    def _setup_logger(self):
        """Configure logging"""
        logger = logging.getLogger('CAGSystem')
        logger.setLevel(logging.INFO)
        if not logger.handlers:
            handler = logging.StreamHandler()
            formatter = logging.Formatter(
                '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
            )
            handler.setFormatter(formatter)
            logger.addHandler(handler)
        return logger

    def add_knowledge(self,
                      text: str,
                      doc_id: str,
                      metadata: Dict = None,
                      cache_strategy: CacheStrategy = None,
                      ttl: int = None):
        """Add knowledge to the system"""
        try:
            full_metadata = {
                'doc_id': doc_id,
                'cache_strategy': (cache_strategy or self.config.strategy).value,
                'ttl': ttl or self.config.default_ttl,
                **(metadata or {})
            }
            self.vector_db.add_document(text, full_metadata)
            self.logger.info(f"Added document: {doc_id}")
        except Exception as e:
            self.logger.error(f"Error adding document {doc_id}: {str(e)}")
            self.metrics['errors'] += 1
            raise
    def query(self, question: str, force_refresh: bool = False) -> Dict:
        """Answer a query"""
        start_time = time.time()
        self.metrics['total_queries'] += 1
        try:
            # A forced refresh skips the caches entirely
            if not force_refresh:
                # 1) Check the static cache
                cached = self.static_cache.get(question)
                if cached:
                    self.metrics['cache_hits'] += 1
                    response_time = time.time() - start_time
                    self.metrics['avg_response_time'].append(response_time)
                    self.logger.info(f"Cache hit: {question[:30]}...")
                    return {
                        'question': question,
                        'answer': cached['answer'],
                        'source': 'STATIC_CACHE',
                        'response_time': response_time,
                        'cached': True
                    }
                # 2) Check the smart cache
                smart_cached = self.smart_cache.get(question)
                if smart_cached:
                    self.metrics['cache_hits'] += 1
                    response_time = time.time() - start_time
                    self.metrics['avg_response_time'].append(response_time)
                    self.logger.info(f"Smart cache hit: {question[:30]}...")
                    return {
                        'question': question,
                        'answer': smart_cached['answer'],
                        'source': 'SMART_CACHE',
                        'response_time': response_time,
                        'cached': True
                    }
            # Cache miss: fall back to retrieval
            self.logger.info(f"Cache miss, retrieving: {question[:30]}...")
            results = self.vector_db.search(question, top_k=2)
            self.metrics['db_queries'] += 1
            if not results:
                response_time = time.time() - start_time
                self.metrics['avg_response_time'].append(response_time)
                return {
                    'question': question,
                    'answer': 'No relevant information found',
                    'source': 'NONE',
                    'response_time': response_time,
                    'cached': False
                }
            # Generate the answer
            context = results[0]['text']
            metadata = results[0].get('metadata', {})
            answer = f"Based on retrieval: {context[:100]}..."
            # Decide whether to cache, according to the document's strategy
            strategy = CacheStrategy(metadata.get('cache_strategy', 'smart'))
            ttl = metadata.get('ttl', self.config.default_ttl)
            if strategy == CacheStrategy.ALWAYS:
                # Cache directly into the static cache
                self.static_cache.set(question, {
                    'answer': answer,
                    'context': context,
                    'metadata': metadata
                }, ttl=ttl)
                self.logger.info(f"Cached to static (ALWAYS): {question[:30]}...")
            elif strategy == CacheStrategy.SMART:
                # Let the smart cache decide
                cached = self.smart_cache.set(question, {
                    'answer': answer,
                    'context': context,
                    'metadata': metadata
                })
                if cached:
                    self.logger.info(f"Promoted to smart cache: {question[:30]}...")
            response_time = time.time() - start_time
            self.metrics['avg_response_time'].append(response_time)
            return {
                'question': question,
                'answer': answer,
                'source': 'RETRIEVAL',
                'response_time': response_time,
                'cached': False,
                'cache_strategy': strategy.value
            }
        except Exception as e:
            self.logger.error(f"Error processing query '{question}': {str(e)}")
            self.metrics['errors'] += 1
            raise
    def invalidate_cache(self, pattern: str = None, doc_id: str = None):
        """Invalidate cache entries"""
        try:
            if pattern:
                self.static_cache.invalidate_by_pattern(pattern)
                self.logger.info(f"Invalidated cache by pattern: {pattern}")
            if doc_id:
                self.static_cache.invalidate_by_pattern(doc_id)
                self.logger.info(f"Invalidated cache for doc: {doc_id}")
        except Exception as e:
            self.logger.error(f"Error invalidating cache: {str(e)}")
            raise

    def get_health_status(self) -> Dict:
        """Report system health"""
        total_queries = self.metrics['total_queries']
        cache_hit_rate = self.metrics['cache_hits'] / total_queries if total_queries > 0 else 0
        avg_response = np.mean(self.metrics['avg_response_time']) if self.metrics['avg_response_time'] else 0
        # Health score
        health_score = 100
        if cache_hit_rate < 0.3:
            health_score -= 20  # low hit rate
        if avg_response > 0.1:
            health_score -= 15  # slow responses
        if self.metrics['errors'] > 0:
            health_score -= 30  # errors occurred
        status = 'healthy' if health_score >= 80 else 'degraded' if health_score >= 50 else 'unhealthy'
        return {
            'status': status,
            'health_score': health_score,
            'metrics': {
                'total_queries': total_queries,
                'cache_hit_rate': f"{cache_hit_rate*100:.1f}%",
                'avg_response_time': f"{avg_response*1000:.2f}ms",
                'db_queries': self.metrics['db_queries'],
                'errors': self.metrics['errors']
            },
            'cache_info': {
                'static_cache_size': self.static_cache.get_statistics()['cache_size'],
                'smart_cache_size': self.smart_cache.get_statistics()['cache_size']
            }
        }

    def export_metrics(self) -> Dict:
        """Export metrics (for a monitoring system)"""
        static_stats = self.static_cache.get_statistics()
        smart_stats = self.smart_cache.get_statistics()
        return {
            'timestamp': datetime.now().isoformat(),
            'queries': {
                'total': self.metrics['total_queries'],
                'cache_hits': self.metrics['cache_hits'],
                'db_queries': self.metrics['db_queries'],
                'errors': self.metrics['errors']
            },
            'performance': {
                'cache_hit_rate': self.metrics['cache_hits'] / self.metrics['total_queries'] if self.metrics['total_queries'] > 0 else 0,
                'avg_response_time': np.mean(self.metrics['avg_response_time']) if self.metrics['avg_response_time'] else 0,
                'p95_response_time': np.percentile(self.metrics['avg_response_time'], 95) if len(self.metrics['avg_response_time']) > 0 else 0
            },
            'cache': {
                'static': static_stats,
                'smart': smart_stats
            }
        }
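`export_metrics` returns a nested dict; to scrape it with Prometheus you would typically flatten it into the text exposition format (one `name value` line per metric). A sketch under that assumption, with metric names of my own choosing rather than any standard:

```python
from typing import Dict

def to_prometheus_text(metrics: Dict) -> str:
    """Flatten a CAG metrics dict into Prometheus text exposition lines."""
    lines = [
        f"cag_queries_total {metrics['queries']['total']}",
        f"cag_cache_hits_total {metrics['queries']['cache_hits']}",
        f"cag_db_queries_total {metrics['queries']['db_queries']}",
        f"cag_errors_total {metrics['queries']['errors']}",
        f"cag_cache_hit_rate {metrics['performance']['cache_hit_rate']:.4f}",
        f"cag_avg_response_seconds {metrics['performance']['avg_response_time']:.6f}",
    ]
    # The exposition format requires a trailing newline after the last line
    return "\n".join(lines) + "\n"

# Example input in the same shape as export_metrics() produces
sample = {
    'queries': {'total': 100, 'cache_hits': 60, 'db_queries': 40, 'errors': 0},
    'performance': {'cache_hit_rate': 0.6, 'avg_response_time': 0.012},
}
```

Serving this string from an HTTP endpoint (e.g. `/metrics`) is all a Prometheus scrape target needs.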
def demo_production_system():
    """Demonstrate the production-grade system"""
    print("=" * 60)
    print("Production-Grade CAG System Demo")
    print("=" * 60)
    # Create the system with a custom configuration
    config = CacheConfig(
        max_size=50,
        default_ttl=3600,
        min_access_count=2,
        strategy=CacheStrategy.SMART,
        enable_metrics=True
    )
    system = ProductionCAGSystem(config)
    print(f"\nSystem configuration:")
    print(f"  Cache size: {config.max_size}")
    print(f"  Default TTL: {config.default_ttl}s")
    print(f"  Min access count: {config.min_access_count}")
    print(f"  Cache strategy: {config.strategy.value}")
    # Add different kinds of knowledge
    print("\n" + "=" * 60)
    print("Adding Knowledge")
    print("=" * 60)
    # 1. Static knowledge (always cached)
    system.add_knowledge(
        text="Annual leave policy: 5 days after 1 year of service, 10 days after 3 years, 15 days after 5 years",
        doc_id="policy_001",
        metadata={'category': 'policy', 'type': 'static'},
        cache_strategy=CacheStrategy.ALWAYS,
        ttl=86400  # 24 hours
    )
    print("✅ Added static knowledge: annual leave policy (ALWAYS cached)")
    # 2. Semi-static knowledge (smart caching)
    system.add_knowledge(
        text="Price list: Basic 99/month, Pro 199/month, Enterprise 499/month",
        doc_id="product_002",
        metadata={'category': 'product', 'type': 'semi-static'},
        cache_strategy=CacheStrategy.SMART,
        ttl=3600  # 1 hour
    )
    print("✅ Added semi-static knowledge: product prices (SMART cached)")
    # 3. Dynamic knowledge (never cached)
    system.add_knowledge(
        text="Today's promotion: 20% off all products, today only!",
        doc_id="promo_003",
        metadata={'category': 'promotion', 'type': 'dynamic'},
        cache_strategy=CacheStrategy.NEVER,
        ttl=300  # 5 minutes
    )
    print("✅ Added dynamic knowledge: promotion (NEVER cached)")
    # Simulate a realistic query workload
    print("\n" + "=" * 60)
    print("Simulated Queries")
    print("=" * 60)
    queries = [
        # Static questions (high frequency)
        ("how is annual leave calculated", 5),
        ("annual leave policy", 3),
        # Semi-static questions (medium frequency)
        ("product prices", 3),
        ("how much does it cost", 2),
        # Dynamic questions (low frequency)
        ("any deals today", 2),
        ("current promotions", 1)
    ]
    print("\nRunning queries...")
    for question, count in queries:
        for i in range(count):
            result = system.query(question)
            if i == 0:  # show only the first query of each
                cache_tag = "⚡" if result['cached'] else "🔍"
                print(f"  {cache_tag} {question}: {result['source']} ({result['response_time']*1000:.2f}ms)")
    # Health status
    print("\n" + "=" * 60)
    print("System Health")
    print("=" * 60)
    health = system.get_health_status()
    status_icon = "✅" if health['status'] == 'healthy' else "⚠️" if health['status'] == 'degraded' else "❌"
    print(f"\nStatus: {status_icon} {health['status'].upper()}")
    print(f"Health score: {health['health_score']}/100")
    print(f"\nMetrics:")
    for key, value in health['metrics'].items():
        print(f"  {key}: {value}")
    print(f"\nCache info:")
    for key, value in health['cache_info'].items():
        print(f"  {key}: {value}")
    # Export metrics
    print("\n" + "=" * 60)
    print("Performance Metrics (Prometheus/Grafana-ready)")
    print("=" * 60)
    metrics = system.export_metrics()
    print(f"\nTimestamp: {metrics['timestamp']}")
    print(f"\nQuery statistics:")
    print(f"  Total queries: {metrics['queries']['total']}")
    print(f"  Cache hits: {metrics['queries']['cache_hits']}")
    print(f"  DB queries: {metrics['queries']['db_queries']}")
    print(f"  Errors: {metrics['queries']['errors']}")
    print(f"\nPerformance:")
    print(f"  Cache hit rate: {metrics['performance']['cache_hit_rate']*100:.1f}%")
    print(f"  Avg response time: {metrics['performance']['avg_response_time']*1000:.2f}ms")
    print(f"  P95 response time: {metrics['performance']['p95_response_time']*1000:.2f}ms")
    # Best practices summary
    print("\n" + "=" * 60)
    print("Production Best Practices")
    print("=" * 60)
    print("""
    1. [Layered caching]
       - Static knowledge: ALWAYS + long TTL (24 hours)
       - Semi-static knowledge: SMART + medium TTL (1 hour)
       - Dynamic knowledge: NEVER, or a short TTL (5 minutes)
    2. [Monitoring targets]
       - Cache hit rate: > 50%
       - Average response time: < 50ms
       - P95 response time: < 100ms
       - Error rate: < 0.1%
    3. [Capacity planning]
       - Cache size = daily query volume x 0.2 (the 80/20 rule)
       - Reserve 20% headroom for growth
       - Alert thresholds: hit rate < 30%, response > 100ms
    4. [Invalidation]
       - Scheduled: TTL-based expiry
       - Active: triggered on document updates
       - Bulk: pattern-based matching
    5. [High availability]
       - Fall back to retrieval when the cache fails
       - Catch and log exceptions
       - Expose a health-check endpoint
       - Export metrics to a monitoring system
    """)

# Run the production-system demo
demo_production_system()
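The capacity rule of thumb in the best practices above (cache size ≈ 20% of daily query volume, plus 20% headroom) is simple arithmetic; a sketch, with the parameter names being my own:

```python
def plan_cache_capacity(daily_queries: int,
                        hot_fraction: float = 0.2,
                        headroom: float = 0.2) -> int:
    """Estimate cache slots: the hot subset of daily queries plus headroom."""
    base = daily_queries * hot_fraction        # 80/20 rule: ~20% of queries are hot
    return int(round(base * (1 + headroom)))   # reserve extra room for growth

# e.g. 10,000 queries/day -> 2,000 hot queries -> 2,400 cache slots
```

In practice you would refine `hot_fraction` from your own query logs (the share of distinct questions that account for most traffic) rather than assume 0.2.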
6.2 The Complete Code Example
Now let's pull all the code together into one complete, runnable demo:
def run_complete_demo():
    """Run the full demo"""
    print("\n\n")
    print("=" * 80)
    print(" " * 20 + "Complete CAG Demo: From RAG to Production-Grade CAG")
    print("=" * 80)
    print("\nThis demo will show:")
    print("  1. The performance problems of traditional RAG")
    print("  2. How CAG solves them")
    print("  3. A hybrid RAG+CAG architecture")
    print("  4. Smart caching strategies")
    print("  5. The cache-update mechanism")
    print("  6. A production-grade implementation")
    print("\n" + "=" * 80)
    input("Press Enter to start the demo...")
    # Run each demo in turn
    demos = [
        ("Traditional RAG system", demo_traditional_rag),
        ("CAG system", demo_cag_system),
        ("RAG vs CAG performance comparison", compare_rag_vs_cag),
        ("Hybrid RAG+CAG system", demo_hybrid_system),
        ("Smart caching", demo_smart_caching),
        ("Cache-update mechanism", demo_cache_update),
        ("Invalidation strategy comparison", compare_invalidation_strategies),
        ("Production-grade system", demo_production_system),
    ]
    for i, (name, demo_func) in enumerate(demos, 1):
        print(f"\n\n{'=' * 80}")
        print(f"Demo {i}/{len(demos)}: {name}")
        print("=" * 80)
        input("Press Enter to continue...")
        demo_func()
        print("\nDemo finished!")
        if i < len(demos):
            input("Press Enter for the next demo...")
    print("\n\n" + "=" * 80)
    print(" " * 30 + "All demos complete!")
    print("=" * 80)
    print("\n📚 You have now seen:")
    print("  ✅ The principles and limitations of RAG")
    print("  ✅ How CAG uses caching to boost performance")
    print("  ✅ How to design a hybrid architecture")
    print("  ✅ How to implement smart caching strategies")
    print("  ✅ Cache update and invalidation mechanisms")
    print("  ✅ A complete production-grade implementation")
    print("\n💡 Next steps:")
    print("  - Apply these techniques in your own projects")
    print("  - Tune the caching strategy to your actual workload")
    print("  - Hook up a monitoring system and keep optimizing")
    print("  - Consider a distributed cache (Redis, etc.)")

# Run the full demo when this file is executed directly
if __name__ == "__main__":
    run_complete_demo()
7. Conclusion
The old AI "crammed on demand": every answer required a last-minute lookup.
CAG gives AI a real memory: core knowledge stays loaded, ready to be recalled the moment it's needed.
This is more than an incremental upgrade; it moves AI from "lookup tool" toward "intelligent assistant".
Imagine it: a future AI assistant that not only knows where to find information, but remembers your habits, your preferences, and every conversation the two of you have had...
Only then does it truly become your "digital twin".
And it all starts with teaching AI to remember.