But in real production environments, you still run into a painful reality:

  • Vector search finds content that is semantically similar, not necessarily content that is correct.
  • GraphRAG improves reasoning, but can miss exact facts (IDs, error codes, contract clauses).
  • Keyword search (BM25) nails exact matches, but cannot handle paraphrases.

So a truly "production-grade" RAG looks like this:

Tri-modal hybrid RAG = BM25 (exactness) + vectors (semantics) + GraphRAG (relationships) + reranking (precision)
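
Concretely, "fusion" here means min-max-normalizing each channel's scores and summing them with per-channel weights. A minimal sketch of the arithmetic, using the default weights from the listing below (the raw scores are made-up numbers for illustration):

def minmax(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 1.0 for x in xs]

w_bm25, w_vec, w_graph = 0.34, 0.44, 0.22
bm25 = minmax([7.1, 2.3, 0.4])    # chunk A tops the BM25 channel -> 1.0
vec = minmax([0.82, 0.61, 0.12])  # chunk A is second by cosine   -> 0.7
# Chunk A never appears in the graph channel, so that term contributes 0.
hybrid_score = w_bm25 * bm25[0] + w_vec * vec[1] + w_graph * 0.0
print(hybrid_score)  # 0.648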

Below is a complete, runnable implementation that you can copy, paste, and run.

What We'll Build

A complete pipeline:

  1. Document chunking and indexing
     • BM25 index (lexical retrieval)
     • Vector index (semantic retrieval)
     • GraphRAG index (entity graph + chunk links)
  2. Hybrid fusion (score normalization + merging)
  3. Cross-encoder reranking (final precision)
  4. Context builder (LLM-ready context)

Install Dependencies

pip install rank-bm25 sentence-transformers numpy networkx

Full Code: Tri-Modal Hybrid RAG (BM25 + Vector + GraphRAG + Reranking)

This code runs locally with no external database.
The GraphRAG piece here is a lightweight entity (co-occurrence) graph, used to expand related entities and retrieve the chunks linked to them.

from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

import networkx as nx
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder, SentenceTransformer

# ===========================================================================
# Data structures
# ===========================================================================

@dataclass(frozen=True)
class DocChunk:
    chunk_id: str
    text: str
    meta: Dict[str, str]

@dataclass(frozen=True)
class Hit:
    chunk: DocChunk
    score: float
    source: str  # "bm25" | "vector" | "graph" | "hybrid" | "rerank"

# ===========================================================================
# Text utilities (fast and reliable)
# ===========================================================================

_WS = re.compile(r"\s+")
_WORDS = re.compile(r"[a-z0-9]+")

def normalize_text(s: str) -> str:
    return _WS.sub(" ", s.strip())

def bm25_tokenize(s: str) -> List[str]:
    # BM25 works best on lowercase tokens
    return _WORDS.findall(s.lower())

def minmax_norm(scores: List[float]) -> List[float]:
    if not scores:
        return scores
    mn, mx = min(scores), max(scores)
    if abs(mx - mn) < 1e-12:
        return [1.0] * len(scores)
    return [(s - mn) / (mx - mn) for s in scores]

def cosine_sim(q: np.ndarray, X: np.ndarray) -> np.ndarray:
    # q: (d,), X: (n, d) -> (n,)
    qn = q / (np.linalg.norm(q) + 1e-12)
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    return Xn @ qn

# ===========================================================================
# GraphRAG (lightweight): entity graph + chunk links
# ===========================================================================

class EntityGraphIndex:
    """
    Lightweight GraphRAG-style index:
    - extract entities from each chunk (simple heuristics)
    - build an undirected co-occurrence graph
    - map entity -> chunk IDs
    - query expansion: extract entities from the query, expand to their
      neighbors, and fetch the linked chunks
    """

    def __init__(self) -> None:
        self.g = nx.Graph()
        self.entity_to_chunks: Dict[str, set[str]] = {}
        self.chunk_entities: Dict[str, List[str]] = {}

    @staticmethod
    def extract_entities(text: str) -> List[str]:
        """
        Simple entity-extraction heuristics:
        - keep Title-Case / CamelCase words (e.g. "NeonDB", "SuccessFactors")
        - keep all-caps words (e.g. "API", "HTTP")
        - keep error-code-like tokens (e.g. "E1127", "ERR_401", "0xA00F4244")
        - keep dotted identifiers (e.g. "apps.pricing.services")
        """
        # Title-case or CamelCase words
        titleish = re.findall(r"\b[A-Z][a-z]+(?:[A-Z][a-z]+)*\b", text)
        # All-caps acronyms
        acronyms = re.findall(r"\b[A-Z]{2,}\b", text)
        # Error codes / hex values / mixed formats
        codes = re.findall(r"\b(?:0x[a-fA-F0-9]{6,}|[A-Z]{1,5}[_-]?\d{2,}|E\d{3,6})\b", text)
        # Dotted paths (useful for developer docs)
        dotted = re.findall(r"\b[a-zA-Z_]\w*(?:\.[a-zA-Z_]\w*){2,}\b", text)
        # Normalize: lowercase for stable node IDs
        raw = titleish + acronyms + codes + dotted
        cleaned = []
        for e in raw:
            e = e.strip()
            if len(e) < 3:
                continue
            cleaned.append(e.lower())
        # Deduplicate while preserving order
        seen = set()
        out = []
        for e in cleaned:
            if e not in seen:
                seen.add(e)
                out.append(e)
        return out

    def build(self, chunks: List[DocChunk]) -> None:
        for c in chunks:
            ents = self.extract_entities(c.text)
            self.chunk_entities[c.chunk_id] = ents
            # entity -> chunk ID mapping
            for e in ents:
                self.entity_to_chunks.setdefault(e, set()).add(c.chunk_id)
            # co-occurrence edges
            for i in range(len(ents)):
                for j in range(i + 1, len(ents)):
                    a, b = ents[i], ents[j]
                    if a == b:
                        continue
                    if self.g.has_edge(a, b):
                        self.g[a][b]["w"] += 1
                    else:
                        self.g.add_edge(a, b, w=1)

    def expand_query_entities(self, query: str, depth: int = 1, max_nodes: int = 30) -> List[str]:
        seeds = self.extract_entities(query)
        if not seeds:
            return []
        visited = set(seeds)
        frontier = list(seeds)
        for _ in range(depth):
            nxt = []
            for node in frontier:
                if node not in self.g:
                    continue
                # neighbors sorted by edge weight, descending
                neigh = sorted(
                    self.g.neighbors(node),
                    key=lambda x: self.g[node][x].get("w", 1),
                    reverse=True,
                )
                for nb in neigh[:10]:
                    if nb not in visited:
                        visited.add(nb)
                        nxt.append(nb)
                        if len(visited) >= max_nodes:
                            return list(visited)
            frontier = nxt
        return list(visited)

    def graph_retrieve_chunk_ids(self, query: str, depth: int = 1, per_entity_limit: int = 6) -> Dict[str, float]:
        """
        Return chunk_id -> score using a simple policy:
        - chunks matched by seed entities score higher
        - chunks matched by expanded entities score lower
        """
        expanded = self.expand_query_entities(query, depth=depth)
        if not expanded:
            return {}
        seeds = set(self.extract_entities(query))
        scores: Dict[str, float] = {}
        for e in expanded:
            chunk_ids = list(self.entity_to_chunks.get(e, []))
            if not chunk_ids:
                continue
            # seed entities carry more weight
            w = 1.0 if e in seeds else 0.55
            # cap results per entity to avoid flooding from the graph
            for cid in chunk_ids[:per_entity_limit]:
                scores[cid] = scores.get(cid, 0.0) + w
        return scores

# ===========================================================================
# Tri-modal hybrid RAG engine
# ===========================================================================

class TriModalHybridRAG:
    """
    Production-grade pattern:
    - BM25 (exact keywords)
    - vector search (semantic similarity)
    - GraphRAG (entity-graph expansion)
    - normalized fusion + weights
    - cross-encoder reranking (final precision)
    """

    def __init__(
        self,
        embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2",
        rerank_model: str = "cross-encoder/ms-marco-MiniLM-L-6-v2",
    ) -> None:
        self.embedder = SentenceTransformer(embedding_model)
        self.reranker = CrossEncoder(rerank_model)
        self.chunks: List[DocChunk] = []
        self._bm25: Optional[BM25Okapi] = None
        self._bm25_tokens: List[List[str]] = []
        self._embeddings: Optional[np.ndarray] = None  # (n, d)
        self.graph_index = EntityGraphIndex()
        self._chunk_by_id: Dict[str, DocChunk] = {}

    # -----------------------------------------------------------------------
    # Indexing
    # -----------------------------------------------------------------------
    def index(self, chunks: List[DocChunk]) -> None:
        self.chunks = [
            DocChunk(c.chunk_id, normalize_text(c.text), c.meta)
            for c in chunks
        ]
        self._chunk_by_id = {c.chunk_id: c for c in self.chunks}
        # BM25
        self._bm25_tokens = [bm25_tokenize(c.text) for c in self.chunks]
        self._bm25 = BM25Okapi(self._bm25_tokens)
        # Vectors
        texts = [c.text for c in self.chunks]
        self._embeddings = np.array(self.embedder.encode(texts, normalize_embeddings=True))
        # GraphRAG
        self.graph_index.build(self.chunks)

    # -----------------------------------------------------------------------
    # BM25 search
    # -----------------------------------------------------------------------
    def bm25_search(self, query: str, top_k: int = 12) -> List[Hit]:
        if self._bm25 is None:
            raise RuntimeError("Call index() first.")
        q = bm25_tokenize(query)
        scores = self._bm25.get_scores(q)
        idxs = np.argsort(scores)[::-1][:top_k]
        return [Hit(self.chunks[int(i)], float(scores[int(i)]), "bm25") for i in idxs]

    # -----------------------------------------------------------------------
    # Vector search
    # -----------------------------------------------------------------------
    def vector_search(self, query: str, top_k: int = 12) -> List[Hit]:
        if self._embeddings is None:
            raise RuntimeError("Call index() first.")
        q = np.array(self.embedder.encode([query], normalize_embeddings=True))[0]
        sims = cosine_sim(q, self._embeddings)
        idxs = np.argsort(sims)[::-1][:top_k]
        return [Hit(self.chunks[int(i)], float(sims[int(i)]), "vector") for i in idxs]

    # -----------------------------------------------------------------------
    # GraphRAG search (entity graph)
    # -----------------------------------------------------------------------
    def graph_search(self, query: str, depth: int = 1, top_k: int = 12) -> List[Hit]:
        cid_to_score = self.graph_index.graph_retrieve_chunk_ids(query, depth=depth)
        if not cid_to_score:
            return []
        # sort by graph score, descending
        items = sorted(cid_to_score.items(), key=lambda x: x[1], reverse=True)[:top_k]
        hits: List[Hit] = []
        for cid, sc in items:
            c = self._chunk_by_id.get(cid)
            if c:
                hits.append(Hit(c, float(sc), "graph"))
        return hits

    # -----------------------------------------------------------------------
    # Tri-modal hybrid fusion
    # -----------------------------------------------------------------------
    def hybrid_candidates(
        self,
        query: str,
        bm25_top: int = 12,
        vec_top: int = 12,
        graph_top: int = 12,
        graph_depth: int = 1,
        merged_top: int = 18,
        w_bm25: float = 0.34,
        w_vec: float = 0.44,
        w_graph: float = 0.22,
    ) -> List[Hit]:
        bm25_hits = self.bm25_search(query, top_k=bm25_top)
        vec_hits = self.vector_search(query, top_k=vec_top)
        graph_hits = self.graph_search(query, depth=graph_depth, top_k=graph_top)
        # normalize scores per channel
        bm25_norm = minmax_norm([h.score for h in bm25_hits])
        vec_norm = minmax_norm([h.score for h in vec_hits])
        graph_norm = minmax_norm([h.score for h in graph_hits])
        merged: Dict[str, Tuple[DocChunk, float]] = {}

        def add(hits: List[Hit], norms: List[float], weight: float) -> None:
            for h, ns in zip(hits, norms):
                cid = h.chunk.chunk_id
                prev = merged.get(cid, (h.chunk, 0.0))[1]
                merged[cid] = (h.chunk, prev + ns * weight)

        add(bm25_hits, bm25_norm, w_bm25)
        add(vec_hits, vec_norm, w_vec)
        add(graph_hits, graph_norm, w_graph)
        out = [Hit(chunk=c, score=s, source="hybrid") for (c, s) in merged.values()]
        out.sort(key=lambda x: x.score, reverse=True)
        return out[:merged_top]

    # -----------------------------------------------------------------------
    # Reranking (cross-encoder)
    # -----------------------------------------------------------------------
    def rerank(self, query: str, candidates: List[Hit], top_k: int = 6) -> List[Hit]:
        pairs = [(query, h.chunk.text) for h in candidates]
        scores = self.reranker.predict(pairs)
        reranked = [Hit(h.chunk, float(s), "rerank") for h, s in zip(candidates, scores)]
        reranked.sort(key=lambda x: x.score, reverse=True)
        return reranked[:top_k]

    # -----------------------------------------------------------------------
    # Build the LLM context
    # -----------------------------------------------------------------------
    def build_context(self, hits: List[Hit], max_chars: int = 5000) -> str:
        parts: List[str] = []
        used = 0
        for h in hits:
            src = h.chunk.meta.get("source", "doc")
            header = f"[{src} | {h.chunk.chunk_id}]"
            block = f"{header}\n{h.chunk.text}\n"
            if used + len(block) > max_chars:
                break
            parts.append(block)
            used += len(block)
        return "\n---\n".join(parts)

    # -----------------------------------------------------------------------
    # End-to-end retrieval
    # -----------------------------------------------------------------------
    def retrieve(
        self,
        query: str,
        bm25_top: int = 12,
        vec_top: int = 12,
        graph_top: int = 12,
        graph_depth: int = 1,
        merged_top: int = 18,
        rerank_top: int = 6,
    ) -> List[Hit]:
        candidates = self.hybrid_candidates(
            query=query,
            bm25_top=bm25_top,
            vec_top=vec_top,
            graph_top=graph_top,
            graph_depth=graph_depth,
            merged_top=merged_top,
        )
        return self.rerank(query, candidates, top_k=rerank_top)

# ===========================================================================
# Demo: plug your own chunks in here
# ===========================================================================

def demo_chunks() -> List[DocChunk]:
    """
    Replace with your real chunking output.
    This demo deliberately mixes:
    - IDs and codes (BM25 strength)
    - paraphrases (vector strength)
    - entity relationships (GraphRAG strength)
    """
    return [
        DocChunk(
            "c1",
            "Error E1127 occurs when the access token has expired. Refresh the token and retry the request.",
            {"source": "runbook"},
        ),
        DocChunk(
            "c2",
            "If authentication fails with HTTP 401, rotate the API key and regenerate the client secret.",
            {"source": "runbook"},
        ),
        DocChunk(
            "c3",
            "Refund policy: annual plans are refundable within 14 days of purchase. Monthly plans are non-refundable.",
            {"source": "policy"},
        ),
        DocChunk(
            "c4",
            "To cancel a subscription, contact billing at least 7 days before renewal to avoid charges.",
            {"source": "policy"},
        ),
        DocChunk(
            "c5",
            "The billing service depends on the auth service. The auth service validates tokens issued by IdentityProvider.",
            {"source": "architecture"},
        ),
        DocChunk(
            "c6",
            "IdentityProvider issues JWT tokens. The auth service verifies JWT signatures and claims before granting access.",
            {"source": "architecture"},
        ),
        DocChunk(
            "c7",
            "If a user cannot log in, first check IdentityProvider health, then inspect the auth logs for JWT failures.",
            {"source": "support guide"},
        ),
    ]

if __name__ == "__main__":
    rag = TriModalHybridRAG()
    rag.index(demo_chunks())
    query = "Why am I seeing error E1127? And how are the Auth service and IdentityProvider related?"
    hits = rag.retrieve(query, graph_depth=2)
    print("\nTop reranked results:\n")
    for h in hits:
        print(f"- {h.score:.4f} | {h.chunk.chunk_id} | {h.chunk.meta.get('source')}")
    print("\n\nLLM context:\n")
    print(rag.build_context(hits))
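
One gap worth noting: step 1 of the pipeline (document chunking) is stubbed out above, since demo_chunks() returns hand-made chunks. If you start from raw documents, a fixed-size chunker along these lines will do; simple_chunks, the window size, and the overlap below are illustrative choices, not part of the listing above, and assume DocChunk from the same file:

def simple_chunks(doc_id: str, text: str, size: int = 600, overlap: int = 100) -> List[DocChunk]:
    # Greedy fixed-width character windows with overlap. A sentence-aware
    # splitter is usually better for production documents.
    out: List[DocChunk] = []
    start, n = 0, 0
    while start < len(text):
        piece = text[start:start + size].strip()
        if piece:
            out.append(DocChunk(f"{doc_id}-{n}", piece, {"source": doc_id}))
        start += size - overlap
        n += 1
    return out

# Usage: rag.index(simple_chunks("runbook", open("runbook.txt").read()))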

Why does this implementation actually work?

✅ BM25 captures "key terms"

IDs, error codes, policy wording, clause phrasing.
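
For instance, the token "e1127" appears in exactly one demo chunk, so BM25 pins it immediately. A quick check, assuming the listing above is saved as tri_modal_rag.py (a hypothetical module name):

from tri_modal_rag import TriModalHybridRAG, demo_chunks  # hypothetical module name

rag = TriModalHybridRAG()
rag.index(demo_chunks())
# The rare exact token "e1127" should rank c1 far above everything else.
top = rag.bm25_search("error E1127", top_k=3)[0]
print(top.chunk.chunk_id, top.score)  # expected: c1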

✅ Vector search captures intent

Paraphrases, synonyms, conceptual similarity.
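
A paraphrase with little or no keyword overlap still lands on the right chunk. Same hypothetical tri_modal_rag module as above:

from tri_modal_rag import TriModalHybridRAG, demo_chunks  # hypothetical module name

rag = TriModalHybridRAG()
rag.index(demo_chunks())
# "money back" shares almost no tokens with c3's refund wording, but the
# embeddings are close, so c3 should still surface near the top.
hits = rag.vector_search("Can I get my money back on a yearly subscription?", top_k=3)
print([h.chunk.chunk_id for h in hits])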

✅ GraphRAG captures relationships

"Why" and "how" questions, dependencies, multi-hop reasoning.
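
You can watch the graph walk directly: an entity in the query seeds the expansion, and depth 2 reaches its co-occurring neighbors. Same hypothetical tri_modal_rag module as above:

from tri_modal_rag import TriModalHybridRAG, demo_chunks  # hypothetical module name

rag = TriModalHybridRAG()
rag.index(demo_chunks())
# "IdentityProvider" is the seed; in the demo corpus it co-occurs with "jwt",
# so chunks c5-c7 should be pulled in even though the query never says "JWT".
print(rag.graph_index.expand_query_entities("IdentityProvider login failure", depth=2))
print([h.chunk.chunk_id for h in rag.graph_search("IdentityProvider login failure", depth=2)])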

✅ Reranking turns recall into precision

Without reranking, hybrid retrieval often returns results that are "close, but not quite right".
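
The reranker re-scores each (query, chunk) pair jointly, which is what tightens the final ordering. Same hypothetical tri_modal_rag module as above:

from tri_modal_rag import TriModalHybridRAG, demo_chunks  # hypothetical module name

rag = TriModalHybridRAG()
rag.index(demo_chunks())
query = "refund for an annual plan"
candidates = rag.hybrid_candidates(query)       # high recall, noisy ordering
final = rag.rerank(query, candidates, top_k=3)  # cross-encoder restores precision
print([(h.chunk.chunk_id, round(h.score, 3)) for h in final])  # c3 expected first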

GraphRAG + Hybrid RAG Flow Diagram

                ┌───────────────┐
                │  User query   │
                └───────┬───────┘
                        │
       ┌────────────────┼────────────────┐
       │                │                │
 ┌─────▼─────┐    ┌─────▼─────┐    ┌─────▼─────┐
 │   BM25    │    │  Vector   │    │ GraphRAG  │
 │ (keyword  │    │ (semantic │    │ (entity + │
 │  match)   │    │similarity)│    │knowledge  │
 │           │    │           │    │  graph)   │
 └─────┬─────┘    └─────┬─────┘    └─────┬─────┘
       │                │                │
       └────────────────┼────────────────┘
                        │
                ┌───────▼───────┐
                │ Fusion + norm │
                └───────┬───────┘
                        │
                ┌───────▼───────┐
                │   Reranker    │
                └───────┬───────┘
                        │
                ┌───────▼───────┐
                │ LLM + answer  │
                └───────────────┘
