The Ultimate Hands-On GraphRAG Guide: From Zero to One with OpenAI + LangChain + Neo4j
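This article walks through an implementation of graph-based retrieval-augmented generation (GraphRAG) built on OpenAI and Neo4j. The text is first converted into a graph, with the OpenAI API identifying entities and relationships; the resulting graph data is then stored in a Neo4j graph database; finally, entities are extracted from the user's question, used to query the graph, and the results are passed to a large language model to generate the answer. Using a short biography of Marie Curie as the example, it shows how text becomes a graph of people, awards, institutions, and their relationships, and how that graph is stored in Neo4j.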
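In this article we discuss how to implement GraphRAG (Graph-based Retrieval Augmented Generation), using OpenAI for natural language processing and Neo4j as the graph database. The workflow has three steps:
- First, convert the text into a graph structure
- Then, store the graph structure in Neo4j
- Finally, extract the entities from the user's question, use them to retrieve related entities and their relationships, and have an LLM generate the answer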
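The three steps above can be sketched end to end without any external service. This is only a toy under stated assumptions: the triples are written by hand instead of being extracted by an LLM, a dictionary stands in for Neo4j, and `build_graph`, `find_entities`, and `retrieve_context` are hypothetical helper names, not part of LangChain or Neo4j.

```python
from collections import defaultdict

# Hand-written triples standing in for what an LLM would extract (assumption).
TRIPLES = [
    ("Marie Curie", "WINNER", "Nobel Prize"),
    ("Marie Curie", "PROFESSOR", "University Of Paris"),
    ("Pierre Curie", "CO-WINNER", "Nobel Prize"),
]


def build_graph(triples):
    """Steps 1 and 2: index outgoing edges by source entity (stand-in for Neo4j)."""
    graph = defaultdict(list)
    for source, relation, target in triples:
        graph[source].append((relation, target))
    return graph


def find_entities(question, graph):
    """Step 3a: naive entity spotting by substring match (an LLM does this later on)."""
    return [name for name in graph if name.lower() in question.lower()]


def retrieve_context(entities, graph):
    """Step 3b: collect each entity's relationships as readable lines."""
    lines = []
    for entity in entities:
        for relation, target in graph[entity]:
            lines.append(f"{entity} - {relation} -> {target}")
    return "\n".join(lines)


toy_graph = build_graph(TRIPLES)
toy_question = "Who are Marie Curie and Pierre Curie?"
toy_entities = find_entities(toy_question, toy_graph)
toy_context = retrieve_context(toy_entities, toy_graph)
print(toy_context)
```

The printed lines are the kind of context that is later handed to the LLM together with the question.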
Marie Curie, born in 1867, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity. She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields. Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes. She was, in 1906, the first woman to become a professor at the University of Paris.
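We use OpenAI to convert the text above into a graph representation and store it in Neo4j.

In the resulting graph view, the purple node (df48cdaf) represents the document, the red node (Nobel Prize) the award, the two blue nodes the people (Marie Curie and Pierre Curie), and the gray node the University Of Paris. The document is linked to every other node by a "mentions" relationship.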
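To make the shape of that graph concrete, here is a small sketch of the document node and its "mentions" edges in plain Python. The `Node` and `Edge` dataclasses are hypothetical stand-ins defined only for this illustration, and the upper-case `MENTIONS` spelling is an assumption about how the relationship type is written in the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str
    label: str


@dataclass(frozen=True)
class Edge:
    source: Node
    type: str
    target: Node


document = Node(id="df48cdaf", label="Document")
entity_nodes = [
    Node("Marie Curie", "Person"),
    Node("Pierre Curie", "Person"),
    Node("Nobel Prize", "Award"),
    Node("University Of Paris", "Organization"),
]

# The document node points at every entity it mentions.
mentions = [Edge(document, "MENTIONS", entity) for entity in entity_nodes]

for edge in mentions:
    print(f"({edge.source.label} {edge.source.id}) -[{edge.type}]-> "
          f"({edge.target.label} {edge.target.id})")
```

These edges sit alongside the entity-to-entity relationships; they let a retrieved entity be traced back to the text it came from.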
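1. GraphRAG Implementation
To get a quick feel for the logic behind GraphRAG, you can start experimenting right away with the OpenAI API and a Neo4j Sandbox instance.
Install and import the packages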
    !pip install langchain
    !pip install -U langchain-community
    !pip install sentence-transformers
    !pip install faiss-gpu
    !pip install pypdf
    !pip install faiss-cpu
    !pip install langchain-openai
    !pip install langchain-experimental
    !pip install json-repair
    !pip install neo4j

    from langchain_openai import ChatOpenAI
    from langchain.chains import RetrievalQA
    from langchain.document_loaders import PyPDFLoader
    from langchain.text_splitter import CharacterTextSplitter
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.vectorstores import FAISS
    from langchain_core.documents import Document
    from langchain_openai import OpenAIEmbeddings
    from langchain_community.graphs import Neo4jGraph
    from langchain_experimental.graph_transformers import LLMGraphTransformer
    from langchain_community.chat_models import ChatOllama
    from langchain_community.vectorstores import Neo4jVector
    from langchain_core.prompts import ChatPromptTemplate
    from pydantic import BaseModel, Field
    from langchain_core.runnables import RunnablePassthrough
    from langchain_core.output_parsers import StrOutputParser
Configure the Neo4j connection

    graph = Neo4jGraph(
        url="bolt://44.204.252.192",
        username="neo4j",  # default
        password="quarterdeck-gross-dials",  # change accordingly
    )
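Since the connection details change from one sandbox to the next, one option is to keep them out of the notebook and read them from environment variables. The sketch below assumes the variable names `NEO4J_URI`, `NEO4J_USERNAME`, and `NEO4J_PASSWORD` and a `bolt://localhost:7687` fallback; `neo4j_settings` is a hypothetical helper, not a LangChain function.

```python
import os


def neo4j_settings(env=os.environ):
    """Collect Neo4j connection settings from environment variables.

    The variable names and the bolt://localhost:7687 fallback are assumptions
    made for this sketch; adjust them to your own deployment.
    """
    password = env.get("NEO4J_PASSWORD")
    if not password:
        raise RuntimeError("NEO4J_PASSWORD is not set")
    return {
        "url": env.get("NEO4J_URI", "bolt://localhost:7687"),
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        "password": password,
    }
```

With the variables exported, the connection could then be created as `graph = Neo4jGraph(**neo4j_settings())`, using the same `url`, `username`, and `password` keyword arguments as above.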
Convert the text into a graph with OpenAI
Wrap the text in a Document

    text = """
    Marie Curie, born in 1867, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.
    She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
    Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
    She was, in 1906, the first woman to become a professor at the University of Paris.
    """
    documents = [Document(page_content=text)]
Load the LLM and convert the text into a graph

    import os

    llm = ChatOpenAI(
        temperature=0,
        model_name="gpt-4-turbo",
        api_key=os.environ["OPENAI_API_KEY"],  # read the key from the environment instead of hard-coding it
    )
    llm_transformer_filtered = LLMGraphTransformer(llm=llm)
    graph_documents = llm_transformer_filtered.convert_to_graph_documents(documents)
The contents of graph_documents:

    [
        GraphDocument(
            nodes=[
                Node(id='Marie Curie', type='Person', properties={}),
                Node(id='Pierre Curie', type='Person', properties={}),
                Node(id='University Of Paris', type='Organization', properties={}),
                Node(id='Nobel Prize', type='Award', properties={})
            ],
            relationships=[
                Relationship(
                    source=Node(id='Marie Curie', type='Person', properties={}),
                    target=Node(id='Nobel Prize', type='Award', properties={}),
                    type='WINNER',
                    properties={}
                ),
                Relationship(
                    source=Node(id='Marie Curie', type='Person', properties={}),
                    target=Node(id='University Of Paris', type='Organization', properties={}),
                    type='PROFESSOR',
                    properties={}
                ),
                Relationship(
                    source=Node(id='Pierre Curie', type='Person', properties={}),
                    target=Node(id='Nobel Prize', type='Award', properties={}),
                    type='CO-WINNER',
                    properties={}
                )
            ],
            source=Document(
                metadata={},
                page_content='\nMarie Curie, born in 1867, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.\nShe was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.\nHer husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.\nShe was, in 1906, the first woman to become a professor at the University of Paris.\n'
            )
        )
    ]
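The transformer here is created without any schema restriction, so the LLM is free to pick node and relationship types. `LLMGraphTransformer` also accepts `allowed_nodes` and `allowed_relationships` arguments to narrow that down. The sketch below does not call the transformer; it only mimics the effect of such a filter on the triples already extracted, and `filter_triples` is a hypothetical helper written for this illustration.

```python
# Typed triples copied from the graph_documents output:
# (source id, source type, relationship type, target id, target type)
EXTRACTED = [
    ("Marie Curie", "Person", "WINNER", "Nobel Prize", "Award"),
    ("Marie Curie", "Person", "PROFESSOR", "University Of Paris", "Organization"),
    ("Pierre Curie", "Person", "CO-WINNER", "Nobel Prize", "Award"),
]


def filter_triples(triples, allowed_nodes=None, allowed_relationships=None):
    """Keep only triples whose node types and relationship type are allowed.

    None means "no restriction" for that dimension.
    """
    kept = []
    for src, src_type, rel, dst, dst_type in triples:
        if allowed_nodes is not None and not {src_type, dst_type} <= set(allowed_nodes):
            continue
        if allowed_relationships is not None and rel not in allowed_relationships:
            continue
        kept.append((src, src_type, rel, dst, dst_type))
    return kept


people_and_awards = filter_triples(EXTRACTED, allowed_nodes=["Person", "Award"])
winners_only = filter_triples(EXTRACTED, allowed_relationships=["WINNER"])
print(people_and_awards)
print(winners_only)
```

Restricting the schema this way tends to keep the graph consistent across documents, at the cost of dropping facts that fall outside the allowed types.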
Store the generated graph in Neo4j

    graph.add_graph_documents(
        graph_documents,
        baseEntityLabel=True,
        include_source=True,
    )
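To get an intuition for what storing a relationship involves, the sketch below builds a parameterized Cypher `MERGE` statement for one typed relationship. It is not how `add_graph_documents` is implemented; `merge_statement` is a hypothetical helper, and the statement shape is an assumption chosen for illustration.

```python
def merge_statement(src_type, rel_type, dst_type):
    """Build a parameterized Cypher MERGE for one typed relationship.

    Node ids travel as query parameters. Labels and relationship types are
    interpolated and wrapped in backticks, which keeps names that contain a
    hyphen, such as CO-WINNER, valid.
    """
    for name in (src_type, rel_type, dst_type):
        if "`" in name:
            raise ValueError(f"unsafe name: {name!r}")
    return (
        f"MERGE (s:`{src_type}` {{id: $source}}) "
        f"MERGE (t:`{dst_type}` {{id: $target}}) "
        f"MERGE (s)-[:`{rel_type}`]->(t)"
    )


statement = merge_statement("Person", "CO-WINNER", "Award")
params = {"source": "Pierre Curie", "target": "Nobel Prize"}
print(statement)
print(params)
```

Using `MERGE` rather than `CREATE` means that loading the same document twice does not duplicate nodes or relationships.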
Create embeddings in Neo4j to support more complex queries

    import os

    embed = OpenAIEmbeddings(
        model="text-embedding-3-large",
        base_url="https://xiaoai.plus/v1",
        api_key=os.environ["OPENAI_API_KEY"],  # read the key from the environment instead of hard-coding it
    )
    vector_index = Neo4jVector.from_existing_graph(
        embedding=embed,
        search_type="hybrid",
        node_label="Document",
        text_node_properties=["text"],
        embedding_node_property="embedding",
        url="bolt://44.204.252.192",
        username="neo4j",  # default
        password="quarterdeck-gross-dials",  # change accordingly
    )
    vector_retriever = vector_index.as_retriever()
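Vector retrieval boils down to comparing the question's embedding with the embeddings stored on the Document nodes. A minimal sketch of that comparison follows, using cosine similarity as a common choice of metric. The three-dimensional vectors are made up for illustration; real embeddings from the model have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy embeddings; real ones come from the embedding model.
stored = {
    "curie_doc": [0.9, 0.1, 0.0],
    "unrelated_doc": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]

# Rank stored documents from most to least similar to the query.
ranked = sorted(
    stored,
    key=lambda doc_id: cosine_similarity(stored[doc_id], query_vector),
    reverse=True,
)
print(ranked)
```

A hybrid search combines a ranking like this one with keyword matching over the same text, so documents can surface through either route.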
Neo4j now contains data like the following

    {
      "identity": 0,
      "labels": [
        "Document"
      ],
      "properties": {
        "id": "df48cdafbdaada2de04aaeb7c6a271a0",
        "text": "Marie Curie, born in 1867, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.....",
        "embedding": [
          0.013757660053670406,
          -0.035230763256549835,
          -0.014454838819801807,
          ...
        ]
      },
      "elementId": "4:56545626-8926-4df0-bdb3-73bbd4de10d6:0"
    }
    {
      "identity": 1,
      "labels": [
        "Person",
        "__Entity__"
      ],
      "properties": {
        "id": "Marie Curie"
      },
      "elementId": "4:56545626-8926-4df0-bdb3-73bbd4de10d6:1"
    }
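Query entities in Neo4j
Once the graph is stored in Neo4j, we can extract the entities from the user's question and look up the related entities and their relationships in the graph.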
Define the model for extracting entities from text

    class Entities(BaseModel):
        names: list[str] = Field(..., description="All entities from the text")
Define the entity-extraction prompt

    prompt = ChatPromptTemplate.from_messages([
        ("system", "Extract organization and person entities from the text."),
        ("human", "Extract entities from: {question}"),
    ])
Combine the prompt and the LLM into an entity-extraction chain; its output is a structured object matching the Entities model

    entity_chain = prompt | llm.with_structured_output(Entities, include_raw=True)
    response = entity_chain.invoke({"question": "Who are Marie Curie and Pierre Curie?"})
    entities = response["raw"].tool_calls[0]["args"]["names"]
The response looks like this

    {
        'raw': AIMessage(
            content='',
            additional_kwargs={
                'tool_calls': [{
                    'id': 'chatcmpl-WKYa1IBDY3cBqBgp8JbP6KtvlHniV',
                    'function': {'arguments': '{"names":["Marie Curie","Pierre Curie"]}', 'name': 'Entities'},
                    'type': 'function'
                }],
                'refusal': None
            },
            response_metadata={
                'token_usage': {
                    'completion_tokens': 13,
                    'prompt_tokens': 72,
                    'total_tokens': 85,
                    'completion_tokens_details': None,
                    'prompt_tokens_details': None
                },
                'model_name': 'gpt-4-turbo',
                'system_fingerprint': 'fp_5b26d85e12',
                'finish_reason': 'stop',
                'logprobs': None
            },
            id='run-41e2ac51-9573-4366-bc80-3080bd464fa6-0',
            tool_calls=[{
                'name': 'Entities',
                'args': {'names': ['Marie Curie', 'Pierre Curie']},
                'id': 'chatcmpl-WKYa1IBDY3cBqBgp8JbP6KtvlHniV',
                'type': 'tool_call'
            }],
            usage_metadata={
                'input_tokens': 72,
                'output_tokens': 13,
                'total_tokens': 85,
                'input_token_details': {},
                'output_token_details': {}
            }
        ),
        'parsed': Entities(names=['Marie Curie', 'Pierre Curie']),
        'parsing_error': None
    }
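The entity names appear twice in that response: once as a JSON string inside the raw function call, and once as the parsed `Entities` object. The sketch below shows the step from one to the other with the standard library only; the dataclass is a plain-Python stand-in for the pydantic model and performs no validation.

```python
import json
from dataclasses import dataclass


@dataclass
class Entities:
    """Plain-Python stand-in for the pydantic Entities model (no validation)."""
    names: list


# The function-call arguments exactly as they appear in the raw response.
raw_arguments = '{"names":["Marie Curie","Pierre Curie"]}'

parsed = Entities(**json.loads(raw_arguments))
print(parsed.names)
```

In the real response, `response["parsed"].names` holds the same list as the longer `response["raw"].tool_calls[0]["args"]["names"]` path, and `parsing_error` reports when the model's arguments did not fit the schema.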
Iterate over the extracted entities and query Neo4j for their related entities and relationships

    lines = []
    for entity in entities:
        query_response = graph.query(
            """
            MATCH (p:Person {id: $entity})-[r]->(e)
            RETURN p.id AS source_id, type(r) AS relationship, e.id AS target_id
            LIMIT 50
            """,
            {"entity": entity},
        )
        lines.extend(
            f"{el['source_id']} - {el['relationship']} -> {el['target_id']}"
            for el in query_response
        )
    # Join once at the end so results for different entities stay on separate lines
    graph_data = "\n".join(lines)
    graph_data
graph_data contains

    Marie Curie - WINNER -> Nobel Prize
    Marie Curie - PROFESSOR -> University Of Paris
    Pierre Curie - CO-WINNER -> Nobel Prize
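Note that the query only follows relationships that start at a Person node, so a question about the Nobel Prize or the University Of Paris would retrieve nothing from the graph. The sketch below contrasts outgoing-only lookup with lookup in either direction on an in-memory copy of the three edges; `outgoing` and `any_direction` are hypothetical helpers for this illustration.

```python
EDGES = [
    ("Marie Curie", "WINNER", "Nobel Prize"),
    ("Marie Curie", "PROFESSOR", "University Of Paris"),
    ("Pierre Curie", "CO-WINNER", "Nobel Prize"),
]


def outgoing(entity):
    """Relationships that start at the entity, like the -[r]-> pattern in the query."""
    return [f"{s} - {r} -> {t}" for s, r, t in EDGES if s == entity]


def any_direction(entity):
    """Relationships that touch the entity on either end."""
    return [f"{s} - {r} -> {t}" for s, r, t in EDGES if entity in (s, t)]


print(outgoing("Nobel Prize"))
print(any_direction("Nobel Prize"))
```

In Cypher, a comparable change might be to match on the shared `__Entity__` label with an undirected pattern, for example `MATCH (n:__Entity__ {id: $entity})-[r]-(e)`; treat that as a starting point to test rather than a drop-in replacement.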
Run the vector search

    vector_data = [
        el.page_content
        for el in vector_retriever.invoke("Who are Marie Curie and Pierre Curie?")
    ]
    vector_data
vector_data contains
['\ntext: \nMarie Curie, born in 1867, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.\nShe was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.\nHer husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.\nShe was, in 1906, the first woman to become a professor at the University of Paris.\n']
Combine the graph search and vector search results to generate the answer

    context = f"Graph data: {graph_data}\nVector data: {'#Document '.join(vector_data)}"
Define a prompt template for answering from the context

    template = """Answer the question based only on the following context:
    {context}

    Question: {question}
    Answer:"""
Create the prompt from the template; it takes the context and the question as input

    prompt = ChatPromptTemplate.from_template(template)
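To see what the model will actually receive, the template can be filled in by hand. The sketch below uses plain `str.format` as a stand-in for the prompt template, with placeholder passages instead of real retrieval results; the `demo_` names are hypothetical and exist only here.

```python
TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
Answer:"""

demo_graph_data = "Marie Curie - WINNER -> Nobel Prize"
demo_vector_data = ["first passage", "second passage"]  # placeholder passages

# Same construction as the context string: graph lines, then joined passages.
demo_context = (
    f"Graph data: {demo_graph_data}\n"
    f"Vector data: {'#Document '.join(demo_vector_data)}"
)
prompt_text = TEMPLATE.format(context=demo_context, question="Who is Marie Curie?")
print(prompt_text)
```

One detail this makes visible: `'#Document '.join(...)` places the marker only between passages, so the first passage carries no `#Document` prefix.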
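Build the processing chain:
- Feed in the results generated above as the context
- Apply the prompt template to produce the final prompt
- Use the LLM to generate the answer
- Use StrOutputParser to return the output as a string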
    chain = (
        {
            "context": lambda _: context,       # reuse the context assembled above
            "question": RunnablePassthrough(),  # pass the question through unchanged
        }
        | prompt             # apply the prompt template
        | llm                # have the model answer from the context
        | StrOutputParser()  # return the model's reply as a plain string
    )

    result = chain.invoke("Who are Marie Curie and Pierre Curie?")
    print(result)
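The `|` operator chains steps so that each one receives the previous step's output. The sketch below imitates that flow with ordinary functions and a fake model, so it runs without an API key. `pipe`, `gather_inputs`, `render_prompt`, `fake_llm`, and `parse_output` are hypothetical names for this illustration, not LangChain APIs.

```python
from functools import reduce


def pipe(*steps):
    """Compose steps left to right, similar in spirit to the | operator."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


def gather_inputs(question):
    # Mirrors the dict step: a fixed context plus the question passed through.
    return {"context": "Marie Curie - WINNER -> Nobel Prize", "question": question}


def render_prompt(inputs):
    # Mirrors the prompt template: a mapping goes in, prompt text comes out.
    return f"Context: {inputs['context']}\nQuestion: {inputs['question']}\nAnswer:"


def fake_llm(prompt_text):
    # Stand-in for the model call; returns a message-like dict.
    return {"content": f"(model reply to {len(prompt_text)} prompt characters)"}


def parse_output(message):
    # Plays the role of StrOutputParser: message in, plain string out.
    return message["content"]


toy_chain = pipe(gather_inputs, render_prompt, fake_llm, parse_output)
toy_answer = toy_chain("Who is Marie Curie?")
print(toy_answer)
```

Because the context step ignores its input and returns a precomputed string, the chain answers every question from the same context; retrieving per question would mean moving the graph and vector lookups into that first step.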
For the question "Who are Marie Curie and Pierre Curie?", the final result is
Marie Curie was a Polish and naturalised-French physicist and chemist known for her research on radioactivity. She was the first woman to win a Nobel Prize, the first person to win it twice, and the only person to win in two scientific fields. She also became the first woman professor at the University of Paris. Pierre Curie, her husband, was a co-winner of her first Nobel Prize. Together, they were the first married couple to win the Nobel Prize.
2. Comparing the Results
Output without GraphRAG
    import os

    llm = ChatOpenAI(
        temperature=0,
        model_name="gpt-4-turbo",
        base_url="https://xiaoai.plus/v1",
        api_key=os.environ["OPENAI_API_KEY"],  # read the key from the environment instead of hard-coding it
    )
    response = llm.invoke("Who are Marie Curie and Pierre Curie?")
    print(response.content)  # print only the reply text
Marie Curie and Pierre Curie were a married couple who were both pioneering scientists in the field of radioactivity. Marie Curie, originally from Poland, was the first woman to win a Nobel Prize and the only person to win Nobel Prizes in two different scientific fields, physics and chemistry. Pierre Curie was a French physicist who made significant contributions to the study of crystallography, magnetism, and radioactivity. Together, they discovered the elements polonium and radium, and conducted groundbreaking research on the properties of radioactive materials. Their work laid the foundation for the development of nuclear physics and the use of radiation in medicine.
Output with vector-based RAG

    import os

    text = """
    Marie Curie, born in 1867, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.
    She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
    Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
    She was, in 1906, the first woman to become a professor at the University of Paris.
    """
    docs = [Document(page_content=text)]
    embeddings = OpenAIEmbeddings(
        model="text-embedding-3-large",
        api_key=os.environ["OPENAI_API_KEY"],  # read the key from the environment instead of hard-coding it
    )

    # Create the FAISS vector store
    vectorstore = FAISS.from_documents(docs, embeddings)

    # Save and reload the vector store
    vectorstore.save_local("faiss_index_")
    persisted_vectorstore = FAISS.load_local(
        "faiss_index_", embeddings, allow_dangerous_deserialization=True
    )

    # Create a retriever
    retriever = persisted_vectorstore.as_retriever()

    # qa was not defined in the original listing; RetrievalQA (imported at the top) is assumed here
    qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)
    result = qa.invoke("Who are Marie Curie and Pierre Curie?")
    print(result["result"])
Marie Curie was a Polish and naturalised-French physicist and chemist known for her research on radioactivity. She was the first woman to win a Nobel Prize, the first person to win twice, and the only person to win in two different scientific fields. Pierre Curie was her husband and a co-winner of her first Nobel Prize. Together, they conducted significant research in the field of radioactivity, and their collaboration marked them as the first-ever married couple to win the Nobel Prize.
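One rough way to compare the three answers is to check which key phrases each one contains. The sketch below is a crude keyword check, not an evaluation method: the answer texts are copied from the outputs in this article, and the list of phrases was picked by hand for this illustration.

```python
ANSWERS = {
    "GraphRAG": (
        "Marie Curie was a Polish and naturalised-French physicist and chemist known for her "
        "research on radioactivity. She was the first woman to win a Nobel Prize, the first "
        "person to win it twice, and the only person to win in two scientific fields. She also "
        "became the first woman professor at the University of Paris. Pierre Curie, her husband, "
        "was a co-winner of her first Nobel Prize. Together, they were the first married couple "
        "to win the Nobel Prize."
    ),
    "LLM only": (
        "Marie Curie and Pierre Curie were a married couple who were both pioneering scientists "
        "in the field of radioactivity. Marie Curie, originally from Poland, was the first woman "
        "to win a Nobel Prize and the only person to win Nobel Prizes in two different scientific "
        "fields, physics and chemistry. Pierre Curie was a French physicist who made significant "
        "contributions to the study of crystallography, magnetism, and radioactivity. Together, "
        "they discovered the elements polonium and radium, and conducted groundbreaking research "
        "on the properties of radioactive materials. Their work laid the foundation for the "
        "development of nuclear physics and the use of radiation in medicine."
    ),
    "Vector RAG": (
        "Marie Curie was a Polish and naturalised-French physicist and chemist known for her "
        "research on radioactivity. She was the first woman to win a Nobel Prize, the first "
        "person to win twice, and the only person to win in two different scientific fields. "
        "Pierre Curie was her husband and a co-winner of her first Nobel Prize. Together, they "
        "conducted significant research in the field of radioactivity, and their collaboration "
        "marked them as the first-ever married couple to win the Nobel Prize."
    ),
}

# Phrases to look for; chosen by hand for this illustration.
FACTS = ["Nobel Prize", "University of Paris", "married couple", "polonium"]

coverage = {
    name: {fact: fact.lower() in text.lower() for fact in FACTS}
    for name, text in ANSWERS.items()
}

for name, row in coverage.items():
    print(name, row)
```

On these three outputs, only the GraphRAG answer mentions the University of Paris professorship, which is also present in the graph as the PROFESSOR relationship. The LLM-only answer is the only one that mentions polonium, a detail that does not appear in the source text and so comes from the model's own knowledge.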