How LangChain can determine whether a Neo4j knowledge graph can answer a question
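A Neo4j graph only covers its own domain, so it cannot answer every question. A movie graph, for example, is unlikely to answer questions about the weather or the economy.
The graph itself also cannot tell whether it is able to answer a question, so this post tries using LangChain to make that call.
The code used in the experiments is referenced and adapted from online material.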
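1 Environment setup
1.1 Prerequisites
LangChain and Neo4j are needed before running the experiments. This post assumes LangChain, Neo4j, and APOC are already installed; for the installation steps, see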
https://blog.csdn.net/liliang199/article/details/153687193
1.2 Data import
The test data is movies_small.csv from blog-datasets, available at the link below.
https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv
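Because access to GitHub is restricted, movies_small.csv was downloaded and then served through a local download link created with MinIO, assumed here to be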
http://host_ip:9000/tomasonjo.blog-datasets/movies/movies_small.csv
The data is imported with LangChain's Neo4j integration. Running the sample code below imports the movies_small dataset.
from langchain_neo4j import Neo4jGraph
graph = Neo4jGraph()
# Import movie information
movies_query = """
LOAD CSV WITH HEADERS FROM
'http://host_ip:9000/tomasonjo.blog-datasets/movies/movies_small.csv'
AS row
MERGE (m:Movie {id:row.movieId})
SET m.released = date(row.released),
m.title = row.title,
m.imdbRating = toFloat(row.imdbRating)
FOREACH (director in split(row.director, '|') |
MERGE (p:Person {name:trim(director)})
MERGE (p)-[:DIRECTED]->(m))
FOREACH (actor in split(row.actors, '|') |
MERGE (p:Person {name:trim(actor)})
MERGE (p)-[:ACTED_IN]->(m))
FOREACH (genre in split(row.genres, '|') |
MERGE (g:Genre {name:trim(genre)})
MERGE (m)-[:IN_GENRE]->(g))
"""
graph.query(movies_query)
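To make the import logic easier to follow, here is a minimal pure-Python sketch of what the three FOREACH clauses do to a single row: each pipe-delimited field is split, every item is trimmed, and one relationship is produced per item. The sample row and the helper name row_to_triples are made up for illustration, and the sketch keys movies by title for readability, whereas the Cypher above merges on movieId.

```python
import csv
import io

# Made-up one-row sample that reuses the column names read by the Cypher query.
SAMPLE_CSV = (
    "movieId,released,title,imdbRating,director,actors,genres\n"
    "1,1995-11-22,Toy Story,8.3,John Lasseter,Tom Hanks| Tim Allen,Adventure|Animation\n"
)


def row_to_triples(row):
    """Expand one CSV row into (subject, relationship, object) triples."""
    title = row["title"]
    triples = []
    # Mirrors: FOREACH (director in split(row.director, '|') | ... DIRECTED ...)
    for name in row["director"].split("|"):
        triples.append((name.strip(), "DIRECTED", title))
    # Mirrors: FOREACH (actor in split(row.actors, '|') | ... ACTED_IN ...)
    for name in row["actors"].split("|"):
        triples.append((name.strip(), "ACTED_IN", title))
    # Mirrors: FOREACH (genre in split(row.genres, '|') | ... IN_GENRE ...)
    for name in row["genres"].split("|"):
        triples.append((title, "IN_GENRE", name.strip()))
    return triples


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
triples = row_to_triples(rows[0])
for triple in triples:
    print(triple)
```

Because the Cypher uses MERGE rather than CREATE, a person or genre that appears in several rows ends up as a single node, which this sketch does not model.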
1.3 Verifying the graph
Run the following code to print the graph schema.
graph.refresh_schema()
print(graph.schema)
The output is shown below.
Node properties:
Person {name: STRING}
Movie {imdbRating: FLOAT, id: STRING, released: DATE, title: STRING}
Genre {name: STRING}
Relationship properties:
The relationships:
(:Person)-[:DIRECTED]->(:Movie)
(:Person)-[:ACTED_IN]->(:Movie)
(:Movie)-[:IN_GENRE]->(:Genre)
Next, run the following code to print the detailed version of the graph schema.
enhanced_graph = Neo4jGraph(enhanced_schema=True)
print(enhanced_graph.schema)
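The detailed schema is shown below. In addition to the node and relationship properties, it includes an example value or a value range for each property.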
Node properties:
- **Person**
- `name`: STRING Example: "John Lasseter"
- **Movie**
- `imdbRating`: FLOAT Min: 2.4, Max: 9.3
- `id`: STRING Example: "1"
- `released`: DATE Min: 1964-12-16, Max: 1996-09-15
- `title`: STRING Example: "Toy Story"
- **Genre**
- `name`: STRING Example: "Adventure"
Relationship properties:
The relationships:
(:Person)-[:DIRECTED]->(:Movie)
(:Person)-[:ACTED_IN]->(:Movie)
(:Movie)-[:IN_GENRE]->(:Genre)
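2 Verifying the functionality
This section uses deepseek-r1 and LangChain to show how a prompt can decide whether the Neo4j knowledge graph can answer a question.
2.1 Setting up the LLM
First, set up the LLM through LangChain. deepseek-r1 is used here; sample code follows.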
import os
os.environ['OPENAI_API_KEY'] = "sk-xxxxx"
os.environ['OPENAI_BASE_URL'] = "http://llm_provider_url/v1"
from langchain_neo4j import GraphCypherQAChain
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="deepseek-r1", temperature=0)
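2.2 Setting up the question guard
Following examples found online, a prompt makes the LLM act as a guard that decides whether Neo4j can answer a question.
Because the database is about movies, the LLM outputs either movie or end to indicate whether the question can be answered.
An output of movie means the question relates to the movie knowledge base, so the knowledge base may be able to answer it.
An output of end means the question is unrelated to the movie knowledge base, so the knowledge base cannot answer it.
Sample code follows.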
from typing import Literal
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
guardrails_system = """
As an intelligent assistant, your primary objective is to decide whether a given question is related to movies or not.
If the question is related to movies, output "movie". Otherwise, output "end".
To make this decision, assess the content of the question and determine if it refers to any movie, actor, director, film industry,
or related topics. Provide only the specified output: "movie" or "end".
"""
guardrails_prompt = ChatPromptTemplate.from_messages(
[
(
"system",
guardrails_system,
),
(
"human",
("{question}"),
),
]
)
class GuardrailsOutput(BaseModel):
    # The class body must be indented, otherwise Python raises an IndentationError.
    decision: Literal["movie", "end"] = Field(
        description="Decision on whether the question is related to movies"
    )
guardrails_chain = guardrails_prompt | llm.with_structured_output(GuardrailsOutput)
question = "What was the cast of the Casino?"
checked = guardrails_chain.invoke(
{
"question": question,
}
)
print(f"checked: {checked}")
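The output is shown below.
For the movie-related question "What was the cast of the Casino?", the LLM correctly outputs "movie".
This indicates that Neo4j may be able to answer the question.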
checked: decision='movie'
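A single question says little about how reliable the guard is. One way to check it, sketched below, is to run the guard over a small set of labeled questions and count the misses. The helper evaluate_guard, the labeled questions, and the keyword-based stand-in guard are all hypothetical and exist only so the sketch runs offline; with the real chain, the classifier could be passed as lambda q: guardrails_chain.invoke({"question": q}).decision.

```python
def evaluate_guard(classify, labeled_questions):
    """Return (accuracy, misses) for a guard over (question, expected) pairs."""
    misses = []
    for question, expected in labeled_questions:
        got = classify(question)
        if got != expected:
            misses.append((question, expected, got))
    accuracy = 1 - len(misses) / len(labeled_questions)
    return accuracy, misses


def keyword_guard(question):
    """Offline stand-in for the LLM guard: naive keyword matching."""
    keywords = ("movie", "film", "cast", "actor", "director")
    return "movie" if any(k in question.lower() for k in keywords) else "end"


# Hypothetical labeled questions; a real check would use a larger set.
LABELED = [
    ("What was the cast of the Casino?", "movie"),
    ("Who directed Toy Story?", "movie"),
    ("What's the weather in Spain?", "end"),
    ("How is the economy doing this year?", "end"),
]

accuracy, misses = evaluate_guard(keyword_guard, LABELED)
print(f"accuracy: {accuracy}")
print(f"misses: {misses}")
```

The keyword stand-in misses "Who directed Toy Story?" because none of its keywords appear literally in the question, which is the kind of case where an LLM guard is expected to do better.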
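2.3 Generating Cypher
Neo4j cannot answer a natural-language question directly, so the LLM first generates Cypher, which is then executed.
LLM-based Cypher generation is not yet very reliable, so few-shot examples are used here to strengthen it.
Below are some common questions about the movie graph with their corresponding Cypher queries.
These questions are embedded with bge-m3.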
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_neo4j import Neo4jVector
from langchain_ollama import OllamaEmbeddings
examples = [
{
"question": "How many artists are there?",
"query": "MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)",
},
{
"question": "Which actors played in the movie Casino?",
"query": "MATCH (m:Movie {title: 'Casino'})<-[:ACTED_IN]-(a) RETURN a.name",
},
{
"question": "How many movies has Tom Hanks acted in?",
"query": "MATCH (a:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN count(m)",
},
{
"question": "List all the genres of the movie Schindler's List",
"query": "MATCH (m:Movie {title: 'Schindler\\'s List'})-[:IN_GENRE]->(g:Genre) RETURN g.name",
},
{
"question": "Which actors have worked in movies from both the comedy and action genres?",
"query": "MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT a.name",
},
{
"question": "Which directors have made movies with at least three different actors named 'John'?",
"query": "MATCH (d:Person)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Person) WHERE a.name STARTS WITH 'John' WITH d, COUNT(DISTINCT a) AS JohnsCount WHERE JohnsCount >= 3 RETURN d.name",
},
{
"question": "Identify movies where directors also played a role in the film.",
"query": "MATCH (p:Person)-[:DIRECTED]->(m:Movie), (p)-[:ACTED_IN]->(m) RETURN m.title, p.name",
},
{
"question": "Find the actor with the highest number of movies in the database.",
"query": "MATCH (a:Person)-[:ACTED_IN]->(m:Movie) RETURN a.name, COUNT(m) AS movieCount ORDER BY movieCount DESC LIMIT 1",
},
]
example_selector = SemanticSimilarityExampleSelector.from_examples(
examples, OllamaEmbeddings(base_url="http://localhost:11434", model="bge-m3"), Neo4jVector, k=5, input_keys=["question"]
)
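At runtime, vector retrieval returns the few-shot examples most relevant to the user's question.
The fewshot_examples and the question are combined into text2cypher_prompt, which is then submitted to the LLM.
Following the prompt, the LLM should directly output the Cypher that answers the user's question.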
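To illustrate what the similarity-based selection does, here is a toy sketch that ranks examples by cosine similarity. It is not how SemanticSimilarityExampleSelector is implemented: the bag-of-words embed function is a crude stand-in for bge-m3, and the helper names are assumptions made for this sketch.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase bag-of-words counts (a stand-in for bge-m3)."""
    return Counter(text.lower().replace("?", "").split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_examples(question, examples, k):
    """Return the k examples whose questions are most similar to `question`."""
    q_vec = embed(question)
    return sorted(
        examples,
        key=lambda ex: cosine(q_vec, embed(ex["question"])),
        reverse=True,
    )[:k]


# Three of the few-shot examples defined above.
EXAMPLES = [
    {
        "question": "How many artists are there?",
        "query": "MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)",
    },
    {
        "question": "Which actors played in the movie Casino?",
        "query": "MATCH (m:Movie {title: 'Casino'})<-[:ACTED_IN]-(a) RETURN a.name",
    },
    {
        "question": "How many movies has Tom Hanks acted in?",
        "query": "MATCH (a:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN count(m)",
    },
]

selected = select_examples("What was the cast of the Casino?", EXAMPLES, k=1)
print(selected[0]["question"])
```

Word overlap only catches the shared token "casino" here; a real embedding model is also expected to relate "cast" to "actors", which is why the post uses bge-m3 rather than anything like this toy.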
Sample code for Cypher generation follows.
from langchain_core.output_parsers import StrOutputParser
text2cypher_prompt = ChatPromptTemplate.from_messages(
[
(
"system",
(
"Given an input question, convert it to a Cypher query. No pre-amble. "
"Do not wrap the response in any backticks or anything else. Respond with a Cypher statement only!"
),
),
(
"human",
(
"""You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query to run.
Do not wrap the response in any backticks or anything else. Respond with a Cypher statement only!
Here is the schema information
{schema}
Below are a number of examples of questions and their corresponding Cypher queries.
{fewshot_examples}
User input: {question}
Cypher query:"""
),
),
]
)
text2cypher_chain = text2cypher_prompt | llm | StrOutputParser()
question = "What was the cast of the Casino?"
NL = "\n"
fewshot_examples = (NL * 2).join(
[
f"Question: {el['question']}{NL}Cypher:{el['query']}"
for el in example_selector.select_examples(
{"question": question}
)
]
)
generated_cypher = text2cypher_chain.invoke(
{
"question": question,
"fewshot_examples": fewshot_examples,
"schema": enhanced_graph.schema,
}
)
print(f"generated_cypher: {generated_cypher}")
The Cypher generated by the LLM is shown below.
generated_cypher: MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: "Casino"}) RETURN p.name;
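Even with the "no backticks" instruction, a model may occasionally wrap its reply in a Markdown code fence, and some deployments of reasoning models return a leading think block in the reply text. The sketch below is one possible cleanup step to apply before running the query; clean_cypher is a hypothetical helper, and the think tag format is an assumption about the provider rather than something shown in this post.

```python
import re

# Three backticks, built this way so the listing does not contain a literal fence.
FENCE = "`" * 3

_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
_FENCED = re.compile(
    FENCE + r"(?:cypher)?\s*(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def clean_cypher(raw):
    """Strip an optional think block and code fence from a model reply."""
    # Drop any <think>...</think> block that precedes the actual answer.
    text = _THINK_BLOCK.sub("", raw)
    # If the remaining text contains a fenced block, keep only its contents.
    fenced = _FENCED.search(text)
    if fenced:
        text = fenced.group(1)
    return text.strip()


wrapped = "<think>need the actors</think>\n" + FENCE + "cypher\nMATCH (n) RETURN n\n" + FENCE
print(clean_cypher(wrapped))
print(clean_cypher('MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: "Casino"}) RETURN p.name;'))
```

A reply that is already plain Cypher passes through unchanged apart from surrounding whitespace.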
2.4 Running the Cypher
Once the Cypher is generated, run the query through the Neo4j object enhanced_graph. Sample code follows.
print(generated_cypher)
records = enhanced_graph.query(generated_cypher)
print(records)
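The printed Cypher and the query results returned by the Neo4j object enhanced_graph are shown below. The Cypher printed here filters with a WHERE clause instead of a property map, which is an equivalent form of the query shown in 2.3.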
MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = "Casino" RETURN p.name;
[{'p.name': 'James Woods'}, {'p.name': 'Joe Pesci'}, {'p.name': 'Robert De Niro'}, {'p.name': 'Sharon Stone'}]
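Since the Cypher comes from an LLM, it may be worth checking that it only reads data before executing it. The following is a rough keyword heuristic, not a security boundary; is_read_only and the clause list are assumptions made for this sketch, and connecting with a Neo4j user that has read-only permissions is the more dependable safeguard.

```python
import re

# Quoted string literals, so that a title such as 'Set It Off' is not
# mistaken for a SET clause.
_STRING_LITERAL = re.compile(
    r"'(?:\\.|[^'\\])*'"   # single-quoted literal
    r'|"(?:\\.|[^"\\])*"'  # double-quoted literal
)
# Clauses that modify the graph; the list is illustrative, not exhaustive.
_WRITE_KEYWORD = re.compile(
    r"\b(CREATE|MERGE|DELETE|SET|REMOVE|DROP)\b", re.IGNORECASE
)


def is_read_only(cypher):
    """Return True if no write clause appears outside string literals."""
    without_strings = _STRING_LITERAL.sub("''", cypher)
    return _WRITE_KEYWORD.search(without_strings) is None


safe = 'MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = "Casino" RETURN p.name;'
unsafe = "MATCH (m:Movie) DETACH DELETE m"
print(is_read_only(safe))
print(is_read_only(unsafe))
```

In the flow above, the check would sit between generating the Cypher and calling enhanced_graph.query, with a rejected query treated as a question the graph cannot answer.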
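2.5 Answering the question
enhanced_graph returns only structured node and relationship data, which most users will find hard to read.
The LLM is therefore also used to turn the enhanced_graph results into text that users can understand.
The prompt and sample code for this conversion follow.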
generate_final_prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You are a helpful assistant",
),
(
"human",
(
"""Use the following results retrieved from a database to provide
a succinct, definitive answer to the user's question.
Respond as if you are answering the question directly.
Results: {results}
Question: {question}"""
),
),
]
)
generate_final_chain = generate_final_prompt | llm | StrOutputParser()
final_answer = generate_final_chain.invoke(
{"question": question, "results": records}
)
print(f"final_answer: {final_answer}")
The LLM output is shown below; it is much easier to understand than the raw Neo4j output.
final_answer: The cast of *Casino* included **Robert De Niro**, **Joe Pesci**, **Sharon Stone**, and **James Woods**.
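The four steps of this section can be tied together in one function that also reports why a question could not be answered. This is a minimal sketch under stated assumptions: answer_question and the reason strings are hypothetical, the four callables are placeholders for the chains and the graph object built above, and an empty result is treated as "the graph cannot answer".

```python
def answer_question(question, guard, generate_cypher, run_cypher, summarize):
    """Run guard -> text2cypher -> query -> summary and report the outcome."""
    # Step 2.2: stop early if the guard says the question is off topic.
    if guard(question) != "movie":
        return {"answered": False, "reason": "off_topic", "answer": None}
    # Step 2.3: let the LLM translate the question into Cypher.
    cypher = generate_cypher(question)
    # Step 2.4: run the query; generated Cypher may fail to execute.
    try:
        records = run_cypher(cypher)
    except Exception as exc:
        return {"answered": False, "reason": f"query_failed: {exc}", "answer": None}
    # An on-topic question can still have no matching data in the graph.
    if not records:
        return {"answered": False, "reason": "no_results", "answer": None}
    # Step 2.5: turn the structured records into readable text.
    return {"answered": True, "reason": "ok", "answer": summarize(question, records)}


# Offline stubs standing in for the LLM chains and the Neo4j graph.
def stub_summarize(question, records):
    return ", ".join(r["p.name"] for r in records)


result = answer_question(
    "What was the cast of the Casino?",
    guard=lambda q: "movie",
    generate_cypher=lambda q: "MATCH (p:Person)-[:ACTED_IN]->(m:Movie) RETURN p.name",
    run_cypher=lambda c: [{"p.name": "Joe Pesci"}, {"p.name": "Sharon Stone"}],
    summarize=stub_summarize,
)
off_topic = answer_question(
    "What's the weather in Spain?",
    guard=lambda q: "end",
    generate_cypher=lambda q: "",
    run_cypher=lambda c: [],
    summarize=stub_summarize,
)
print(result)
print(off_topic)
```

With the real objects, guard could wrap guardrails_chain, generate_cypher could wrap text2cypher_chain together with the few-shot selection, run_cypher could be enhanced_graph.query, and summarize could wrap generate_final_chain.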
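3 Chain example
The previous section split the process into steps to show how LangChain uses an LLM to decide whether Neo4j can answer a question and then actually answers it.
LangChain also provides a more integrated chain, GraphCypherQAChain, which implements similar functionality in two or three lines of code.
Sample code follows.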
import os
os.environ['OPENAI_API_KEY'] = "sk-xxxxx"
os.environ['OPENAI_BASE_URL'] = "http://llm_provider_url/v1"
from langchain_neo4j import GraphCypherQAChain
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="deepseek-r1", temperature=0)
chain = GraphCypherQAChain.from_llm(
graph=enhanced_graph, llm=llm, verbose=True, allow_dangerous_requests=True
)
query = "What was the cast of the Casino?"
response = chain.invoke({"query": query})
response
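A sample run is shown below. GraphCypherQAChain prints the Cypher generated for the question, the result of running that Cypher in Neo4j, and the final answer composed by the LLM.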
> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: "Casino"}) RETURN p.name;
Full Context:
[{'p.name': 'James Woods'}, {'p.name': 'Joe Pesci'}, {'p.name': 'Robert De Niro'}, {'p.name': 'Sharon Stone'}]
> Finished chain.
{'query': 'What was the cast of the Casino?',
'result': 'The cast of Casino includes Robert De Niro, Joe Pesci, Sharon Stone, and James Woods.'}
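References
---
How to combine LangChain and Neo4j for relationship-based retrieval question answering (continued)
https://blog.csdn.net/liliang199/article/details/153731597
How to combine LangChain and Neo4j for relationship-based retrieval question answering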