How LangChain can decide whether a Neo4j knowledge graph can answer a question

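A Neo4j graph has inherent limits and cannot answer every question. A movie graph, for example, probably cannot answer questions about the weather or the economy. The graph also cannot tell by itself whether it is able to answer a question, so this post uses LangChain to make that judgment. The experiment code is adapted from material found online.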

liliangcsdn · published 2025-11-20 23:15:00


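1 Environment setup

1.1 Prerequisites

LangChain and Neo4j must be installed before running the experiment. Here we assume LangChain, Neo4j and APOC are already installed; for the installation steps, see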

https://blog.csdn.net/liliang199/article/details/153687193

1.2 Data import

The test data is movies_small.csv from the blog-datasets repository, available at the link below.

https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv

Because GitHub access is restricted, download movies_small.csv first and use MinIO to serve it from a local download link, assumed here to be

http://host_ip:9000/tomasonjo.blog-datasets/movies/movies_small.csv

Import the data with langchain_neo4j. Running the code below loads the movies_small dataset.

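from langchain_neo4j import Neo4jGraph

# Connection details are read from the NEO4J_URI, NEO4J_USERNAME and NEO4J_PASSWORD environment variables
graph = Neo4jGraph()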

# Import movie information

movies_query = """
LOAD CSV WITH HEADERS FROM 
'http://host_ip:9000/tomasonjo.blog-datasets/movies/movies_small.csv'
AS row
MERGE (m:Movie {id:row.movieId})
SET m.released = date(row.released),
    m.title = row.title,
    m.imdbRating = toFloat(row.imdbRating)
FOREACH (director in split(row.director, '|') | 
    MERGE (p:Person {name:trim(director)})
    MERGE (p)-[:DIRECTED]->(m))
FOREACH (actor in split(row.actors, '|') | 
    MERGE (p:Person {name:trim(actor)})
    MERGE (p)-[:ACTED_IN]->(m))
FOREACH (genre in split(row.genres, '|') | 
    MERGE (g:Genre {name:trim(genre)})
    MERGE (m)-[:IN_GENRE]->(g))
"""

graph.query(movies_query)
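To make the per-row logic of the import query easier to follow, here is a minimal pure-Python sketch of what it derives from one CSV row: one Movie node, plus one DIRECTED, ACTED_IN or IN_GENRE relationship for each '|'-separated director, actor and genre. The helper row_to_graph and the sample row are hypothetical; only the column names come from the query. Unlike MERGE, nothing is deduplicated here.

```python
def row_to_graph(row: dict) -> dict:
    """Hypothetical helper: what the LOAD CSV query derives from one CSV row."""
    movie = {
        "id": row["movieId"],
        "title": row["title"],
        "released": row["released"],  # the query converts this with date(...)
        # toFloat(...) in Cypher; here an empty field becomes None
        "imdbRating": float(row["imdbRating"]) if row["imdbRating"] else None,
    }
    relationships = []
    # Each FOREACH (x IN split(..., '|') | MERGE ...) becomes a loop over trimmed names
    for director in row["director"].split("|"):
        relationships.append((director.strip(), "DIRECTED", movie["id"]))
    for actor in row["actors"].split("|"):
        relationships.append((actor.strip(), "ACTED_IN", movie["id"]))
    for genre in row["genres"].split("|"):
        relationships.append((movie["id"], "IN_GENRE", genre.strip()))
    return {"movie": movie, "relationships": relationships}


# Illustrative row with made-up values (not taken from movies_small.csv)
sample = {
    "movieId": "42",
    "title": "Example Movie",
    "released": "1995-01-01",
    "imdbRating": "7.5",
    "director": "Director A",
    "actors": "Actor A| Actor B",
    "genres": "Comedy|Drama",
}
graph_data = row_to_graph(sample)
print(graph_data["relationships"])
```

The space in "Actor A| Actor B" is removed by strip(), just as trim() does in the query.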

1.3 Graph verification

Run the following code to print the graph schema.

graph.refresh_schema()
print(graph.schema)

The output is as follows:

Node properties:
Person {name: STRING}
Movie {imdbRating: FLOAT, id: STRING, released: DATE, title: STRING}
Genre {name: STRING}
Relationship properties:

The relationships:
(:Person)-[:DIRECTED]->(:Movie)
(:Person)-[:ACTED_IN]->(:Movie)
(:Movie)-[:IN_GENRE]->(:Genre)
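The relationship lines in this schema follow a fixed (:Start)-[:TYPE]->(:End) pattern, so they are easy to parse, for example to check that a generated query only uses relationships that exist, in the right direction. A minimal sketch, assuming the schema text has exactly the format printed above; parse_relationships is a hypothetical helper, not part of langchain_neo4j.

```python
import re

# Matches relationship lines such as "(:Person)-[:DIRECTED]->(:Movie)"
REL_PATTERN = re.compile(r"\(:(\w+)\)-\[:(\w+)\]->\(:(\w+)\)")


def parse_relationships(schema_text: str) -> list:
    """Hypothetical helper: return (start_label, rel_type, end_label) triples."""
    return REL_PATTERN.findall(schema_text)


schema_text = """The relationships:
(:Person)-[:DIRECTED]->(:Movie)
(:Person)-[:ACTED_IN]->(:Movie)
(:Movie)-[:IN_GENRE]->(:Genre)"""

triples = parse_relationships(schema_text)
print(triples)
```

With a live connection, graph.schema could be passed instead of the hard-coded text.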

Next, run the following code to print a more detailed version of the graph schema.

enhanced_graph = Neo4jGraph(enhanced_schema=True)
print(enhanced_graph.schema)

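The detailed schema is shown below. In addition to the node and relationship properties, it includes example values and value ranges for the properties.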

Node properties:
- **Person**
- `name`: STRING Example: "John Lasseter"
- **Movie**
- `imdbRating`: FLOAT Min: 2.4, Max: 9.3
- `id`: STRING Example: "1"
- `released`: DATE Min: 1964-12-16, Max: 1996-09-15
- `title`: STRING Example: "Toy Story"
- **Genre**
- `name`: STRING Example: "Adventure"
Relationship properties:

The relationships:
(:Person)-[:DIRECTED]->(:Movie)
(:Person)-[:ACTED_IN]->(:Movie)
(:Movie)-[:IN_GENRE]->(:Genre)
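The Min/Max and Example entries summarize the values each property takes, which gives the LLM concrete hints (real titles, the date range) when it writes Cypher. The sketch below only illustrates that idea on an in-memory list of values; summarize_property is hypothetical and is not how langchain_neo4j builds the enhanced schema, which queries the database.

```python
from datetime import date


def summarize_property(values: list) -> str:
    """Hypothetical helper: Min/Max for numbers or dates, one example for anything else."""
    present = [v for v in values if v is not None]
    if not present:
        return "no values"
    all_numbers = all(isinstance(v, (int, float)) for v in present)
    all_dates = all(isinstance(v, date) for v in present)
    if all_numbers or all_dates:
        return f"Min: {min(present)}, Max: {max(present)}"
    return f'Example: "{present[0]}"'


print(summarize_property([8.3, 2.4, None, 9.3]))
print(summarize_property([date(1995, 11, 22), date(1964, 12, 16)]))
print(summarize_property(["Toy Story", "Heat"]))
```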

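2 Verifying the approach

Using deepseek-r1 and LangChain, this section shows how a prompt can be used to decide whether the Neo4j knowledge graph can answer a question.

2.1 Setting up the LLM

First, configure the LLM in LangChain. deepseek-r1 is used here, accessed through an OpenAI-compatible endpoint whose address is set in OPENAI_BASE_URL. Example code: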

import os
os.environ['OPENAI_API_KEY'] = "sk-xxxxx"
os.environ['OPENAI_BASE_URL'] = "http://llm_provider_url/v1"


from langchain_neo4j import GraphCypherQAChain
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="deepseek-r1", temperature=0)

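2.2 Setting up a question guardrail

Following examples found online, a prompt is used to make the LLM act as a guardrail that decides whether Neo4j can answer the question.

Since the database is about movies, the LLM is asked to output either "movie" or "end" to indicate whether the question can be answered.

If it outputs "movie", the question is related to the movie knowledge base, which may be able to answer it.

If it outputs "end", the question is unrelated to the movie knowledge base, which cannot answer it.

Example code: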

from typing import Literal

from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

guardrails_system = """
As an intelligent assistant, your primary objective is to decide whether a given question is related to movies or not. 
If the question is related to movies, output "movie". Otherwise, output "end".
To make this decision, assess the content of the question and determine if it refers to any movie, actor, director, film industry, 
or related topics. Provide only the specified output: "movie" or "end".
"""
guardrails_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            guardrails_system,
        ),
        (
            "human",
            ("{question}"),
        ),
    ]
)


class GuardrailsOutput(BaseModel):
    decision: Literal["movie", "end"] = Field(
        description="Decision on whether the question is related to movies"
    )


guardrails_chain = guardrails_prompt | llm.with_structured_output(GuardrailsOutput)


question = "What was the cast of the Casino?"

checked = guardrails_chain.invoke(
        {
            "question": question,
        }
    )
print(f"checked: {checked}")

The output is shown below.

For the movie-related question "What was the cast of the Casino?", the LLM correctly outputs "movie".

This means Neo4j may be able to answer the question.

checked: decision='movie'
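The guardrail decision only matters if something acts on it. A minimal routing sketch: "movie" continues to the Cypher steps below, while "end" returns a fixed reply without touching Neo4j. The function route, the reply text and the answer_from_graph callback are hypothetical; in this post's setup the decision would be checked.decision.

```python
from typing import Callable

# Hypothetical reply for questions the movie graph is not meant to answer
OUT_OF_SCOPE_REPLY = "Sorry, this question is outside the scope of the movie knowledge graph."


def route(decision: str, answer_from_graph: Callable[[], str]) -> str:
    """Continue to the graph only when the guardrail said "movie"."""
    if decision == "movie":
        return answer_from_graph()
    if decision == "end":
        return OUT_OF_SCOPE_REPLY
    # GuardrailsOutput restricts the value, but a plain-text fallback might not
    raise ValueError(f"unexpected guardrail decision: {decision!r}")


# In this post's setup, something like: route(checked.decision, lambda: ...)
print(route("end", lambda: "not called"))
```

Passing the graph step as a callback means no Cypher is generated or run at all for out-of-scope questions.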

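2.3 Generating Cypher

Neo4j cannot answer a natural-language question directly, so the LLM first generates a Cypher query, which is then executed.

LLM-based Cypher generation is still not very reliable, so few-shot examples are used here to improve it.

Below are some common questions about the movie graph together with their Cypher queries.

These questions are embedded with bge-m3 (served by Ollama) and stored in Neo4j through Neo4jVector.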

from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_neo4j import Neo4jVector
from langchain_ollama import OllamaEmbeddings


examples = [
    {
        "question": "How many artists are there?",
        "query": "MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)",
    },
    {
        "question": "Which actors played in the movie Casino?",
        "query": "MATCH (m:Movie {title: 'Casino'})<-[:ACTED_IN]-(a) RETURN a.name",
    },
    {
        "question": "How many movies has Tom Hanks acted in?",
        "query": "MATCH (a:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN count(m)",
    },
    {
        "question": "List all the genres of the movie Schindler's List",
        "query": "MATCH (m:Movie {title: 'Schindler's List'})-[:IN_GENRE]->(g:Genre) RETURN g.name",
    },
    {
        "question": "Which actors have worked in movies from both the comedy and action genres?",
        "query": "MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT a.name",
    },
    {
        "question": "Which directors have made movies with at least three different actors named 'John'?",
        "query": "MATCH (d:Person)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Person) WHERE a.name STARTS WITH 'John' WITH d, COUNT(DISTINCT a) AS JohnsCount WHERE JohnsCount >= 3 RETURN d.name",
    },
    {
        "question": "Identify movies where directors also played a role in the film.",
        "query": "MATCH (p:Person)-[:DIRECTED]->(m:Movie), (p)-[:ACTED_IN]->(m) RETURN m.title, p.name",
    },
    {
        "question": "Find the actor with the highest number of movies in the database.",
        # The graph has no Actor label; actors are Person nodes with ACTED_IN relationships
        "query": "MATCH (a:Person)-[:ACTED_IN]->(m:Movie) RETURN a.name, COUNT(m) AS movieCount ORDER BY movieCount DESC LIMIT 1",
    },
]

example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples, OllamaEmbeddings(base_url="http://localhost:11434", model="bge-m3"), Neo4jVector, k=5, input_keys=["question"]
)
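To show the mechanism behind SemanticSimilarityExampleSelector (represent each example question as a vector, rank by similarity to the user question, keep the top k), here is a sketch that swaps bge-m3 embeddings and Neo4jVector for a toy bag-of-words cosine similarity. All names here are hypothetical. A real embedding model also matches synonyms such as "cast" and "actors", which this toy version cannot.

```python
import math
import re
from collections import Counter

# Tiny stop-word list so that words like "the" do not dominate the toy similarity
STOP_WORDS = {"a", "an", "the", "of", "in", "is", "are", "was", "what", "which", "how"}


def bag_of_words(text: str) -> Counter:
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def select_examples(question: str, examples: list, k: int = 2) -> list:
    """Rank examples by similarity of their "question" field and keep the top k."""
    q = bag_of_words(question)
    ranked = sorted(examples, key=lambda ex: cosine(q, bag_of_words(ex["question"])), reverse=True)
    return ranked[:k]


toy_examples = [
    {"question": "How many movies has Tom Hanks acted in?"},
    {"question": "Which actors played in the movie Casino?"},
    {"question": "List all the genres of the movie Schindler's List"},
]
best = select_examples("What was the cast of the Casino?", toy_examples, k=1)
print(best[0]["question"])
```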

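At runtime, vector search retrieves the few-shot examples most relevant to the user's question.

The fewshot_examples and the question are combined into text2cypher_prompt, which is then sent to the LLM.

Following the examples in the prompt, the LLM should output only the Cypher query that answers the user's question.

The Cypher generation code is shown below.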

from langchain_core.output_parsers import StrOutputParser

text2cypher_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            (
                "Given an input question, convert it to a Cypher query. No pre-amble. "
                "Do not wrap the response in any backticks or anything else. Respond with a Cypher statement only!"
            ),
        ),
        (
            "human",
            (
                """You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query to run.
Do not wrap the response in any backticks or anything else. Respond with a Cypher statement only!
Here is the schema information
{schema}

Below are a number of examples of questions and their corresponding Cypher queries.

{fewshot_examples}

User input: {question}
Cypher query:"""
            ),
        ),
    ]
)

text2cypher_chain = text2cypher_prompt | llm | StrOutputParser()


question = "What was the cast of the Casino?"


NL = "\n"
fewshot_examples = (NL * 2).join(
        [
            f"Question: {el['question']}{NL}Cypher:{el['query']}"
            for el in example_selector.select_examples(
                {"question": question}
            )
        ]
    )
generated_cypher = text2cypher_chain.invoke(
        {
            "question": question,
            "fewshot_examples": fewshot_examples,
            "schema": enhanced_graph.schema,
        }
    )
print(f"generated_cypher: {generated_cypher}")

The Cypher generated by the LLM:

generated_cypher: MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: "Casino"}) RETURN p.name;
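The prompt asks for a bare Cypher statement, but a reasoning model such as deepseek-r1 may, depending on the provider, still return a <think> block, and models sometimes wrap the query in a Markdown code fence anyway. A small cleanup step before execution is a cheap safeguard. clean_cypher is a hypothetical helper, and the patterns it strips are assumptions about what the output might contain.

```python
import re


def clean_cypher(text: str) -> str:
    """Hypothetical cleanup of LLM output before it is sent to Neo4j."""
    # Remove a <think>...</think> reasoning block, if the provider returns one
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # Remove Markdown code fences: three backticks, optionally followed by "cypher"
    text = re.sub(r"`{3}(?:cypher)?", "", text, flags=re.IGNORECASE)
    return text.strip()


fence = "`" * 3
raw = f"<think>The cast means actors.</think>\n{fence}cypher\nMATCH (n:Movie) RETURN n.title\n{fence}"
print(clean_cypher(raw))
```

It would be applied as generated_cypher = clean_cypher(generated_cypher) before calling enhanced_graph.query.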

2.4 Running the Cypher

Once the Cypher has been generated, run it through the Neo4j object enhanced_graph, as in the following code.

print(generated_cypher)
records = enhanced_graph.query(generated_cypher)
print(records)

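Below are the query results returned by enhanced_graph. The printed Cypher comes from a different run than the one in 2.3: it uses a WHERE clause instead of an inline property map, but the two queries are equivalent.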

MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = "Casino" RETURN p.name;
[{'p.name': 'James Woods'}, {'p.name': 'Joe Pesci'}, {'p.name': 'Robert De Niro'}, {'p.name': 'Sharon Stone'}]
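Since the Cypher comes from an LLM, running it unchecked means the LLM could in principle modify the graph (GraphCypherQAChain in section 3 makes you acknowledge this with allow_dangerous_requests=True). A naive pre-check, sketched below under the assumption that a keyword scan is acceptable, rejects any query containing a write clause. It is conservative: a title containing a word like "Set" is also rejected, and writes performed through procedure calls are not detected. Connecting with a database user that only has read permissions is the more robust option. is_read_only is a hypothetical helper.

```python
import re

# Clauses that can modify the graph; a deliberately conservative, hypothetical list
WRITE_PATTERN = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|FOREACH|LOAD\s+CSV)\b", re.IGNORECASE
)


def is_read_only(cypher: str) -> bool:
    """Naive keyword check: True if no write clause appears anywhere in the text."""
    return WRITE_PATTERN.search(cypher) is None


query = 'MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: "Casino"}) RETURN p.name;'
if is_read_only(query):
    print("ok to run")
```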

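2.5 Answering the question

enhanced_graph returns only structured node or relationship data, which most users will find hard to read.

The LLM is therefore used once more to turn enhanced_graph's results into text users can understand.

The prompt and code for this conversion are shown below.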

generate_final_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant",
        ),
        (
            "human",
            (
                """Use the following results retrieved from a database to provide
a succinct, definitive answer to the user's question.

Respond as if you are answering the question directly.

Results: {results}
Question: {question}"""
            ),
        ),
    ]
)

generate_final_chain = generate_final_prompt | llm | StrOutputParser()

final_answer = generate_final_chain.invoke(
        {"question": question, "results": records}
    )

print(f"final_answer: {final_answer}")

The LLM output is shown below; it is much easier to understand than Neo4j's raw output.

final_answer: The cast of *Casino* included **Robert De Niro**, **Joe Pesci**, **Sharon Stone**, and **James Woods**.
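The code above passes the raw list of records into {results}, where it is rendered with Python's default formatting. An empty result deserves explicit handling: the guardrail only says a question is about movies, not that the graph contains the answer, so an empty result is the clearest sign that the graph cannot answer after all. format_records is a hypothetical helper illustrating both points.

```python
def format_records(records: list) -> str:
    """Hypothetical helper: one "key: value" line per row, with an explicit empty case."""
    if not records:
        return "No results."
    return "\n".join(", ".join(f"{key}: {value}" for key, value in row.items()) for row in records)


print(format_records([{"p.name": "James Woods"}, {"p.name": "Joe Pesci"}]))
```

It would be used as generate_final_chain.invoke({"question": question, "results": format_records(records)}).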

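3 Chain example

The sections above split the process into separate steps to show how LangChain uses an LLM to decide whether Neo4j can answer a question, and how the question is then actually answered.

LangChain also provides a more integrated chain, GraphCypherQAChain, which achieves similar functionality in 2-3 lines of code. It covers Cypher generation, execution and answer generation; the movie/end guardrail from 2.2 is not part of it.

Example code: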

import os
os.environ['OPENAI_API_KEY'] = "sk-xxxxx"
os.environ['OPENAI_BASE_URL'] = "http://llm_provider_url/v1"


from langchain_neo4j import GraphCypherQAChain
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="deepseek-r1", temperature=0)
chain = GraphCypherQAChain.from_llm(
    graph=enhanced_graph, llm=llm, verbose=True, allow_dangerous_requests=True
)

query = "What was the cast of the Casino?"

response = chain.invoke({"query": query})
response

Below is a sample run. GraphCypherQAChain prints the Cypher generated for the question and the result of running it in Neo4j, then returns the LLM's final answer.

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: "Casino"}) RETURN p.name;
Full Context:
[{'p.name': 'James Woods'}, {'p.name': 'Joe Pesci'}, {'p.name': 'Robert De Niro'}, {'p.name': 'Sharon Stone'}]

> Finished chain.
{'query': 'What was the cast of the Casino?',
'result': 'The cast of Casino includes Robert De Niro, Joe Pesci, Sharon Stone, and James Woods.'}
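Because GraphCypherQAChain has no guardrail of its own, the two parts can be combined: classify first and call the chain only for movie questions. The sketch below takes both steps as callables so it runs without an LLM or Neo4j; answer_with_guardrail, the refusal text and the stand-in chain are hypothetical, while the "result" key matches the chain output shown above.

```python
from typing import Callable


def answer_with_guardrail(
    question: str,
    classify: Callable[[str], str],
    run_chain: Callable[[str], dict],
    refusal: str = "The movie knowledge graph cannot answer this question.",
) -> str:
    """Hypothetical wrapper: guardrail first, GraphCypherQAChain only for movie questions."""
    # e.g. classify = lambda q: guardrails_chain.invoke({"question": q}).decision
    if classify(question) != "movie":
        return refusal
    # e.g. run_chain = lambda q: chain.invoke({"query": q}); the answer is under "result"
    return run_chain(question)["result"]


def fake_chain(q: str) -> dict:
    """Stand-in for chain.invoke so the sketch runs offline."""
    return {"query": q, "result": "stub answer"}


print(answer_with_guardrail("What is the weather today?", lambda q: "end", fake_chain))
print(answer_with_guardrail("What was the cast of the Casino?", lambda q: "movie", fake_chain))
```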

reference

---

How to combine LangChain and Neo4j for relational retrieval QA (continued)

https://blog.csdn.net/liliang199/article/details/153731597

How to combine LangChain and Neo4j for relational retrieval QA

https://blog.csdn.net/liliang199/article/details/153687193
