How LangChain checks LLM-generated Cypher for correctness and fixes errors


liliangcsdn · published 2025-11-21 11:31:52

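LLMs are now capable enough to handle fairly complex tasks through frameworks such as LangChain.

An earlier post described how LangChain uses a prompt to decide whether a Neo4j graph can answer a user's question.

https://blog.csdn.net/liliang199/article/details/155074227

This post uses several techniques in LangChain to analyze the Cypher generated by the model, then tries to fix the errors and rewrite the Cypher.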

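1 Syntax error check

The neo4j Python driver raises a CypherSyntaxError exception when the server rejects a statement as syntactically invalid. The check here prefixes the statement with EXPLAIN, so that Neo4j parses and plans it without executing it, and catches CypherSyntaxError.

The test Cypher is shown below. The value Casino is not quoted, so Cypher reads it as an undefined variable rather than a string.

MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = Casino RETURN p.name;

The syntax check code is shown below.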

from neo4j.exceptions import CypherSyntaxError

# enhanced_graph is assumed to be the Neo4jGraph instance created in the earlier posts
def neo4j_graph_validate(cypher):
    """
    Checks a Cypher statement for syntax errors and returns the error messages.
    """
    errors = []
    # EXPLAIN makes Neo4j parse and plan the statement without executing it
    try:
        enhanced_graph.query(f"EXPLAIN {cypher}")
    except CypherSyntaxError as e:
        errors.append(e.message)
    return errors


generated_cypher = "MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = Casino RETURN p.name;"
errors = neo4j_graph_validate(generated_cypher)
print(f"cypher syntax error: {errors}")

The output is shown below. The CypherSyntaxError message reports that Casino is treated as an undefined variable.

cypher syntax error: ['Variable `Casino` not defined (line 1, column 65 (offset: 64))\n"EXPLAIN MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = Casino RETURN p.name;"\n ^']
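
The position in the error message can be used to pinpoint the offending token, which may help when the error is later handed to an LLM for correction. The sketch below is a hypothetical helper, locate_error_token, and is not part of neo4j or LangChain. It assumes the message contains an "offset: N" fragment as in the output above, and that the offset indexes into the statement that was actually sent, including the EXPLAIN prefix.

```python
import re
from typing import Optional, Tuple


def locate_error_token(message: str, query: str) -> Optional[Tuple[int, str]]:
    """Return (offset, token) for the position reported in an error message.

    Assumes the message carries an 'offset: N' fragment; returns None otherwise.
    """
    match = re.search(r"offset:\s*(\d+)", message)
    if match is None:
        return None
    offset = int(match.group(1))
    # Take the run of word characters that starts at the reported offset
    token_match = re.match(r"\w+", query[offset:])
    token = token_match.group(0) if token_match else ""
    return offset, token


message = "Variable `Casino` not defined (line 1, column 65 (offset: 64))"
sent_query = (
    "EXPLAIN MATCH (p:Person)-[:ACTED_IN]->(m:Movie) "
    "WHERE m.title = Casino RETURN p.name;"
)
print(locate_error_token(message, sent_query))  # (64, 'Casino')
```

The offset is counted against the EXPLAIN-prefixed string, so the same string has to be passed to the helper.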

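2 Schema matching check

Beyond basic syntax errors, the relationship path in a Cypher query may also fail to match the graph.

2.1 CypherQueryCorrector

LangChain's CypherQueryCorrector is used here to check whether the Cypher matches the graph schema.

Example code: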

from langchain_neo4j.chains.graph_qa.cypher_utils import CypherQueryCorrector, Schema

# Cypher query corrector is experimental
corrector_schema = [
    Schema(el["start"], el["type"], el["end"])
    for el in enhanced_graph.structured_schema.get("relationships")
]
cypher_query_corrector = CypherQueryCorrector(corrector_schema)

2.2 Syntax correctness test

The test Cypher is shown below. The query matches the schema, but the unquoted Casino is a syntax error.

MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = Casino RETURN p.name;

Test code:

generated_cypher = "MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = Casino RETURN p.name;"
corrected_cypher = cypher_query_corrector(generated_cypher)
if not corrected_cypher:
    print("The generated Cypher statement doesn't fit the graph schema")

Nothing is printed, which suggests that CypherQueryCorrector does not detect syntax errors such as the unquoted Casino.

2.3 Schema matching test

Next, change the relationship type from ACTED_IN to ACTED_INx, which does not exist in the graph schema.

MATCH (p:Person)-[:ACTED_INx]->(m:Movie) WHERE m.title = Casino RETURN p.name;

Run the code again:

generated_cypher = "MATCH (p:Person)-[:ACTED_INx]->(m:Movie) WHERE m.title = Casino RETURN p.name;"
corrected_cypher = cypher_query_corrector(generated_cypher)
if not corrected_cypher:
    print("The generated Cypher statement doesn't fit the graph schema")

The output is shown below. CypherQueryCorrector detects the mismatch, because ACTED_INx matches no relationship type defined in the graph schema.

The generated Cypher statement doesn't fit the graph schema

A drawback is that CypherQueryCorrector does not say what exactly failed to match.
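
To get a more specific message, the relationship types used in the Cypher can be compared with the schema by hand. The sketch below is a rough regex heuristic and not a Cypher parser; unknown_relationship_types is a hypothetical helper, and the relationships argument is assumed to have the same shape as enhanced_graph.structured_schema["relationships"] used above (dicts with start, type, and end keys).

```python
import re
from typing import Dict, List


def unknown_relationship_types(cypher: str, relationships: List[Dict[str, str]]) -> List[str]:
    """List relationship types that appear in the Cypher but not in the schema."""
    known = {rel["type"] for rel in relationships}
    unknown: List[str] = []
    # Look at the content of every [...] group, e.g. ":ACTED_IN" or "r:A|B*1..2"
    for body in re.findall(r"\[([^\]]*)\]", cypher):
        if ":" not in body:
            continue  # no type given, or a plain list literal
        type_part = body.split(":", 1)[1]
        # Drop variable-length markers, property maps, and trailing whitespace
        type_part = re.split(r"[*{\s]", type_part, maxsplit=1)[0]
        for rel_type in type_part.split("|"):
            rel_type = rel_type.strip(":`")
            if not re.fullmatch(r"\w+", rel_type):
                continue  # not an identifier, skip it
            if rel_type not in known and rel_type not in unknown:
                unknown.append(rel_type)
    return unknown


# Sample schema entry, made up for illustration
sample_relationships = [{"start": "Person", "type": "ACTED_IN", "end": "Movie"}]
query = "MATCH (p:Person)-[:ACTED_INx]->(m:Movie) WHERE m.title = 'Casino' RETURN p.name;"
for rel_type in unknown_relationship_types(query, sample_relationships):
    print(f"Relationship {rel_type} does not exist in the schema")
```

A message of this kind could be appended to the error list next to the generic "doesn't fit the graph schema" text.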

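3 LLM-based validation

Syntax checking and schema matching can find syntax or matching problems in the Cypher, but neither can tell whether the Cypher actually answers the user's question, or whether it leaves out key nodes, relationships, or filter conditions.

Here an LLM does the check: a prompt defines the rules for judging the validity of the Cypher, and the LLM applies them to the statement.

An LLM can catch syntax errors and should also be able to catch logical or functional errors in the Cypher.

3.1 Validation prompt

The prompt defines the rules the LLM uses to validate the Cypher.

The input is the question, the graph schema, and the LLM-generated Cypher to validate.

The LLM checks the Cypher against these rules.

The rules are shown below.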

You must check the following:
* Are there any syntax errors in the Cypher statement?
* Are there any missing or undefined variables in the Cypher statement?
* Are any node labels missing from the schema?
* Are any relationship types missing from the schema?
* Are any of the properties not included in the schema?
* Does the Cypher statement include enough information to answer the question?

Examples of good errors:
* Label (:Foo) does not exist, did you mean (:Bar)?
* Property bar does not exist for label Foo, did you mean baz?
* Relationship FOO does not exist, did you mean FOO_BAR?

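The validation code is shown below. The result is structured as ValidateCypherOutput.

from typing import List, Optional

from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

# llm is assumed to be a chat model instance created earlier that supports with_structured_output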

validate_cypher_system = """
You are a Cypher expert reviewing a statement written by a junior developer.
"""

validate_cypher_user = """You must check the following:
* Are there any syntax errors in the Cypher statement?
* Are there any missing or undefined variables in the Cypher statement?
* Are any node labels missing from the schema?
* Are any relationship types missing from the schema?
* Are any of the properties not included in the schema?
* Does the Cypher statement include enough information to answer the question?

Examples of good errors:
* Label (:Foo) does not exist, did you mean (:Bar)?
* Property bar does not exist for label Foo, did you mean baz?
* Relationship FOO does not exist, did you mean FOO_BAR?

Schema:
{schema}

The question is:
{question}

The Cypher statement is:
{cypher}

Make sure you don't make any mistakes!"""

validate_cypher_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            validate_cypher_system,
        ),
        (
            "human",
            (validate_cypher_user),
        ),
    ]
)


class Property(BaseModel):
    """
    Represents a filter condition based on a specific node property in a graph in a Cypher statement.
    """

    node_label: str = Field(
        description="The label of the node to which this property belongs."
    )
    property_key: str = Field(description="The key of the property being filtered.")
    property_value: str = Field(
        description="The value that the property is being matched against."
    )


class ValidateCypherOutput(BaseModel):
    """
    Represents the validation result of a Cypher query's output,
    including any errors and applied filters.
    """

    errors: Optional[List[str]] = Field(
        description="A list of syntax or semantical errors in the Cypher statement. Always explain the discrepancy between schema and Cypher statement"
    )
    filters: Optional[List[Property]] = Field(
        description="A list of property-based filters applied in the Cypher statement."
    )


validate_cypher_chain = validate_cypher_prompt | llm.with_structured_output(
    ValidateCypherOutput
)

3.2 Cypher without errors

First, validate a Cypher statement that has no errors. Example code:

question = "What was the cast of the Casino?"
generated_cypher = "MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = 'Casino' RETURN p.name;"

validate_info = validate_cypher_chain.invoke(
        {
            "question": question,
            "cypher": generated_cypher,
            "schema": enhanced_graph.schema,
        }
    )
print(f"validate_info: {validate_info}")

The output is shown below. errors=None means no problem was found, and filters records the property filters used in the Cypher.

validate_info: errors=None filters=[Property(node_label='Movie', property_key='title', property_value='Casino')]

3.3 Cypher with errors

Next, validate a Cypher statement with an error: the value Casino is unquoted.

The correct form is "Casino" or 'Casino'.

MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = Casino RETURN p.name;

Example code:

question = "What was the cast of the Casino?"
generated_cypher = "MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = Casino RETURN p.name;"

validate_info = validate_cypher_chain.invoke(
        {
            "question": question,
            "cypher": generated_cypher,
            "schema": enhanced_graph.schema,
        }
    )
print(f"validate_info: {validate_info}")

Running the code gives the output below.

validate_cypher_chain identifies the missing quotes and suggests the corrected form.

validate_info: errors=["Property value Casino in WHERE clause is missing quotes. Did you mean 'Casino'?", "Label (Casino) does not exist, did you mean to reference property m.title = 'Casino'?"] filters=[Property(node_label='Movie', property_key='title', property_value='Casino')]

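4 Combined Cypher check

The syntax check, the schema check, and the LLM check are combined into one function that reports the problems found in a Cypher statement.

Example code: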

from neo4j.exceptions import CypherSyntaxError

def full_validate_cypher(question, cypher, enhanced_graph):
    """
    Validates the Cypher statements and maps any property values to the database.
    """
    errors = []
    mapping_errors = []
    # Check for syntax errors
    try:
        enhanced_graph.query(f"EXPLAIN {cypher}")
    except CypherSyntaxError as e:
        errors.append(e.message)
    # Experimental feature for correcting relationship directions
    corrected_cypher = cypher_query_corrector(cypher)
    if not corrected_cypher:
        errors.append("The generated Cypher statement doesn't fit the graph schema")
    elif corrected_cypher != cypher:
        # Report a direction fix only when the corrector actually returned a rewritten statement
        print("Relationship direction was corrected")
    # Use LLM to find additional potential errors and get the mapping for values
    llm_output = validate_cypher_chain.invoke(
        {
            "question": question,
            "schema": enhanced_graph.schema,
            "cypher": cypher,
        }
    )
    if llm_output.errors:
        errors.extend(llm_output.errors)
    if llm_output.filters:
        for filter in llm_output.filters:
            # Do mapping only for string values
            if (
                not [
                    prop
                    for prop in enhanced_graph.structured_schema["node_props"][
                        filter.node_label
                    ]
                    if prop["property"] == filter.property_key
                ][0]["type"]
                == "STRING"
            ):
                continue
            mapping = enhanced_graph.query(
                f"MATCH (n:{filter.node_label}) WHERE toLower(n.`{filter.property_key}`) = toLower($value) RETURN 'yes' LIMIT 1",
                {"value": filter.property_value},
            )
            if not mapping:
                print(
                    f"Missing value mapping for {filter.node_label} on property {filter.property_key} with value {filter.property_value}"
                )
                mapping_errors.append(
                    f"Missing value mapping for {filter.node_label} on property {filter.property_key} with value {filter.property_value}"
                )
    if mapping_errors:
        next_action = "end"
    elif errors:
        next_action = "correct_cypher"
    else:
        next_action = "execute_cypher"

    return {
        "next_action": next_action,
        "cypher_statement": corrected_cypher,
        "cypher_errors": errors,
        "steps": ["validate_cypher"],
    }

question = "What was the cast of the Casino?"
generated_cypher = "MATCH (p:Person)-[:ACTED_INx]->(m:Movie) WHERE m.title = Casino RETURN p.name;"

res = full_validate_cypher(question, generated_cypher, enhanced_graph)

print(f"infos: {res}")

Output:

infos: {'next_action': 'correct_cypher', 'cypher_statement': '', 'cypher_errors': ['Variable `Casino` not defined (line 1, column 66 (offset: 65))\n"EXPLAIN MATCH (p:Person)-[:ACTED_INx]->(m:Movie) WHERE m.title = Casino RETURN p.name;"\n ^', "The generated Cypher statement doesn't fit the graph schema", 'Relationship ACTED_INx does not exist, did you mean ACTED_IN?', "Unquoted string literal 'Casino' in WHERE clause (should be m.title = 'Casino')", "Variable 'Casino' in WHERE clause is undefined (did you mean to use a string literal?)"], 'steps': ['validate_cypher']}
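
One fragile spot in full_validate_cypher is the property type lookup. The label and property key come from the LLM, so indexing node_props by label and taking element [0] raises KeyError or IndexError when the LLM names something that is missing from the schema. A defensive lookup could look like the sketch below. property_type is a hypothetical helper, and the sample schema is made up for illustration in the shape that the code above reads from enhanced_graph.structured_schema.

```python
from typing import Optional


def property_type(structured_schema: dict, node_label: str, property_key: str) -> Optional[str]:
    """Return the declared type of a node property, or None if it is unknown."""
    node_props = structured_schema.get("node_props", {})
    for prop in node_props.get(node_label, []):
        if prop.get("property") == property_key:
            return prop.get("type")
    return None


# Sample schema, made up for illustration
sample_schema = {
    "node_props": {
        "Movie": [
            {"property": "title", "type": "STRING"},
            {"property": "released", "type": "DATE"},
        ]
    }
}

print(property_type(sample_schema, "Movie", "title"))     # STRING
print(property_type(sample_schema, "Movie", "released"))  # DATE
print(property_type(sample_schema, "Film", "title"))      # None
```

Inside the filter loop, a check such as property_type(...) != "STRING" followed by continue would skip both non-string properties and filters that refer to unknown labels or keys.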

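5 Correcting Cypher errors

The sections above covered several ways to detect Cypher errors. Here an LLM is used to repair and rewrite the Cypher.

The input is the user question, the faulty Cypher, the errors found, and the graph schema.

Following the correction rules in the prompt, the LLM rewrites the faulty Cypher into a correct one.

Example code: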

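from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

correct_cypher_prompt = ChatPromptTemplate.from_messages(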
    [
        (
            "system",
            (
                "You are a Cypher expert reviewing a statement written by a junior developer. "
                "You need to correct the Cypher statement based on the provided errors. No pre-amble. "
                "Do not wrap the response in any backticks or anything else. Respond with a Cypher statement only!"
            ),
        ),
        (
            "human",
            (
                """Check for invalid syntax or semantics and return a corrected Cypher statement.

Schema:
{schema}

Note: Do not include any explanations or apologies in your responses.
Do not wrap the response in any backticks or anything else.
Respond with a Cypher statement only!

Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.

The question is:
{question}

The Cypher statement is:
{cypher}

The errors are:
{errors}

Corrected Cypher statement: """
            ),
        ),
    ]
)

correct_cypher_chain = correct_cypher_prompt | llm | StrOutputParser()


question = "What was the cast of the Casino?"
cypher = "MATCH (p:Person)-[:ACTED_INx]->(m:Movie) WHERE m.title = Casino RETURN p.name;"
errors = {'next_action': 'correct_cypher', 'cypher_statement': '', 'cypher_errors': ['Variable `Casino` not defined (line 1, column 66 (offset: 65))\n"EXPLAIN MATCH (p:Person)-[:ACTED_INx]->(m:Movie) WHERE m.title = Casino RETURN p.name;"\n                                                                  ^', "The generated Cypher statement doesn't fit the graph schema", 'Relationship ACTED_INx does not exist, did you mean ACTED_IN?', "Unquoted string literal 'Casino' in WHERE clause (should be m.title = 'Casino')", "Variable 'Casino' in WHERE clause is undefined (did you mean to use a string literal?)"], 'steps': ['validate_cypher']}
schema = enhanced_graph.schema

corrected_cypher = correct_cypher_chain.invoke(
        {
            "question": question,
            "errors": errors,
            "cypher": cypher,
            "schema": schema,
        }
    )

print(f"corrected_cypher: {corrected_cypher}")

The output is shown below. The LLM fixed both errors and rewrote the Cypher correctly.

corrected_cypher: MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = 'Casino' RETURN p.name;
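
Validation and correction can be chained into a bounded retry loop, so that a rewritten statement is validated again before it is executed. The sketch below only shows the control flow. validate_and_correct is a hypothetical function, and validate and correct are stand-in callables, assumed to wrap full_validate_cypher and correct_cypher_chain.invoke respectively. The dict keys next_action, cypher_statement, and cypher_errors follow the return value of full_validate_cypher; the fake_* functions exist only to make the example runnable without Neo4j or an LLM.

```python
def validate_and_correct(question, cypher, validate, correct, max_rounds=3):
    """Validate, and rewrite on errors, for at most max_rounds validation passes."""
    for round_no in range(1, max_rounds + 1):
        result = validate(question, cypher)
        if result["next_action"] != "correct_cypher":
            # Either ready to execute, or stopped for a reason a rewrite cannot fix
            return {
                "cypher": result.get("cypher_statement") or cypher,
                "next_action": result["next_action"],
                "rounds": round_no,
            }
        cypher = correct(question, cypher, result["cypher_errors"])
    # Out of rounds: the last rewrite has not been validated, so do not execute it
    return {"cypher": cypher, "next_action": "end", "rounds": max_rounds}


# Stand-ins for the real validator and LLM corrector (demo only)
def fake_validate(question, cypher):
    errors = []
    if "ACTED_INx" in cypher:
        errors.append("Relationship ACTED_INx does not exist, did you mean ACTED_IN?")
    if "= Casino" in cypher:
        errors.append("Variable `Casino` not defined")
    return {
        "next_action": "correct_cypher" if errors else "execute_cypher",
        "cypher_statement": "" if errors else cypher,
        "cypher_errors": errors,
    }


def fake_correct(question, cypher, errors):
    return cypher.replace("ACTED_INx", "ACTED_IN").replace("= Casino", "= 'Casino'")


outcome = validate_and_correct(
    "What was the cast of the Casino?",
    "MATCH (p:Person)-[:ACTED_INx]->(m:Movie) WHERE m.title = Casino RETURN p.name;",
    fake_validate,
    fake_correct,
)
print(outcome["next_action"], outcome["rounds"])  # execute_cypher 2
print(outcome["cypher"])
```

Capping the number of rounds keeps a model that repeatedly produces invalid Cypher from looping forever.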

References

---

How LangChain decides whether a Neo4j knowledge graph can answer a question

https://blog.csdn.net/liliang199/article/details/155074227

Combining LangChain and Neo4j for relational retrieval Q&A, continued

https://blog.csdn.net/liliang199/article/details/153731597

Combining LangChain and Neo4j for relational retrieval Q&A

https://blog.csdn.net/liliang199/article/details/153687193
