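How LangChain checks LLM-generated Cypher for correctness and fixes errors
LLMs are now capable enough to handle fairly complex tasks through frameworks such as LangChain.
A previous post described how LangChain relies on a prompt to decide whether a neo4j graph can answer a user's question.
https://blog.csdn.net/liliang199/article/details/155074227
This post uses several techniques in LangChain to analyze model-generated Cypher, then tries to correct the errors and rewrite the Cypher.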
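1 Syntax error checking
The neo4j Python driver raises a CypherSyntaxError exception when a statement cannot be compiled, so the first step is to catch CypherSyntaxError to detect syntax errors.
The test Cypher is shown below; the unquoted Casino is the error.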
MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = Casino RETURN p.name;
The syntax check code is as follows.
from neo4j.exceptions import CypherSyntaxError

def neo4j_graph_validate(cypher):
    """
    Checks a Cypher statement for syntax errors and returns the error messages.
    """
    errors = []
    # Check for syntax errors; EXPLAIN plans the query without executing it.
    # enhanced_graph is assumed to be the graph object set up in the earlier posts.
    try:
        enhanced_graph.query(f"EXPLAIN {cypher}")
    except CypherSyntaxError as e:
        errors.append(e.message)
    return errors

generated_cypher = "MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = Casino RETURN p.name;"
errors = neo4j_graph_validate(generated_cypher)
print(f"cypher syntax error: {errors}")
The output is as follows; CypherSyntaxError clearly reports that Casino is the problem.
cypher syntax error: ['Variable `Casino` not defined (line 1, column 65 (offset: 64))\n"EXPLAIN MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = Casino RETURN p.name;"\n ^']
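The error message above carries the position of the problem. As a minimal sketch, assuming the message always ends with a suffix of the form "(line N, column N (offset: N))" as in the output shown, the position and the offending token can be pulled out with a regular expression. The helper name locate_error is hypothetical, and the offset is taken relative to the EXPLAIN-prefixed statement, as the quoted statement in the message suggests.

```python
import re

# Position suffix as it appears in the message above,
# e.g. "(line 1, column 65 (offset: 64))". The format is an assumption
# based on that single sample.
POSITION_RE = re.compile(r"\(line (\d+), column (\d+) \(offset: (\d+)\)\)")


def locate_error(message, query):
    """Return (line, column, offset, token) parsed from the message, or None."""
    match = POSITION_RE.search(message)
    if match is None:
        return None
    line, column, offset = (int(group) for group in match.groups())
    # Read the identifier-like token that starts at the reported offset
    token_match = re.match(r"\w+", query[offset:])
    token = token_match.group(0) if token_match else ""
    return line, column, offset, token


cypher = "MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = Casino RETURN p.name;"
explained = f"EXPLAIN {cypher}"  # the statement that was actually sent
message = "Variable `Casino` not defined (line 1, column 65 (offset: 64))"
print(locate_error(message, explained))
```

A position like this can be passed to the correction step later, so the model is told exactly which token to fix.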
2 Schema match checking
Beyond basic syntax errors, the relationship path that a Cypher query defines may also fail to match the graph.
2.1 CypherQueryCorrector
Here LangChain's CypherQueryCorrector is used to check whether the Cypher matches the graph schema.
The sample code is as follows.
from langchain_neo4j.chains.graph_qa.cypher_utils import CypherQueryCorrector, Schema

# Cypher query corrector is experimental
corrector_schema = [
    Schema(el["start"], el["type"], el["end"])
    for el in enhanced_graph.structured_schema.get("relationships")
]
cypher_query_corrector = CypherQueryCorrector(corrector_schema)
2.2 Syntax correctness test
The test input is shown below. The query matches the schema, but the unquoted Casino is a syntax error.
"MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = Casino RETURN p.name;"
The test code is as follows.
generated_cypher = "MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = Casino RETURN p.name;"
corrected_cypher = cypher_query_corrector(generated_cypher)
if not corrected_cypher:
    print("The generated Cypher statement doesn't fit the graph schema")
Nothing is printed, which shows that CypherQueryCorrector can miss syntax errors in the Cypher.
2.3 Schema match test
Next, the input is changed so that the relationship is ACTED_INx instead of ACTED_IN; the graph schema has no ACTED_INx relationship.
"MATCH (p:Person)-[:ACTED_INx]->(m:Movie) WHERE m.title = Casino RETURN p.name;"
The code is run again, as follows.
generated_cypher = "MATCH (p:Person)-[:ACTED_INx]->(m:Movie) WHERE m.title = Casino RETURN p.name;"
corrected_cypher = cypher_query_corrector(generated_cypher)
if not corrected_cypher:
    print("The generated Cypher statement doesn't fit the graph schema")
The output is shown below. CypherQueryCorrector detects the mismatch,
because ACTED_INx does not match any relationship defined in the graph schema.
The generated Cypher statement doesn't fit the graph schema
The shortcoming is that CypherQueryCorrector does not say what exactly failed to match.
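One way to get a more specific message is to compare the relationship types used in the Cypher with those in the schema. The following is only a rough sketch, not part of LangChain: the schema triples are hypothetical sample data, explain_schema_mismatch is a made-up helper, and the regular expression only picks up the first type in simple patterns such as [:TYPE] or [r:TYPE].

```python
import difflib
import re

# Hypothetical schema triples: (start label, relationship type, end label)
SCHEMA = [("Person", "ACTED_IN", "Movie"), ("Person", "DIRECTED", "Movie")]

# Matches "[:TYPE" or "[r:TYPE" and captures TYPE (simple patterns only)
REL_TYPE_RE = re.compile(r"\[\s*\w*\s*:\s*([A-Za-z_][A-Za-z0-9_]*)")


def explain_schema_mismatch(cypher, schema):
    """List relationship types used in the Cypher that the schema does not define."""
    known = sorted({rel for _, rel, _ in schema})
    messages = []
    for rel in REL_TYPE_RE.findall(cypher):
        if rel in known:
            continue
        # Suggest the closest known relationship type, if any is similar enough
        close = difflib.get_close_matches(rel, known, n=1)
        hint = f", did you mean {close[0]}?" if close else ""
        messages.append(f"Relationship {rel} does not exist{hint}")
    return messages


bad_cypher = "MATCH (p:Person)-[:ACTED_INx]->(m:Movie) WHERE m.title = 'Casino' RETURN p.name;"
print(explain_schema_mismatch(bad_cypher, SCHEMA))
```

With a real graph, the triples could presumably be built from enhanced_graph.structured_schema.get("relationships"), the same source used for the corrector schema.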
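3 LLM-based validation
Syntax checks and schema match checks can find syntax or matching problems, but they cannot tell whether the Cypher actually answers the user's question, or whether it omits key nodes, relationships, or filter conditions.
Here an LLM is used: a prompt defines the rules for checking Cypher validity, and the LLM performs the correctness check.
An LLM can detect syntax errors, and it should also be able to detect logical or functional errors in the Cypher.
3.1 Validation prompt
The prompt defines the rules the LLM uses to check Cypher validity.
The inputs are the question, the graph schema, and the LLM-generated Cypher to be verified.
The LLM checks the Cypher against these rules for the given input.
The rules are as follows.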
You must check the following:
* Are there any syntax errors in the Cypher statement?
* Are there any missing or undefined variables in the Cypher statement?
* Are any node labels missing from the schema?
* Are any relationship types missing from the schema?
* Are any of the properties not included in the schema?
* Does the Cypher statement include enough information to answer the question?
Examples of good errors:
* Label (:Foo) does not exist, did you mean (:Bar)?
* Property bar does not exist for label Foo, did you mean baz?
* Relationship FOO does not exist, did you mean FOO_BAR?
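The code for checking the Cypher is below; the result is structured as a ValidateCypherOutput.
from typing import List, Optional
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

# llm (a chat model) and enhanced_graph are assumed to be defined as in the earlier posts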
validate_cypher_system = """
You are a Cypher expert reviewing a statement written by a junior developer.
"""
validate_cypher_user = """You must check the following:
* Are there any syntax errors in the Cypher statement?
* Are there any missing or undefined variables in the Cypher statement?
* Are any node labels missing from the schema?
* Are any relationship types missing from the schema?
* Are any of the properties not included in the schema?
* Does the Cypher statement include enough information to answer the question?
Examples of good errors:
* Label (:Foo) does not exist, did you mean (:Bar)?
* Property bar does not exist for label Foo, did you mean baz?
* Relationship FOO does not exist, did you mean FOO_BAR?
Schema:
{schema}
The question is:
{question}
The Cypher statement is:
{cypher}
Make sure you don't make any mistakes!"""
validate_cypher_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            validate_cypher_system,
        ),
        (
            "human",
            (validate_cypher_user),
        ),
    ]
)

class Property(BaseModel):
    """
    Represents a filter condition based on a specific node property in a graph in a Cypher statement.
    """
    node_label: str = Field(
        description="The label of the node to which this property belongs."
    )
    property_key: str = Field(description="The key of the property being filtered.")
    property_value: str = Field(
        description="The value that the property is being matched against."
    )

class ValidateCypherOutput(BaseModel):
    """
    Represents the validation result of a Cypher query's output,
    including any errors and applied filters.
    """
    errors: Optional[List[str]] = Field(
        description="A list of syntax or semantical errors in the Cypher statement. Always explain the discrepancy between schema and Cypher statement"
    )
    filters: Optional[List[Property]] = Field(
        description="A list of property-based filters applied in the Cypher statement."
    )

validate_cypher_chain = validate_cypher_prompt | llm.with_structured_output(
    ValidateCypherOutput
)
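To see roughly what the model receives, the human message can be rendered by hand. This is a small sketch using plain str.format on an abbreviated copy of the template (only two of the rules are kept); the one-line schema string is a hypothetical stand-in for enhanced_graph.schema.

```python
# Abbreviated copy of the validation template (only two of the rules are kept)
TEMPLATE = """You must check the following:
* Are there any syntax errors in the Cypher statement?
* Are any relationship types missing from the schema?
Schema:
{schema}
The question is:
{question}
The Cypher statement is:
{cypher}"""

# Hypothetical one-line schema summary, standing in for enhanced_graph.schema
sample_schema = "(:Person)-[:ACTED_IN]->(:Movie)"
sample_cypher = "MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = Casino RETURN p.name;"

rendered = TEMPLATE.format(
    schema=sample_schema,
    question="What was the cast of the Casino?",
    cypher=sample_cypher,
)
print(rendered)
```

The three placeholders are filled with the same keys that are passed to validate_cypher_chain.invoke.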
3.2 Cypher without problems
First, a correct Cypher statement is checked. The sample code is as follows.
question = "What was the cast of the Casino?"
generated_cypher = "MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = 'Casino' RETURN p.name;"
validate_info = validate_cypher_chain.invoke(
    {
        "question": question,
        "cypher": generated_cypher,
        "schema": enhanced_graph.schema,
    }
)
print(f"validate_info: {validate_info}")
The output is below. errors=None means no problems were found, and filters records the property filters used in the Cypher.
validate_info: errors=None filters=[Property(node_label='Movie', property_key='title', property_value='Casino')]
3.3 Cypher with problems
Next, a faulty Cypher statement is checked; the error is the unquoted Casino.
The correct form is "Casino" or 'Casino'.
MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = Casino RETURN p.name;
The sample code is as follows.
question = "What was the cast of the Casino?"
generated_cypher = "MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = Casino RETURN p.name;"
validate_info = validate_cypher_chain.invoke(
    {
        "question": question,
        "cypher": generated_cypher,
        "schema": enhanced_graph.schema,
    }
)
print(f"validate_info: {validate_info}")
Running the code gives the output below.
validate_cypher_chain identifies the syntax error and suggests the correct form.
validate_info: errors=["Property value Casino in WHERE clause is missing quotes. Did you mean 'Casino'?", "Label (Casino) does not exist, did you mean to reference property m.title = 'Casino'?"] filters=[Property(node_label='Movie', property_key='title', property_value='Casino')]
4 Combined Cypher check
The syntax check, the schema check, and the LLM check are combined in one function that identifies problems in the Cypher from all three angles.
The sample code is as follows.
from neo4j.exceptions import CypherSyntaxError

def full_validate_cypher(question, cypher, enhanced_graph):
    """
    Validates the Cypher statements and maps any property values to the database.
    """
    errors = []
    mapping_errors = []
    # Check for syntax errors
    try:
        enhanced_graph.query(f"EXPLAIN {cypher}")
    except CypherSyntaxError as e:
        errors.append(e.message)
    # Experimental feature for correcting relationship directions
    corrected_cypher = cypher_query_corrector(cypher)
    if not corrected_cypher:
        errors.append("The generated Cypher statement doesn't fit the graph schema")
    elif corrected_cypher != cypher:
        # Report a correction only when the corrector returned a changed statement
        print("Relationship direction was corrected")
    # Use LLM to find additional potential errors and get the mapping for values
    llm_output = validate_cypher_chain.invoke(
        {
            "question": question,
            "schema": enhanced_graph.schema,
            "cypher": cypher,
        }
    )
    if llm_output.errors:
        errors.extend(llm_output.errors)
    if llm_output.filters:
        for prop_filter in llm_output.filters:
            # Do mapping only for string values
            if (
                not [
                    prop
                    for prop in enhanced_graph.structured_schema["node_props"][
                        prop_filter.node_label
                    ]
                    if prop["property"] == prop_filter.property_key
                ][0]["type"]
                == "STRING"
            ):
                continue
            # Check that the filtered value actually exists in the database
            mapping = enhanced_graph.query(
                f"MATCH (n:{prop_filter.node_label}) WHERE toLower(n.`{prop_filter.property_key}`) = toLower($value) RETURN 'yes' LIMIT 1",
                {"value": prop_filter.property_value},
            )
            if not mapping:
                print(
                    f"Missing value mapping for {prop_filter.node_label} on property {prop_filter.property_key} with value {prop_filter.property_value}"
                )
                mapping_errors.append(
                    f"Missing value mapping for {prop_filter.node_label} on property {prop_filter.property_key} with value {prop_filter.property_value}"
                )
    # Decide the next step: stop, rewrite the Cypher, or execute it
    if mapping_errors:
        next_action = "end"
    elif errors:
        next_action = "correct_cypher"
    else:
        next_action = "execute_cypher"
    return {
        "next_action": next_action,
        "cypher_statement": corrected_cypher,
        "cypher_errors": errors,
        "steps": ["validate_cypher"],
    }

question = "What was the cast of the Casino?"
generated_cypher = "MATCH (p:Person)-[:ACTED_INx]->(m:Movie) WHERE m.title = Casino RETURN p.name;"
res = full_validate_cypher(question, generated_cypher, enhanced_graph)
print(f"infos: {res}")
The output is as follows.
infos: {'next_action': 'correct_cypher', 'cypher_statement': '', 'cypher_errors': ['Variable `Casino` not defined (line 1, column 66 (offset: 65))\n"EXPLAIN MATCH (p:Person)-[:ACTED_INx]->(m:Movie) WHERE m.title = Casino RETURN p.name;"\n ^', "The generated Cypher statement doesn't fit the graph schema", 'Relationship ACTED_INx does not exist, did you mean ACTED_IN?', "Unquoted string literal 'Casino' in WHERE clause (should be m.title = 'Casino')", "Variable 'Casino' in WHERE clause is undefined (did you mean to use a string literal?)"], 'steps': ['validate_cypher']}
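The value-mapping step looks up the property type directly in the schema, which assumes that the label and property reported by the LLM really exist there. A defensive variant might check this first. The sketch below is illustrative only: build_mapping_query is a hypothetical helper, and NODE_PROPS is made-up sample data shaped like structured_schema["node_props"] as it is used in the code above.

```python
# Hypothetical sample shaped like enhanced_graph.structured_schema["node_props"]
NODE_PROPS = {
    "Movie": [
        {"property": "title", "type": "STRING"},
        {"property": "released", "type": "DATE"},
    ],
}


def build_mapping_query(node_props, node_label, property_key):
    """Return the value-lookup query, or None if the filter cannot be mapped."""
    props = node_props.get(node_label)
    if props is None:
        return None  # label reported by the LLM is not in the schema
    prop_type = next(
        (p["type"] for p in props if p["property"] == property_key), None
    )
    if prop_type != "STRING":
        return None  # unknown property, or not a string property
    # Label and key are known schema names at this point; the value stays a parameter
    return (
        f"MATCH (n:`{node_label}`) "
        f"WHERE toLower(n.`{property_key}`) = toLower($value) "
        "RETURN 'yes' LIMIT 1"
    )


print(build_mapping_query(NODE_PROPS, "Movie", "title"))
print(build_mapping_query(NODE_PROPS, "Film", "title"))
```

Returning None for unknown labels or properties lets the caller skip the lookup instead of failing on a missing key.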
5 Cypher error correction
The previous sections covered several ways to detect Cypher errors. Here an LLM is used to try to repair and rewrite the Cypher.
The inputs are the user question, the faulty Cypher, the errors found so far, and the graph schema.
Following the correction rules defined in the prompt, the LLM rewrites the faulty Cypher into a correct one.
The sample code is as follows.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

correct_cypher_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            (
                "You are a Cypher expert reviewing a statement written by a junior developer. "
                "You need to correct the Cypher statement based on the provided errors. No pre-amble. "
                "Do not wrap the response in any backticks or anything else. Respond with a Cypher statement only!"
            ),
        ),
        (
            "human",
            (
                """Check for invalid syntax or semantics and return a corrected Cypher statement.
Schema:
{schema}
Note: Do not include any explanations or apologies in your responses.
Do not wrap the response in any backticks or anything else.
Respond with a Cypher statement only!
Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.
The question is:
{question}
The Cypher statement is:
{cypher}
The errors are:
{errors}
Corrected Cypher statement: """
            ),
        ),
    ]
)
correct_cypher_chain = correct_cypher_prompt | llm | StrOutputParser()

question = "What was the cast of the Casino?"
cypher = "MATCH (p:Person)-[:ACTED_INx]->(m:Movie) WHERE m.title = Casino RETURN p.name;"
# Validation result copied from the output in section 4
errors = {'next_action': 'correct_cypher', 'cypher_statement': '', 'cypher_errors': ['Variable `Casino` not defined (line 1, column 66 (offset: 65))\n"EXPLAIN MATCH (p:Person)-[:ACTED_INx]->(m:Movie) WHERE m.title = Casino RETURN p.name;"\n ^', "The generated Cypher statement doesn't fit the graph schema", 'Relationship ACTED_INx does not exist, did you mean ACTED_IN?', "Unquoted string literal 'Casino' in WHERE clause (should be m.title = 'Casino')", "Variable 'Casino' in WHERE clause is undefined (did you mean to use a string literal?)"], 'steps': ['validate_cypher']}
schema = enhanced_graph.schema
corrected_cypher = correct_cypher_chain.invoke(
    {
        "question": question,
        "errors": errors,
        "cypher": cypher,
        "schema": schema,
    }
)
print(f"corrected_cypher: {corrected_cypher}")
The output is shown below. The LLM fixed both errors and rewrote the Cypher correctly.
corrected_cypher: MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = 'Casino' RETURN p.name;
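Validation and correction can be chained into a small retry loop driven by the next_action value. The following is only a sketch of that control flow: validate_and_correct, fake_validate, and fake_correct are hypothetical names, and the two stubs stand in for the real checks and the LLM correction chain so that the loop can run without a database or a model.

```python
def validate_and_correct(question, cypher, validate, correct, max_attempts=3):
    """Alternate validation and correction until the Cypher is clean or attempts run out."""
    history = []
    for _ in range(max_attempts):
        result = validate(question, cypher)
        history.append((cypher, result["next_action"]))
        if result["next_action"] != "correct_cypher":
            # Either "execute_cypher" (clean) or "end" (nothing more to try)
            return cypher, result["next_action"], history
        cypher = correct(question, cypher, result["cypher_errors"])
    return cypher, "end", history


def fake_validate(question, cypher):
    """Stub validator: flags the two errors used throughout this post."""
    errors = []
    if "ACTED_INx" in cypher:
        errors.append("Relationship ACTED_INx does not exist, did you mean ACTED_IN?")
    if "= Casino" in cypher:
        errors.append("Variable `Casino` not defined")
    action = "correct_cypher" if errors else "execute_cypher"
    return {"next_action": action, "cypher_errors": errors}


def fake_correct(question, cypher, errors):
    """Stub corrector: applies the fixes an LLM would be asked to produce."""
    return cypher.replace("ACTED_INx", "ACTED_IN").replace("= Casino", "= 'Casino'")


bad = "MATCH (p:Person)-[:ACTED_INx]->(m:Movie) WHERE m.title = Casino RETURN p.name;"
final_cypher, action, history = validate_and_correct(
    "What was the cast of the Casino?", bad, fake_validate, fake_correct
)
print(action, final_cypher)
```

Capping the number of attempts keeps the loop from running forever when the model keeps producing Cypher that fails validation.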
reference
---
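How LangChain decides whether a neo4j knowledge graph can answer a question
https://blog.csdn.net/liliang199/article/details/155074227
How to combine LangChain and neo4j for relationship-based retrieval QA (continued)
https://blog.csdn.net/liliang199/article/details/153731597
How to combine LangChain and neo4j for relationship-based retrieval QA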