Qualcomm Gen AI Inference Extensions (GENIE) (3): Prefix Matching and Queries
KV$ Rewind
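The KV$ Rewind (KV$ prefix matching) feature reuses previously cached KV values to process queries more efficiently. With KV$ Rewind, Genie can reuse the KV cache entries from an earlier query to speed up a new, similar query. This is especially useful when the new query shares a common prefix with the previous one.
Using KV$ Rewind between queries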
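To make the idea of a shared prefix concrete, here is a minimal sketch that measures how many leading tokens two queries have in common and how much of a cache could be kept. It is only an illustration: the names commonPrefixLength, planRewind and RewindPlan are hypothetical, and Genie performs its own matching internally when a rewind query is issued.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Length of the longest common prefix of two token sequences.
// Illustrative only; this is not Genie's internal implementation.
std::size_t commonPrefixLength(const std::vector<std::int32_t>& cached,
                               const std::vector<std::int32_t>& incoming) {
  std::size_t n = 0;
  while (n < cached.size() && n < incoming.size() && cached[n] == incoming[n]) {
    ++n;
  }
  return n;
}

// Hypothetical summary of what a rewind to the shared prefix would involve.
struct RewindPlan {
  std::size_t reusedTokens;     // cached KV entries that can be kept
  std::size_t discardedTokens;  // cached entries past the shared prefix
  std::size_t newTokens;        // incoming tokens that still need processing
};

RewindPlan planRewind(const std::vector<std::int32_t>& cached,
                      const std::vector<std::int32_t>& incoming) {
  const std::size_t shared = commonPrefixLength(cached, incoming);
  return RewindPlan{shared, cached.size() - shared, incoming.size() - shared};
}
```

For a cached sequence of 5 tokens and a new sequence of 4 tokens that agree on the first 3, the plan keeps 3 entries, drops 2, and leaves 1 token to process.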
typedef enum {
/// The string is the entire query/response.
GENIE_DIALOG_SENTENCE_COMPLETE = 0,
/// The string is the beginning of the query/response.
GENIE_DIALOG_SENTENCE_BEGIN = 1,
/// The string is a part of the query/response and not the beginning or end.
GENIE_DIALOG_SENTENCE_CONTINUE = 2,
/// The string is the end of the query/response.
GENIE_DIALOG_SENTENCE_END = 3,
/// The query has been aborted.
GENIE_DIALOG_SENTENCE_ABORT = 4,
///Rewind the KV cache as per prefix query match before processing the query
GENIE_DIALOG_SENTENCE_REWIND = 5,
} GenieDialog_SentenceCode_t;
GENIE_API
Genie_Status_t GenieDialog_query(const GenieDialog_Handle_t dialogHandle,
const char* queryStr,
const GenieDialog_SentenceCode_t sentenceCode,
const GenieDialog_QueryCallback_t callback,
const void* userData);
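Pass the sentence code GENIE_DIALOG_SENTENCE_REWIND and supply the query string exactly as for a normal query. The API handles prefix matching and the KV$ rewind internally.
Note
KV$ prefix matching works well with the KV update method SMART_MASK. With the KV update method POINTER_SHIFT, however, we have observed that in a few cases it throws a memory-register-related error for weight-sharing bins. POINTER_SHIFT has no known issues with decoder-only models (AR1 / AR8 / AR128, etc.).
In genie-t2t-run, the '-w' option issues a rewind query.
For example: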
./genie-t2t-run -c llama2-7b-htp.json \
                -p "Answer in one sentence, what is the capital city of India?" \
                -w "Answer in one sentence, what is the capital city of Russia?"
GenieDialog_setStopSequence API
The user can set an initial list of stop sequences in the dialog config. In addition, Genie can update the stop sequences dynamically at runtime.
The API for this:
/**
* @brief A function to set/update the list of stop sequences of a dialog. Old sequences if any are discarded.
* Call with a nullptr or an empty JSON or an empty string to reset to no stop sequence.
*
* @param[in] dialogHandle A dialog handle.
*
* @param[in] newStopSequences A JSON string with list of new stop sequences. Must be null terminated.
*
* @return Status code:
* - GENIE_STATUS_SUCCESS: API call was successful.
* - GENIE_STATUS_ERROR_INVALID_HANDLE: Dialog handle is invalid.
* - GENIE_STATUS_ERROR_JSON_FORMAT: JSON string is invalid.
*/
GENIE_API
Genie_Status_t GenieDialog_setStopSequence(const GenieDialog_Handle_t dialogHandle,
const char* newStopSequences);
Example of how to update stop sequences between queries
//Create dialog config
GenieDialogConfig_Handle_t dialogConfigHandle = NULL;
GenieDialogConfig_createFromJson(dialogConfigStr, &dialogConfigHandle);
//Create dialog
GenieDialog_Handle_t dialogHandle = NULL;
GenieDialog_create(dialogConfigHandle, &dialogHandle);
//Query with original config
GenieDialog_query(dialogHandle, promptStr, GenieDialog_SentenceCode_t::GENIE_DIALOG_SENTENCE_COMPLETE, queryCallback, nullptr);
//Update stop sequences using JSON string
const char newStopSequences[] = "{\"stop-sequence\":[\".\",\",\"]}";
GenieDialog_setStopSequence(dialogHandle, newStopSequences);
//Query with updated stop sequences
GenieDialog_query(dialogHandle, promptStr, GenieDialog_SentenceCode_t::GENIE_DIALOG_SENTENCE_COMPLETE, queryCallback, nullptr);
//Fallback to no stop sequence, using any of the below
const char noStopSequences[] = "{}"; //empty json
GenieDialog_setStopSequence(dialogHandle, noStopSequences);
const char noStopSequences2[] = "{\"stop-sequence\":[\"\"]}"; //empty string in json
GenieDialog_setStopSequence(dialogHandle, noStopSequences2);
GenieDialog_setStopSequence(dialogHandle, nullptr); //null pointer
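Hand-writing the escaped JSON literal is easy to get wrong, so the string can also be built from a list of sequences. The following is a minimal sketch under stated assumptions: buildStopSequenceJson and escapeJson are hypothetical helpers (not part of Genie), the output mirrors the {"stop-sequence":[...]} shape used in the example above, and only the most common escape characters are handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Escape the most common characters that cannot appear raw in a JSON string.
// Other control characters are not handled in this sketch.
std::string escapeJson(const std::string& s) {
  std::string out;
  for (char c : s) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n";  break;
      case '\r': out += "\\r";  break;
      case '\t': out += "\\t";  break;
      default:   out += c;      break;
    }
  }
  return out;
}

// Build {"stop-sequence":["...","..."]} from a list of sequences.
// Hypothetical helper; the key name is taken from the example above.
std::string buildStopSequenceJson(const std::vector<std::string>& sequences) {
  std::string json = "{\"stop-sequence\":[";
  for (std::size_t i = 0; i < sequences.size(); ++i) {
    if (i > 0) json += ",";
    json += "\"" + escapeJson(sequences[i]) + "\"";
  }
  json += "]}";
  return json;
}
```

The result's c_str() could then be passed as newStopSequences. To reset to no stop sequence, it is safer to use one of the forms the API comment documents (nullptr, an empty JSON, or an empty string) than to rely on how an empty array is interpreted.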
GenieDialog_tokenQuery
See the Genie Dialog JSON configuration string for details on the fields and what they mean. An example model_config can be found at ${QNN_SDK_ROOT}/examples/Genie/configs/llama2-7b-genaitransformer.json.
adb shell mkdir -p /data/local/tmp/
adb push <path to llama2-7b-genaitransformer.json> /data/local/tmp/
adb push <path to token file(.txt)> /data/local/tmp/
# open adb shell
adb shell
export LD_LIBRARY_PATH=/data/local/tmp/
export PATH=$LD_LIBRARY_PATH:$PATH
export ADSP_LIBRARY_PATH=$LD_LIBRARY_PATH
cd $LD_LIBRARY_PATH
./genie-t2t-run -c <path to llama2-7b-genaitransformer-htp-kv-share.json> \
                -tok <path to token file(.txt)>
# Example tokenfile.txt
24948 592 1048 15146 2055
Model inference on Windows using the token-to-token feature
On a Windows on Snapdragon host, open Developer PowerShell for VS2022 and run:
# Make sure the environment is set up as per the instructions, or cd into the bin folder on the Windows host
cd <QNN_SDK_ROOT>\bin\aarch64-windows-msvc
.\genie-t2t-run.exe -c <path to cpu-model-config.json> `
                    -tok <path to token file(.txt)>
# Example tokenfile.txt
24948 592 1048 15146 2055
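The token file shown above is a plain list of whitespace-separated token IDs. As a rough sketch of how such a file could be loaded before handing the IDs to a token-based query, the hypothetical helper below reads them from any input stream. The function name readTokenIds and the choice of std::uint32_t as the token type are assumptions, not taken from the Genie headers.

```cpp
#include <cstdint>
#include <istream>
#include <limits>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Read whitespace-separated decimal token IDs from a stream.
// Throws std::invalid_argument for a non-numeric entry and
// std::out_of_range for a value that does not fit in 32 bits.
std::vector<std::uint32_t> readTokenIds(std::istream& in) {
  std::vector<std::uint32_t> tokens;
  std::string word;
  while (in >> word) {
    for (char c : word) {
      if (c < '0' || c > '9') {
        throw std::invalid_argument("not a token ID: " + word);
      }
    }
    const unsigned long long value = std::stoull(word);
    if (value > std::numeric_limits<std::uint32_t>::max()) {
      throw std::out_of_range("token ID too large: " + word);
    }
    tokens.push_back(static_cast<std::uint32_t>(value));
  }
  return tokens;
}
```

Passing a std::ifstream opened on tokenfile.txt, or a std::istringstream holding the example line, yields a vector of five IDs.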
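Sampler update
Note
For the parameters that can be updated, see ${SDK_ROOT}/examples/Genie/configs/sampler.json
Genie lets you update either a single parameter or several parameters in one API call.
Example of how to update sampler parameters between queries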
// Create dialog config
GenieDialogConfig_Handle_t dialogConfigHandle = NULL;
GenieDialogConfig_createFromJson(dialogConfigStr, &dialogConfigHandle);
// Create dialog
GenieDialog_Handle_t dialogHandle = NULL;
GenieDialog_create(dialogConfigHandle, &dialogHandle);
// Query with original config
GenieDialog_query(dialogHandle, promptStr, GenieDialog_SentenceCode_t::GENIE_DIALOG_SENTENCE_COMPLETE, queryCallback, nullptr);
// Get dialog sampler handle
GenieDialogSampler_Handle_t samplerHandle = NULL;
GenieDialog_getSampler(dialogHandle, &samplerHandle);
// Create a sampler config from a new sampler config string
GenieSamplerConfig_Handle_t samplerConfigHandle = NULL;
GenieSamplerConfig_createFromJson(samplerConfigStr, &samplerConfigHandle);
// Apply the new sampler config
GenieDialogSampler_applyConfig(samplerHandle, samplerConfigHandle);
// Query with updated config
GenieDialog_query(dialogHandle, promptStr, GenieDialog_SentenceCode_t::GENIE_DIALOG_SENTENCE_COMPLETE, queryCallback, nullptr);
// Update single parameters, one call per parameter
GenieSamplerConfig_setParam(samplerConfigHandle, "top-p", "0.8");
GenieSamplerConfig_setParam(samplerConfigHandle, "top-k", "30");
// Apply the new sampler config
GenieDialogSampler_applyConfig(samplerHandle, samplerConfigHandle);
// Query with updated config
GenieDialog_query(dialogHandle, promptStr, GenieDialog_SentenceCode_t::GENIE_DIALOG_SENTENCE_COMPLETE, queryCallback, nullptr);
// Update multiple parameters (top-k and top-p) in one call
std::string valueStr = "\"sampler\" : {\n \"top-k\" : 20,\n \"top-p\" : 0.75\n } ";
GenieSamplerConfig_setParam(samplerConfigHandle, "", valueStr.c_str());
// Apply the new sampler config
GenieDialogSampler_applyConfig(samplerHandle, samplerConfigHandle);
// Query with updated config
GenieDialog_query(dialogHandle, promptStr, GenieDialog_SentenceCode_t::GENIE_DIALOG_SENTENCE_COMPLETE, queryCallback, nullptr);
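For readers unfamiliar with the two parameters updated above, the sketch below shows the usual meaning of top-k and top-p filtering over a probability distribution. It is a generic illustration, not Genie's sampler: the function name filterTopKTopP is hypothetical, treating top-k = 0 as "no limit" is an assumption, and real implementations differ in details such as whether probabilities are renormalized between the two steps.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Return the indices of the candidates that survive top-k followed by
// top-p filtering, ordered from most to least probable.
std::vector<std::size_t> filterTopKTopP(const std::vector<double>& probs,
                                        std::size_t topK, double topP) {
  // Sort candidate indices by descending probability.
  std::vector<std::size_t> order(probs.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&probs](std::size_t a, std::size_t b) {
                     return probs[a] > probs[b];
                   });

  // top-k: keep at most topK candidates (0 means no limit in this sketch).
  if (topK > 0 && topK < order.size()) {
    order.resize(topK);
  }

  // top-p: keep the shortest prefix whose cumulative probability reaches topP.
  double cumulative = 0.0;
  std::size_t keep = 0;
  while (keep < order.size()) {
    cumulative += probs[order[keep]];
    ++keep;
    if (cumulative >= topP) break;
  }
  order.resize(keep);
  return order;
}
```

A lower top-k or top-p shrinks the set of candidates the sampler may draw from, which generally makes the output more deterministic.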