KV$ Rewind

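The KV$ Rewind (KV$ prefix match) feature reuses previously cached KV values so that queries can be processed more efficiently. With KV Rewind, Genie can reuse the KV cache entries from a previous query to speed up processing of a new, similar query. This is especially useful when the new query shares a common prefix with the previous one.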
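The idea behind prefix matching can be illustrated with a small simulation. This is a simplified sketch, not Genie's internal implementation: it assumes the KV cache holds exactly one entry per previously processed token ID, and the names `TokenId`, `commonPrefixLength`, `KvCacheSim`, and `rewindAndPrefill` are hypothetical. Entries up to the first differing token are kept, everything after it is discarded, and only the remaining tokens of the new query must be processed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical token type for this illustration.
using TokenId = int32_t;

// Length of the longest common prefix of two token sequences.
std::size_t commonPrefixLength(const std::vector<TokenId>& a,
                               const std::vector<TokenId>& b) {
  std::size_t n = 0;
  while (n < a.size() && n < b.size() && a[n] == b[n]) {
    ++n;
  }
  return n;
}

// Toy model of a KV cache: one cached entry per processed token.
struct KvCacheSim {
  std::vector<TokenId> cachedTokens;

  // Rewinds the cache to the prefix shared with `newQuery`, then appends the
  // remaining tokens. Returns how many tokens had to be processed from scratch.
  std::size_t rewindAndPrefill(const std::vector<TokenId>& newQuery) {
    const std::size_t keep = commonPrefixLength(cachedTokens, newQuery);
    assert(keep <= newQuery.size());
    cachedTokens.resize(keep);  // drop entries after the divergence point
    cachedTokens.insert(cachedTokens.end(),
                        newQuery.begin() + static_cast<std::ptrdiff_t>(keep),
                        newQuery.end());
    return newQuery.size() - keep;
  }
};
```

For example, after caching `{1, 2, 3, 4}`, a new query `{1, 2, 9}` keeps two entries and processes only one new token.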

Using KV$ Rewind between queries

typedef enum {
  /// The string is the entire query/response.
  GENIE_DIALOG_SENTENCE_COMPLETE = 0,
  /// The string is the beginning of the query/response.
  GENIE_DIALOG_SENTENCE_BEGIN = 1,
  /// The string is a part of the query/response and not the beginning or end.
  GENIE_DIALOG_SENTENCE_CONTINUE = 2,
  /// The string is the end of the query/response.
  GENIE_DIALOG_SENTENCE_END = 3,
  /// The query has been aborted.
  GENIE_DIALOG_SENTENCE_ABORT = 4,
  ///Rewind the KV cache as per prefix query match before processing the query
  GENIE_DIALOG_SENTENCE_REWIND = 5,
} GenieDialog_SentenceCode_t;

GENIE_API
Genie_Status_t GenieDialog_query(const GenieDialog_Handle_t dialogHandle,
                                 const char* queryStr,
                                 const GenieDialog_SentenceCode_t sentenceCode,
                                 const GenieDialog_QueryCallback_t callback,
                                 const void* userData);

Pass the sentence code GENIE_DIALOG_SENTENCE_REWIND and supply the query string exactly as for a normal query. The API handles prefix matching and KV$ rewind internally.

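Note
KV$ prefix matching works well with the KV update method SMART_MASK. With the KV update method POINTER_SHIFT, however, we have observed that in a few cases it throws memory-registration-related errors for weight-shared bins. POINTER_SHIFT works without issues for decoder-only models (AR1 / AR8 / AR128, etc.).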

In genie-t2t-run, use the '-w' option to issue a rewind query.

For example:

./genie-t2t-run -c llama2-7b-htp.json \
                -p "Answer in one sentence, what is the capital city of India?" \
                -w "Answer in one sentence, what is the capital city of Russia?"

GenieDialog_setStopSequence API

Users can set an initial list of stop sequences in the dialog configuration. In addition, Genie can update the stop sequences dynamically at runtime.
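Conceptually, a stop sequence ends generation once the text produced so far ends with one of the configured strings. The sketch below shows that check on accumulated output. It illustrates the general technique only and is an assumption about how such matching works, not Genie's implementation; in particular, whether Genie includes the matched sequence in the returned text is not covered here. `endsWithStopSequence` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns true if `generated` ends with any non-empty stop sequence.
// Empty entries are skipped, consistent with an empty string in the list
// meaning "no stop sequence".
bool endsWithStopSequence(const std::string& generated,
                          const std::vector<std::string>& stopSequences) {
  for (const std::string& stop : stopSequences) {
    if (stop.empty() || stop.size() > generated.size()) {
      continue;
    }
    // Compare the tail of the generated text with the stop sequence.
    if (generated.compare(generated.size() - stop.size(), stop.size(), stop) == 0) {
      return true;
    }
  }
  return false;
}
```

A generator would run this check after each new token and stop as soon as it returns true.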

The API for this:

/**
* @brief A function to set/update the list of stop sequences of a dialog. Old sequences if any are discarded.
*        Call with a nullptr or an empty JSON or an empty string to reset to no stop sequence.
*
* @param[in] dialogHandle A dialog handle.
*
* @param[in] newStopSequences A JSON string with list of new stop sequences. Must be null terminated.
*
* @return Status code:
*         - GENIE_STATUS_SUCCESS: API call was successful.
*         - GENIE_STATUS_ERROR_INVALID_HANDLE: Dialog handle is invalid.
*         - GENIE_STATUS_ERROR_JSON_FORMAT: JSON string is invalid.
*/
GENIE_API
Genie_Status_t GenieDialog_setStopSequence(const GenieDialog_Handle_t dialogHandle,
                                           const char* newStopSequences);

Example of updating stop sequences between queries:

//Create dialog config
GenieDialogConfig_Handle_t dialogConfigHandle = NULL;
GenieDialogConfig_createFromJson(dialogConfigStr, &dialogConfigHandle);

//Create dialog
GenieDialog_Handle_t dialogHandle = NULL;
GenieDialog_create(dialogConfigHandle, &dialogHandle);

//Query with original config
GenieDialog_query(dialogHandle, promptStr, GenieDialog_SentenceCode_t::GENIE_DIALOG_SENTENCE_COMPLETE, queryCallback, nullptr);

//Update stop sequences using JSON string
const char newStopSequences[] = "{\"stop-sequence\":[\".\",\",\"]}";
GenieDialog_setStopSequence(dialogHandle, newStopSequences);

//Query with updated stop sequences
GenieDialog_query(dialogHandle, promptStr, GenieDialog_SentenceCode_t::GENIE_DIALOG_SENTENCE_COMPLETE, queryCallback, nullptr);

//Fallback to no stop sequence, using any of the below
const char noStopSequences[] = "{}"; //empty json
GenieDialog_setStopSequence(dialogHandle, noStopSequences);

const char noStopSequences2[] = "{\"stop-sequence\":[\"\"]}"; //empty string in json
GenieDialog_setStopSequence(dialogHandle, noStopSequences2);

GenieDialog_setStopSequence(dialogHandle, nullptr); //null pointer
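Instead of hand-escaping a string literal, the JSON passed to GenieDialog_setStopSequence can be built from a list of strings. This is a minimal sketch assuming only the `{"stop-sequence":[...]}` shape used in this section; `appendJsonString` and `buildStopSequenceJson` are hypothetical helpers, and they escape just what the JSON grammar requires (quotes, backslashes, control characters).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Appends `s` to `out` as a JSON string literal.
void appendJsonString(std::string& out, const std::string& s) {
  out += '"';
  for (unsigned char c : s) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n";  break;
      case '\r': out += "\\r";  break;
      case '\t': out += "\\t";  break;
      default:
        if (c < 0x20) {
          // Other control characters use the \u00XX form.
          char buf[7];
          std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
          out += buf;
        } else {
          out += static_cast<char>(c);  // UTF-8 bytes pass through unchanged
        }
    }
  }
  out += '"';
}

// Builds {"stop-sequence":[...]} from a list of stop strings.
std::string buildStopSequenceJson(const std::vector<std::string>& stops) {
  std::string json = "{\"stop-sequence\":[";
  for (std::size_t i = 0; i < stops.size(); ++i) {
    if (i > 0) json += ',';
    appendJsonString(json, stops[i]);
  }
  json += "]}";
  return json;
}
```

The result can then be passed as `buildStopSequenceJson({".", ","}).c_str()`. To reset, use one of the documented forms (`{}`, an empty string in the list, or `nullptr`).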

GenieDialog_tokenQuery

For details on the fields and what they mean, refer to Genie Dialog JSON configuration string. An example model config can be found at ${QNN_SDK_ROOT}/examples/Genie/configs/llama2-7b-genaitransformer.json.

adb shell mkdir -p /data/local/tmp/
adb push <path to llama2-7b-genaitransformer.json> /data/local/tmp/
adb push <path to token file(.txt)> /data/local/tmp/

# open adb shell
adb shell
export LD_LIBRARY_PATH=/data/local/tmp/
export PATH=$LD_LIBRARY_PATH:$PATH
export ADSP_LIBRARY_PATH=$LD_LIBRARY_PATH

cd $LD_LIBRARY_PATH
./genie-t2t-run -c <path to llama2-7b-genaitransformer.json> \
                -tok <path to token file(.txt)>

# Example tokenfile.txt
24948 592 1048 15146 2055
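The token file above holds whitespace-separated integer token IDs. If you generate or check such files yourself, a parser along these lines may help. It is a sketch: the whitespace-separated format is inferred from the example, storing IDs as 32-bit unsigned integers is an assumption, and `parseTokenIds` is a hypothetical helper that is not part of genie-t2t-run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Parses whitespace-separated token IDs such as "24948 592 1048 15146 2055".
// Throws std::invalid_argument for non-numeric entries and
// std::out_of_range for negative or too-large values.
std::vector<uint32_t> parseTokenIds(const std::string& text) {
  std::istringstream in(text);
  std::vector<uint32_t> ids;
  std::string word;
  while (in >> word) {
    std::size_t consumed = 0;
    const long long value = std::stoll(word, &consumed);  // throws on non-numbers
    if (consumed != word.size()) {
      throw std::invalid_argument("not a token id: " + word);
    }
    if (value < 0 ||
        value > static_cast<long long>(std::numeric_limits<uint32_t>::max())) {
      throw std::out_of_range("token id out of range: " + word);
    }
    ids.push_back(static_cast<uint32_t>(value));
  }
  return ids;
}
```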

Model inference with the token-to-token feature on Windows

On a Snapdragon host running Windows, open Developer PowerShell for VS2022 and run:

# Make sure the environment is set up as per the instructions, or cd into the bin folder on the Windows host
cd <QNN_SDK_ROOT>\bin\aarch64-windows-msvc
.\genie-t2t-run.exe -c <path to cpu-model-config.json> `
                    -tok <path to token file(.txt)>

# Example tokenfile.txt
24948 592 1048 15146 2055

Sampler update

Note
For the parameters that can be updated, refer to ${SDK_ROOT}/examples/Genie/configs/sampler.json

Genie offers the flexibility to update either a single parameter or multiple parameters in one API call.

Example of updating sampler parameters between queries:

// Create dialog config
GenieDialogConfig_Handle_t dialogConfigHandle = NULL;
GenieDialogConfig_createFromJson(dialogConfigStr, &dialogConfigHandle);

// Create dialog
GenieDialog_Handle_t dialogHandle = NULL;
GenieDialog_create(dialogConfigHandle, &dialogHandle);

// Query with original config
GenieDialog_query(dialogHandle, promptStr, GenieDialog_SentenceCode_t::GENIE_DIALOG_SENTENCE_COMPLETE, queryCallback, nullptr);

// Get dialog sampler handle
GenieDialogSampler_Handle_t samplerHandle = NULL;
GenieDialog_getSampler(dialogHandle, &samplerHandle);

// Create a sampler config from a new sampler JSON config
GenieSamplerConfig_Handle_t samplerConfigHandle = NULL;
GenieSamplerConfig_createFromJson(samplerConfigStr, &samplerConfigHandle);

// Apply the new sampler config
GenieDialogSampler_applyConfig(samplerHandle, samplerConfigHandle);

// Query with updated config
GenieDialog_query(dialogHandle, promptStr, GenieDialog_SentenceCode_t::GENIE_DIALOG_SENTENCE_COMPLETE, queryCallback, nullptr);

// Update individual parameters, one parameter per call
GenieSamplerConfig_setParam(samplerConfigHandle, "top-p", "0.8");
GenieSamplerConfig_setParam(samplerConfigHandle, "top-k", "30");

// Apply the updated sampler config
GenieDialogSampler_applyConfig(samplerHandle, samplerConfigHandle);

// Query with updated config
GenieDialog_query(dialogHandle, promptStr, GenieDialog_SentenceCode_t::GENIE_DIALOG_SENTENCE_COMPLETE, queryCallback, nullptr);

// Update multiple parameters (top-k and top-p) in a single call
std::string valueStr = "\"sampler\" : {\n      \"top-k\" : 20,\n      \"top-p\" : 0.75\n } ";
GenieSamplerConfig_setParam(samplerConfigHandle, "", valueStr.c_str());

// Apply the updated sampler config
GenieDialogSampler_applyConfig(samplerHandle, samplerConfigHandle);

// Query with updated config
GenieDialog_query(dialogHandle, promptStr, GenieDialog_SentenceCode_t::GENIE_DIALOG_SENTENCE_COMPLETE, queryCallback, nullptr);
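For reference, top-k keeps only the k most probable tokens, and top-p (nucleus sampling) keeps the smallest set of most probable tokens whose cumulative probability reaches p. The sketch applies both filters to a probability vector and renormalizes the survivors. It shows one common variant of the generic technique, with top-p measured on the original probabilities within the top-k set; libraries differ on this detail, and how Genie's sampler combines the two parameters is not specified here. `filterTopKTopP` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Zeroes tokens outside the top-k / top-p set and renormalizes the rest.
// Assumes `probs` is a valid probability distribution; k == 0 disables top-k.
std::vector<double> filterTopKTopP(const std::vector<double>& probs,
                                   std::size_t k, double p) {
  assert(p > 0.0 && p <= 1.0);
  // Token indices sorted by descending probability (ties keep index order).
  std::vector<std::size_t> order(probs.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) { return probs[a] > probs[b]; });

  const std::size_t limit = (k == 0) ? order.size() : std::min(k, order.size());
  std::vector<double> out(probs.size(), 0.0);
  double cumulative = 0.0;
  for (std::size_t rank = 0; rank < limit; ++rank) {
    const std::size_t idx = order[rank];
    out[idx] = probs[idx];
    cumulative += probs[idx];
    if (cumulative >= p) break;  // smallest prefix whose mass reaches p
  }
  // Renormalize the kept probabilities so they sum to 1.
  for (double& v : out) v /= cumulative;
  return out;
}
```

Lowering top-k or top-p shrinks the candidate set, which makes output more deterministic; raising them allows more varied output.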
