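Project links

MedQA (including USMLE and MCMLE): https://github.com/liaoyanqing666/MedQA_LLM_Evaluation

PubMedQA: https://github.com/liaoyanqing666/PubMedQA_LLM_Evaluation

This project provides a set of scripts for evaluating any large language model on MedQA and PubMedQA, two of the most widely used medical benchmark datasets. The USMLE and MCMLE datasets that appear in many papers are subsets of MedQA.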

Note: if this article shows up as VIP-only, please tell me so I can make it public again. CSDN changes that setting on its own; I did not set it.

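How to use

The project is simple to use: change the model name at the very bottom of bench_eval.py or bench_eval_vllm.py. The model name can be a local path to a model, or the name of an online model supported by the transformers/vllm library. The original datasets are already included in the project, so nothing needs to be downloaded separately.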

Only the model name has to be changed; adjust the other parameters as needed.

Other parameters you may need to touch:

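  • data_path: path to the benchmark file being evaluated; normally leave it unchanged
  • visible_gpus: the GPUs to use; multiple GPUs can be specified
  • print_errors: whether to print the error message for every invalid evaluation sample
  • record_file: when True, a file named `{original file name}_eval_{model name}.jsonl` is written to the same directory as data_path, containing the model's full generations and the parsed results
  • max_tokens: the maximum number of tokens the model is allowed to generate
  • batch_size: used by bench_eval.py; the batch size for the transformers library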
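
To make the `record_file` naming pattern concrete, here is a minimal sketch of how such a path could be derived. It is not taken from the project's code: the helper name `record_path`, the example paths, and the choices to drop the `.jsonl` extension from the original file name and to keep only the last component of the model name are all assumptions made for illustration.

```python
from pathlib import Path


def record_path(data_path: str, model_name: str) -> Path:
    """Build `{original file name}_eval_{model name}.jsonl` next to data_path.

    Hypothetical helper; the real scripts may name the file differently.
    """
    data = Path(data_path)
    # Assumption: keep only the last path component of the model name, so that
    # "my-org/my-model" or a local directory still yields a valid file name.
    safe_model = Path(model_name).name
    # Assumption: the original extension is dropped before appending the suffix.
    return data.with_name(f"{data.stem}_eval_{safe_model}.jsonl")


if __name__ == "__main__":
    # Both arguments are made-up example values.
    print(record_path("dataset/example/test.jsonl", "my-org/my-model"))
```

The record file lands in the same directory as the benchmark file, which matches the behaviour described for `record_file` above.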

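Dataset overview

MedQA

The MedQA dataset comes from the official MedQA repository. It contains medical exam questions from mainland China (MCMLE), the United States (USMLE), and Taiwan. All questions have been converted to single-answer multiple-choice format. MCMLE and USMLE are each available in 4-option and 5-option versions; the Taiwan set has only a 5-option version. Every part of MedQA includes training, test, and validation splits, but only the test split (test.jsonl) is normally used for benchmarking. MedQA also ships txt files of English and Chinese medical textbooks, which are not needed for benchmarking.

Here is an example from MedQA-MCMLE: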

{"question": "糖原分子中一个葡萄糖单位经糖酵解途径分解成乳酸时能产生几分子ATP?(  )", "options": {"A": "2", "B": "1", "C": "3", "D": "4", "E": "5"}, "answer": "3", "meta_info": "生物化学", "answer_idx": "C"}
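
As a rough illustration of how a record like this can be used, the sketch below parses one line of `test.jsonl`, renders the question with its options, and checks a predicted letter against `answer_idx`. The function names `format_question` and `is_correct` are hypothetical, and the prompt layout is an assumption rather than the template the project actually uses.

```python
import json

# One line of test.jsonl, copied from the example above.
LINE = '{"question": "糖原分子中一个葡萄糖单位经糖酵解途径分解成乳酸时能产生几分子ATP?(  )", "options": {"A": "2", "B": "1", "C": "3", "D": "4", "E": "5"}, "answer": "3", "meta_info": "生物化学", "answer_idx": "C"}'


def format_question(item: dict) -> str:
    """Render the question followed by one 'letter. text' line per option."""
    options = "\n".join(f"{key}. {text}" for key, text in sorted(item["options"].items()))
    return f"{item['question']}\n{options}"


def is_correct(item: dict, predicted_letter: str) -> bool:
    """Compare a predicted option letter with the gold `answer_idx`."""
    return predicted_letter.strip().upper() == item["answer_idx"]


if __name__ == "__main__":
    item = json.loads(LINE)
    print(format_question(item))
    print(is_correct(item, "C"))
```

Note that `answer` holds the text of the correct option while `answer_idx` holds its letter, so scoring by letter only needs `answer_idx`.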

PubMedQA

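The PubMedQA dataset comes from the official PubMedQA repository. It contains three subsets, PQA-L, PQA-U, and PQA-A, of which usually only PQA-L is used for benchmark evaluation. PQA-L contains 1,000 questions together with their reference contexts, and the model has to decide whether the answer to each one is "yes", "no", or "maybe".

Below is an example from PubMedQA. Only the "QUESTION", "CONTEXTS", and "final_decision" fields are needed for benchmarking.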

    "21645374": {
        "QUESTION": "Do mitochondria play a role in remodelling lace plant leaves during programmed cell death?",
        "CONTEXTS": [
            "Programmed cell death (PCD) is the regulated death of cells within an organism. The lace plant (Aponogeton madagascariensis) produces perforations in its leaves through PCD. The leaves of the plant consist of a latticework of longitudinal and transverse veins enclosing areoles. PCD occurs in the cells at the center of these areoles and progresses outwards, stopping approximately five cells from the vasculature. The role of mitochondria during PCD has been recognized in animals; however, it has been less studied during PCD in plants.",
            "The following paper elucidates the role of mitochondrial dynamics during developmentally regulated PCD in vivo in A. madagascariensis. A single areole within a window stage leaf (PCD is occurring) was divided into three areas based on the progression of PCD; cells that will not undergo PCD (NPCD), cells in early stages of PCD (EPCD), and cells in late stages of PCD (LPCD). Window stage leaves were stained with the mitochondrial dye MitoTracker Red CMXRos and examined. Mitochondrial dynamics were delineated into four categories (M1-M4) based on characteristics including distribution, motility, and membrane potential (\u0394\u03a8m). A TUNEL assay showed fragmented nDNA in a gradient over these mitochondrial stages. Chloroplasts and transvacuolar strands were also examined using live cell imaging. The possible importance of mitochondrial permeability transition pore (PTP) formation during PCD was indirectly examined via in vivo cyclosporine A (CsA) treatment. This treatment resulted in lace plant leaves with a significantly lower number of perforations compared to controls, and that displayed mitochondrial dynamics similar to that of non-PCD cells."
        ],
        "LABELS": [
            "BACKGROUND",
            "RESULTS"
        ],
        "MESHES": [
            "Alismataceae",
            "Apoptosis",
            "Cell Differentiation",
            "Mitochondria",
            "Plant Leaves"
        ],
        "YEAR": "2011",
        "reasoning_required_pred": "yes",
        "reasoning_free_pred": "yes",
        "final_decision": "yes",
        "LONG_ANSWER": "Results depicted mitochondrial dynamics in vivo as PCD progresses within the lace plant, and highlight the correlation of this organelle with other organelles during developmental PCD. To the best of our knowledge, this is the first report of mitochondria and chloroplasts moving on transvacuolar strands to form a ring structure surrounding the nucleus during developmental PCD. Also, for the first time, we have shown the feasibility for the use of CsA in a whole plant system. Overall, our findings implicate the mitochondria as playing a critical and early role in developmentally regulated PCD in the lace plant."
    },
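
The sketch below shows one way to reduce such an entry to the three fields that matter for benchmarking. The helper name `to_examples` and the output keys are hypothetical, and the contexts in `SAMPLE` are shortened to their first sentences to keep the example small.

```python
# Abbreviated copy of the entry above: each context is cut to its first sentence.
SAMPLE = {
    "21645374": {
        "QUESTION": "Do mitochondria play a role in remodelling lace plant leaves during programmed cell death?",
        "CONTEXTS": [
            "Programmed cell death (PCD) is the regulated death of cells within an organism.",
            "The following paper elucidates the role of mitochondrial dynamics during developmentally regulated PCD in vivo in A. madagascariensis.",
        ],
        "YEAR": "2011",
        "final_decision": "yes",
    }
}


def to_examples(data: dict) -> list:
    """Keep only the question, the joined contexts, and the gold label per PMID."""
    return [
        {
            "pmid": pmid,
            "question": entry["QUESTION"],
            "context": "\n".join(entry["CONTEXTS"]),
            "label": entry["final_decision"],
        }
        for pmid, entry in data.items()
    ]


if __name__ == "__main__":
    for example in to_examples(SAMPLE):
        print(example["pmid"], example["label"])
```

With the real file, `SAMPLE` would be replaced by the dictionary returned by `json.load` on the PQA-L JSON file.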

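Project files

`bench_eval.py`: evaluation script that loads the model with `transformers`. It is more broadly compatible but extremely slow. (Reference speed: 32B full-parameter model, bf16, two A800 GPUs, a little over one minute per sample.)

`bench_eval_vllm.py`: evaluation script that loads the model with `vllm`; this is the recommended script. For now it is the only script that supports evaluating LoRA fine-tuned models. It is compatible with the vast majority of commonly used models and with models fine-tuned from them (such as Baichuan-m2); see the [official vllm supported-models documentation](https://github.com/vllm-project/vllm/blob/main/docs/models/supported_models.md). It is extremely fast. (Reference speed: 32B full-parameter model, bf16, two A800 GPUs, under one second per sample.)

`bench_eval_messages.py`: contains the message format actually used when running the model (the prompt template), as well as the functions that parse the model's output.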
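
To give a feel for what parsing a model's output involves, here is a minimal sketch. It is not the project's parser: the function names `parse_choice` and `parse_decision` are hypothetical, and the rule of taking the last standalone match in the text is an assumption that will misfire on some outputs (for example, a trailing English article "A").

```python
import re
from typing import Optional


def parse_choice(text: str, letters: str = "ABCDE") -> Optional[str]:
    """Return the last standalone option letter in `text`, or None.

    ASCII lookarounds are used instead of \\b so that a letter directly
    next to Chinese characters (e.g. "答案是C") still counts as standalone.
    """
    matches = re.findall(rf"(?<![A-Za-z])([{letters}])(?![A-Za-z])", text)
    return matches[-1] if matches else None


def parse_decision(text: str) -> Optional[str]:
    """Return the last yes/no/maybe in `text` (case-insensitive), or None."""
    matches = re.findall(r"\b(yes|no|maybe)\b", text.lower())
    return matches[-1] if matches else None


if __name__ == "__main__":
    print(parse_choice("The answer is C."))
    print(parse_decision("Final answer: Maybe"))
```

Returning `None` for unparseable text is what lets an evaluation script separate invalid samples from wrong answers.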

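`jsonl_eval.py`: when record_file=True, this script re-evaluates the recorded generations after the model has been evaluated. It recomputes metrics such as accuracy, valid accuracy, the longest valid output length, and the shortest invalid output length. The last two help when tuning the `max_tokens` parameter. (Note that both lengths are measured in string characters, not tokens.)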
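
The metrics named above could be recomputed from recorded results along the following lines. This is a sketch under stated assumptions: the record keys `output`, `parsed`, and `gold` are hypothetical (the real record file may use different field names), and "valid accuracy" is assumed to mean correct answers divided by the number of samples whose output could be parsed.

```python
def summarize(records: list) -> dict:
    """Recompute accuracy-style metrics from a list of recorded results.

    Each record is assumed to look like
    {"output": str, "parsed": str or None, "gold": str}.
    """
    valid = [r for r in records if r["parsed"] is not None]
    invalid = [r for r in records if r["parsed"] is None]
    correct = sum(r["parsed"] == r["gold"] for r in valid)
    return {
        # Correct answers over all samples, parseable or not.
        "accuracy": correct / len(records) if records else 0.0,
        # Assumption: correct answers over parseable samples only.
        "valid_accuracy": correct / len(valid) if valid else 0.0,
        # Both lengths are character counts (len of the string), not tokens.
        "longest_valid_chars": max((len(r["output"]) for r in valid), default=None),
        "shortest_invalid_chars": min((len(r["output"]) for r in invalid), default=None),
    }


if __name__ == "__main__":
    # Made-up records for demonstration only.
    demo = [
        {"output": "C", "parsed": "C", "gold": "C"},
        {"output": "The answer is B", "parsed": "B", "gold": "A"},
        {"output": "x" * 50, "parsed": None, "gold": "D"},
        {"output": "D", "parsed": "D", "gold": "D"},
    ]
    print(summarize(demo))
```

If the shortest invalid output is already close to the generation limit, the invalid samples were probably cut off, which suggests raising `max_tokens`.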

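`dataset/`: folder containing the original, completely unprocessed MedQA dataset; no separate download is needed.

`README.md`: this file, the project description.

`LICENSE`: this project is released under the Apache-2.0 license.

`.gitignore`: git ignore file.

I hope this code is useful to you! If you run into any problems, feel free to leave a comment or report the issue via GitHub or email. If you find the project interesting, don't forget to give it a Star ⭐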


