Top-Conference Paper Reproduction: 2024 ICLR AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models
Note: delete torch==2.0.1 from requirements.txt, since PyTorch is already installed in the image. The Llama-2-7b-chat-hf model download is covered below.
0 Preface
Master index of LLM-safety research: https://blog.csdn.net/WhiffeYF/article/details/142132328
Paper reading: 2024 ICLR AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models
https://github.com/SheltonLiu-N/AutoDAN/tree/main
Bilibili video: https://www.bilibili.com/video/BV1nMGdzBEHT/
Platform: AutoDL (https://www.autodl.com/home)
Image: PyTorch 2.3.0 / Python 3.12 (Ubuntu 22.04) / CUDA 12.1
1 Model Download
Download the Llama-2-7b-chat-hf model.
Download via the ModelScope community: https://www.modelscope.cn/models/shakechen/Llama-2-7b-chat-hf/files
Download via the SDK:
Install the prerequisite package first:
source /etc/network_turbo
pip install modelscope
Download script:
# source /etc/network_turbo
from modelscope import snapshot_download
# Target download directory for the model
cache_dir = '/root/autodl-tmp'
# Call snapshot_download to fetch the model
model_dir = snapshot_download('shakechen/Llama-2-7b-chat-hf', cache_dir=cache_dir)
# model_dir = snapshot_download('Qwen/Qwen3-0.6B', cache_dir=cache_dir)
print(f"Model downloaded to: {model_dir}")
2 AutoDAN Installation
source /etc/network_turbo
git clone https://github.com/SheltonLiu-N/AutoDAN.git
cd AutoDAN
First, requirements.txt needs some edits.
Delete the torch line entirely, without re-adding a version pin (the image already ships PyTorch):
fschat
nltk
numpy
openai
tqdm
transformers
sentencepiece
protobuf
accelerate
If you want to run Qwen3, change the transformers version as well:
fschat
nltk
numpy
openai
tqdm
transformers==4.51.3
sentencepiece
protobuf
accelerate
source /etc/network_turbo
pip install -r requirements.txt
source /etc/network_turbo
pip install pandas
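Optionally, verify the environment with a quick import check (a minimal sketch; note that the fschat package imports under the name fastchat):

# Verify the key dependencies installed above import cleanly
import fastchat      # installed as "fschat"
import nltk, pandas, transformers, accelerate
print('transformers', transformers.__version__)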
Before running, you also need to update the model path in AutoDAN/autodan_ga_eval.py to point at the downloaded model:
/root/autodl-tmp/shakechen/Llama-2-7b-chat-hf
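Concretely, the llama2 entry in model_path_dicts (the full dict appears in Section 4 below) should point at the ModelScope download path instead of the repo's relative default, roughly like this:

model_path_dicts = {
    # before (repo default): "llama2": "./models/llama2/llama-2-7b-chat-hf",
    "llama2": "/root/autodl-tmp/shakechen/Llama-2-7b-chat-hf",
    # ... other entries unchanged ...
}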
3 Run
source /etc/network_turbo
python autodan_ga_eval.py
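If the run fails when loading the model, a standalone load test can isolate the problem (a minimal sketch using the standard transformers API; this is not code from the AutoDAN repo):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = '/root/autodl-tmp/shakechen/Llama-2-7b-chat-hf'
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(
    path, torch_dtype=torch.float16, device_map='auto')

# Generate a few tokens to confirm the model runs end to end
inputs = tokenizer('Hello, ', return_tensors='pt').to(model.device)
out = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(out[0], skip_special_tokens=True))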
An annotated explanation of the project code is available at: https://github.com/Whiffe/AutoDAN-explain
4 Running AutoDAN on Qwen3
Command:
python autodan_ga_eval.py \
--model qwen3
Changes required:
AutoDAN/requirements.txt:
fschat==0.2.20
nltk==3.8.1
numpy==1.26.0
openai==0.28.1
tqdm==4.66.1
# transformers==4.28.0
transformers==4.51.3
sentencepiece==0.1.99
protobuf==4.24.4
accelerate==0.23.0
autodan_ga_eval.py
# 增加了"qwen3": "Alibaba Cloud"
developer_dict = {"llama2": "Meta", "vicuna": "LMSYS",
"guanaco": "TheBlokeAI", "WizardLM": "WizardLM",
"mpt-chat": "MosaicML", "mpt-instruct": "MosaicML", "falcon": "TII",
"qwen3": "Alibaba Cloud"
}
# 增加了 "qwen3": "/root/autodl-tmp/Qwen/Qwen3-0.6B"
model_path_dicts = {"llama2": "./models/llama2/llama-2-7b-chat-hf", "vicuna": "./models/vicuna/vicuna-7b-v1.3",
"guanaco": "./models/guanaco/guanaco-7B-HF", "WizardLM": "./models/WizardLM/WizardLM-7B-V1.0",
"mpt-chat": "./models/mpt/mpt-7b-chat", "mpt-instruct": "./models/mpt/mpt-7b-instruct",
"falcon": "./models/falcon/falcon-7b-instruct", "qwen3": "/root/autodl-tmp/Qwen/Qwen3-0.6B"}