llama-factory LLM fine-tuning: can anyone help me check what's wrong?
Contents
- llama-factory LLM fine-tuning: can anyone help me check what's wrong?
  - 1. SFT (Supervised Fine-Tuning)
    - 1.1 Fine-tuning function-calling capability
      - 1.1.1 Command and explanation
      - 1.1.2 Results
    - 1.2 identity
      - 1.2.1 Command
      - 1.2.2 Results
    - 1.3 Fine-tuning multimodal (image + text) capability
  - 2. DPO (Direct Preference Optimization)
    - 2.1 Command
    - 2.2 Results
  - 3. KTO (Kahneman-Tversky Optimization)
    - 3.1 Command
    - 3.2 Results
  - 4. Inference with dynamically merged LoRA
    - 4.1 Merging function-calling
1. SFT (Supervised Fine-Tuning)
I kept all the hyperparameters fairly small here to ease the load on my hardware.
1.1 Fine-tuning function-calling capability
Datasets: glaive_toolcall_zh_demo.json, glaive_toolcall_en_demo.json
Dataset directory: /home/LLaMA-Factory/data
Save path: /home/models/lora/function-calling
1.1.1 Command and explanation
Here I walk through the command line by line: the plain version comes first, followed by an annotated copy.
CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
--stage sft \
--do_train \
--do_eval \
--model_name_or_path /home/models/Qwen3-4B \
--dataset glaive_toolcall_zh_demo,glaive_toolcall_en_demo \
--dataset_dir /home/LLaMA-Factory/data \
--template qwen3 \
--finetuning_type lora \
--output_dir /home/models/lora/function-calling \
--overwrite_cache \
--overwrite_output_dir \
--cutoff_len 512 \
--preprocessing_num_workers 1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--logging_steps 50 \
--warmup_steps 20 \
--save_steps 100 \
--eval_steps 50 \
--eval_strategy steps \
--save_strategy steps \
--load_best_model_at_end \
--learning_rate 5e-5 \
--num_train_epochs 5.0 \
--max_samples 1000 \
--val_size 0.1 \
--plot_loss \
--fp16
# Make only GPU 0 visible
CUDA_VISIBLE_DEVICES=0 \
llamafactory-cli train \
--stage sft \ # supervised fine-tuning (SFT) stage
--do_train \ # run training
--do_eval \ # run evaluation
--model_name_or_path /home/models/Qwen3-4B \ # path to the base model
--dataset glaive_toolcall_zh_demo,glaive_toolcall_en_demo \ # training datasets (tool-calling demos)
--dataset_dir /home/LLaMA-Factory/data \ # directory containing the datasets
--template qwen3 \ # use the Qwen3 chat template for preprocessing
--finetuning_type lora \ # parameter-efficient fine-tuning with LoRA (low-rank adaptation)
--output_dir /home/models/lora/function-calling \ # output directory for the fine-tuned adapter
--overwrite_cache \ # overwrite any existing preprocessed-data cache
--overwrite_output_dir \ # overwrite an existing output directory
--cutoff_len 512 \ # maximum sequence length; longer sequences are truncated
--preprocessing_num_workers 1 \ # number of data-preprocessing workers
--per_device_train_batch_size 1 \ # training batch size per GPU
--per_device_eval_batch_size 1 \ # evaluation batch size per GPU
--gradient_accumulation_steps 4 \ # gradient accumulation; effective batch size = 1 x 4 = 4
--lr_scheduler_type cosine \ # cosine learning-rate schedule
--logging_steps 50 \ # log every 50 steps
--warmup_steps 20 \ # learning-rate warmup steps
--save_steps 100 \ # save a checkpoint every 100 steps
--eval_steps 50 \ # evaluate every 50 steps
--eval_strategy steps \ # evaluate by step count
--save_strategy steps \ # save checkpoints by step count
--load_best_model_at_end \ # load the best checkpoint at the end of training
--learning_rate 5e-5 \ # initial learning rate
--num_train_epochs 5.0 \ # total number of training epochs
--max_samples 1000 \ # maximum number of samples to use
--val_size 0.1 \ # fraction of data held out for validation
--plot_loss \ # plot training and evaluation loss curves
--fp16 # train in half precision
1.1.2 Results
***** train metrics *****
epoch = 5.0
total_flos = 23183040GF
train_loss = 0.5242
train_runtime = 0:17:25.36
train_samples_per_second = 2.583
train_steps_per_second = 0.646
Figure saved at: /home/models/lora/function-calling/training_loss.png
Figure saved at: /home/models/lora/function-calling/training_eval_loss.png
[WARNING|2025-07-19 03:18:05] llamafactory.extras.ploting:148 >> No metric eval_accuracy to plot.
[INFO|trainer.py:4327] 2025-07-19 03:18:05,324 >>
***** Running Evaluation *****
[INFO|trainer.py:4329] 2025-07-19 03:18:05,324 >> Num examples = 60
[INFO|trainer.py:4332] 2025-07-19 03:18:05,324 >> Batch size = 1
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 60/60 [00:05<00:00, 10.16it/s]
***** eval metrics *****
epoch = 5.0
eval_loss = 0.6062
eval_runtime = 0:00:06.03
eval_samples_per_second = 9.949
eval_steps_per_second = 9.949
[INFO|modelcard.py:450] 2025-07-19 03:18:11,352 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}



1.2 identity
1.2.1 Command
Dataset: identity.json
Dataset directory: /home/LLaMA-Factory/data
Save path: /home/models/lora/identity
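One thing worth noting about this dataset: the identity.json that ships with LLaMA-Factory uses {{name}} and {{author}} placeholders for the assistant's name and creator, and these are normally replaced before training. If you have not done that yet, a quick substitution like the one below works; the values "MyAssistant" and "MyTeam" are just examples of my own, not anything from the repo.
# Replace the identity placeholders in place (example values only)
sed -i 's/{{name}}/MyAssistant/g; s/{{author}}/MyTeam/g' /home/LLaMA-Factory/data/identity.json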
CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
--stage sft \
--do_train \
--do_eval \
--model_name_or_path /home/models/Qwen3-4B \
--dataset identity \
--dataset_dir /home/LLaMA-Factory/data \
--template qwen3 \
--finetuning_type lora \
--output_dir /home/models/lora/identity \
--overwrite_cache \
--overwrite_output_dir \
--cutoff_len 512 \
--preprocessing_num_workers 1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--logging_steps 50 \
--warmup_steps 20 \
--save_steps 100 \
--eval_steps 50 \
--eval_strategy steps \
--save_strategy steps \
--load_best_model_at_end \
--learning_rate 5e-5 \
--num_train_epochs 5.0 \
--max_samples 1000 \
--val_size 0.1 \
--plot_loss \
--fp16
1.2.2 Results
***** train metrics *****
epoch = 5.0
total_flos = 417712GF
train_loss = 1.6572
train_runtime = 0:02:23.99
train_samples_per_second = 2.813
train_steps_per_second = 0.729
Figure saved at: /home/models/lora/identity/training_loss.png
Figure saved at: /home/models/lora/identity/training_eval_loss.png
[WARNING|2025-07-19 07:16:18] llamafactory.extras.ploting:148 >> No metric eval_accuracy to plot.
[INFO|trainer.py:4327] 2025-07-19 07:16:18,620 >>
***** Running Evaluation *****
[INFO|trainer.py:4329] 2025-07-19 07:16:18,620 >> Num examples = 10
[INFO|trainer.py:4332] 2025-07-19 07:16:18,620 >> Batch size = 1
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 14.03it/s]
***** eval metrics *****
epoch = 5.0
eval_loss = 0.8526
eval_runtime = 0:00:00.80
eval_samples_per_second = 12.375
eval_steps_per_second = 12.375
[INFO|modelcard.py:450] 2025-07-19 07:16:19,425 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}



1.3 Fine-tuning multimodal (image + text) capability
Skipped for now.
To do: come back to this and settle on a dataset. A rough command sketch follows below.
Candidate datasets: mllm_demo, llava_150k_zh
Dataset directory: /home/LLaMA-Factory/data
Save path: /home/models/lora/mulimage
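For reference when I come back to this, here is a rough sketch of what the multimodal SFT run might look like. The base model (a Qwen2-VL-7B-Instruct checkpoint under /home/models) and its qwen2_vl template are assumptions of mine, not something I have run yet; the rest mirrors the text-only commands above.
# Sketch only: the VL base model path and template below are assumed, not tested
CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
--stage sft \
--do_train \
--model_name_or_path /home/models/Qwen2-VL-7B-Instruct \
--dataset mllm_demo \
--dataset_dir /home/LLaMA-Factory/data \
--template qwen2_vl \
--finetuning_type lora \
--output_dir /home/models/lora/mulimage \
--cutoff_len 2048 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--logging_steps 50 \
--warmup_steps 20 \
--save_steps 100 \
--learning_rate 5e-5 \
--num_train_epochs 5.0 \
--plot_loss \
--fp16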
2. DPO (Direct Preference Optimization)
2.1 Command
Datasets: dpo_zh_demo.json, dpo_en_demo.json
Dataset directory: /home/LLaMA-Factory/data
Save path: /home/models/lora/dpo
CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
--stage dpo \
--do_train \
--do_eval \
--model_name_or_path /home/models/Qwen3-4B \
--dataset dpo_zh_demo,dpo_en_demo \
--dataset_dir /home/LLaMA-Factory/data \
--template qwen3 \
--finetuning_type lora \
--output_dir /home/models/lora/dpo \
--overwrite_cache \
--overwrite_output_dir \
--cutoff_len 512 \
--preprocessing_num_workers 1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--logging_steps 50 \
--warmup_steps 20 \
--save_steps 100 \
--eval_steps 50 \
--eval_strategy steps \
--save_strategy steps \
--load_best_model_at_end \
--learning_rate 5e-5 \
--num_train_epochs 5.0 \
--max_samples 1000 \
--val_size 0.1 \
--plot_loss \
--fp16
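The command above leaves the DPO-specific hyperparameters at their defaults (preference beta 0.1 with the sigmoid loss, as far as I understand LLaMA-Factory's defaults). If you prefer to write them out explicitly, a compact variant of the same run would look like the sketch below; the --pref_beta and --pref_loss flag names reflect my reading of recent llamafactory-cli versions, so verify them against your installed version before relying on this.
# Minimal DPO variant with the preference hyperparameters spelled out (flag names unverified)
CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
--stage dpo \
--do_train \
--model_name_or_path /home/models/Qwen3-4B \
--dataset dpo_zh_demo,dpo_en_demo \
--dataset_dir /home/LLaMA-Factory/data \
--template qwen3 \
--finetuning_type lora \
--output_dir /home/models/lora/dpo \
--pref_beta 0.1 \
--pref_loss sigmoid \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 4 \
--learning_rate 5e-5 \
--num_train_epochs 5.0 \
--fp16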
2.2 Results
***** train metrics *****
epoch = 5.0
total_flos = 44615278GF
train_loss = 0.146
train_runtime = 0:39:11.20
train_samples_per_second = 1.148
train_steps_per_second = 0.287
Figure saved at: /home/models/lora/dpo/training_loss.png
Figure saved at: /home/models/lora/dpo/training_rewards_accuracies.png
Figure saved at: /home/models/lora/dpo/training_eval_loss.png
[INFO|trainer.py:4327] 2025-07-19 04:37:21,811 >>
***** Running Evaluation *****
[INFO|trainer.py:4329] 2025-07-19 04:37:21,811 >> Num examples = 60
[INFO|trainer.py:4332] 2025-07-19 04:37:21,811 >> Batch size = 1
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 60/60 [00:19<00:00, 3.05it/s]
***** eval metrics *****
epoch = 5.0
eval_logits/chosen = -1.2348
eval_logits/rejected = -1.1853
eval_logps/chosen = -483.7343
eval_logps/rejected = -510.3006
eval_loss = 0.823
eval_rewards/accuracies = 0.7167
eval_rewards/chosen = -1.2839
eval_rewards/margins = 3.5652
eval_rewards/rejected = -4.8492
eval_runtime = 0:00:20.11
eval_samples_per_second = 2.983
eval_steps_per_second = 2.983
[INFO|modelcard.py:450] 2025-07-19 04:37:41,926 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}




3. KTO (Kahneman-Tversky Optimization)
3.1 Command
Dataset: kto_en_demo.json
Dataset directory: /home/LLaMA-Factory/data
Save path: /home/models/lora/kto
CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
--stage kto \
--do_train \
--do_eval \
--model_name_or_path /home/models/Qwen3-4B \
--dataset kto_en_demo \
--dataset_dir /home/LLaMA-Factory/data \
--template qwen3 \
--finetuning_type lora \
--output_dir /home/models/lora/kto \
--overwrite_cache \
--overwrite_output_dir \
--cutoff_len 512 \
--preprocessing_num_workers 1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--logging_steps 50 \
--warmup_steps 20 \
--save_steps 100 \
--eval_steps 50 \
--eval_strategy steps \
--save_strategy steps \
--load_best_model_at_end \
--learning_rate 5e-5 \
--num_train_epochs 5.0 \
--max_samples 1000 \
--val_size 0.1 \
--plot_loss \
--fp16
3.2 Results
***** train metrics *****
epoch = 5.0
total_flos = 10358779GF
train_loss = 0.1777
train_runtime = 0:15:50.96
train_samples_per_second = 1.42
train_steps_per_second = 0.358
Figure saved at: /home/models/lora/kto/training_loss.png
Figure saved at: /home/models/lora/kto/training_rewards_chosen.png
Figure saved at: /home/models/lora/kto/training_eval_loss.png
[INFO|trainer.py:4327] 2025-07-19 07:44:11,598 >>
***** Running Evaluation *****
[INFO|trainer.py:4329] 2025-07-19 07:44:11,598 >> Num examples = 30
[INFO|trainer.py:4332] 2025-07-19 07:44:11,598 >> Batch size = 1
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:10<00:00, 2.80it/s]
***** eval metrics *****
epoch = 5.0
eval_logits/chosen = -115518509.1765
eval_logits/rejected = -161532484.9231
eval_logps/chosen = -246.4654
eval_logps/rejected = -454.4223
eval_loss = 0.5156
eval_rewards/chosen = 0.3894
eval_rewards/margins = 0.6407
eval_rewards/rejected = -0.2513
eval_runtime = 0:00:11.16
eval_samples_per_second = 2.688
eval_steps_per_second = 2.688
kl = 199.3471
[INFO|modelcard.py:450] 2025-07-19 07:44:22,758 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}




4. Inference with dynamically merged LoRA
4.1 Merging function-calling
CUDA_VISIBLE_DEVICES=0 llamafactory-cli webchat \
--model_name_or_path /home/models/Qwen3-4B \
--adapter_name_or_path /home/models/lora/function-calling \
--template qwen3 \
--finetuning_type lora
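The webchat command above merges the LoRA adapter on the fly at load time. If I later want a standalone merged model directory instead, my understanding is that llamafactory-cli export handles that; here is a sketch, where the target directory /home/models/Qwen3-4B-function-calling is just a name of my own choosing.
# Sketch: bake the adapter into the base weights and write a standalone model directory
CUDA_VISIBLE_DEVICES=0 llamafactory-cli export \
--model_name_or_path /home/models/Qwen3-4B \
--adapter_name_or_path /home/models/lora/function-calling \
--template qwen3 \
--finetuning_type lora \
--export_dir /home/models/Qwen3-4B-function-calling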