Accelerater for distributed training

weifengma-wish

297人浏览 · 2025-08-13 14:24:05

weifengma-wish · 2025-08-13 14:24:05 发布

reflink: https://huggingface.co/docs/transformers/v4.53.3/en/accelerate

Accelerate

Accelerate is a library designed to simplify distributed training on any type of setup with PyTorch by uniting the most common frameworks (Fully Sharded Data Parallel (FSDP) and DeepSpeed) for it into a single interface. Trainer is powered by Accelerate under the hood, enabling loading big models and distributed training.

This guide will show you two ways to use Accelerate with Transformers, using FSDP as the backend. The first method demonstrates distributed training with Trainer, and the second method demonstrates adapting a PyTorch training loop. For more detailed information about Accelerate, please refer to the documentation.

Using Trainer

Pass the path to the saved configuration file to TrainingArguments, and from there, pass your TrainingArguments to Trainer.

from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="your-model",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    fsdp_config="path/to/fsdp_config",
    fsdp="full_shard",
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    processing_class=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

trainer.train()

2048 AI社区

有“AI”的1024 = 2048，欢迎大家加入2048 AI社区

更多推荐

欧美市场呼叫中心选型：如何评估GDPR合规性与AI技术型服务能力

核心要点欧美呼叫中心选型需建立“合规底线-本地化深度-技术效能”评估框架，超越单一价格维度，重点核查ISO27701等资质及GDPR落地颗粒度。头部厂商能力边界分化明显：东软云科技侧重AI技术型售后与全链路合规，Teleperformance依托超大规模网络提供标准化交付，Concentrix聚焦数据驱动的体验转型。企业应依据业务场景匹配服务商，如严苛合规与技术售后选东软，全球统一标准选TP，数字

2048 AI社区

非技术创业者如何从一个想法快速生成Web原型？

2026年，非技术创业者已经不需要技术合伙人就能验证产品想法。从一个模糊的想法到可以与用户互动的完整Web原型，现在只需要2-5天和几百块钱——而不是3-6个月和数十万元。从"等待开发"到"快速验证"。你不再需要依赖技术人才，不需要投入巨额成本，就能在最短时间内知道你的想法是否真的有市场。完整的验证流程很简单理清想法→ 写一段200字的产品描述（1小时）生成原型→ 用AI工具生成完整可交互原型（1

2048 AI社区

我的 Claude Code 效率工具全套配置分享

claude-mem 在后台运行一个本地 Worker 服务（默认端口 37777），通过 5 个生命周期钩子（SessionStart、UserPromptSubmit、PostToolUse、Summary、SessionEnd）这个插件的灵感来自 Manus 的工作方式。使用快速迭代的框架（Next.js、React、Tailwind 等），或者任何需要查阅 API 文档的开发工作。特别有用