reflink: https://huggingface.co/docs/transformers/v4.53.3/en/accelerate

Accelerate

Accelerate is a library that simplifies distributed PyTorch training on any type of setup by uniting the most common frameworks for it, Fully Sharded Data Parallel (FSDP) and DeepSpeed, behind a single interface. Trainer is powered by Accelerate under the hood, which enables loading big models and distributed training.

This guide will show you two ways to use Accelerate with Transformers, using FSDP as the backend. The first method demonstrates distributed training with Trainer, and the second method demonstrates adapting a PyTorch training loop. For more detailed information about Accelerate, please refer to the documentation.

Using Trainer

Pass the path to your saved FSDP configuration file to TrainingArguments through the fsdp_config parameter, then pass the TrainingArguments to Trainer.

# Assumes model, dataset, tokenizer, data_collator, and compute_metrics are already defined.
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="your-model",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
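    fsdp_config="path/to/fsdp_config",  # path to the saved FSDP configuration file
    fsdp="full_shard",  # shard parameters, gradients, and optimizer states across devices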
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    processing_class=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

trainer.train()
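
The fsdp_config argument above points at a saved configuration file. As a minimal sketch of what such a file might contain, the snippet below writes a small JSON file using a few of the keys that TrainingArguments documents for fsdp_config. The file name fsdp_config.json, the chosen values, and the BertLayer class name are assumptions for illustration only; adjust them to your model and setup.

```python
import json
from pathlib import Path

# Illustrative FSDP settings. The keys are options documented for
# TrainingArguments.fsdp_config; the values are assumptions, not recommendations.
fsdp_settings = {
    # Assumed transformer block class for a BERT-style model (hypothetical choice).
    "transformer_layer_cls_to_wrap": ["BertLayer"],
    # Prefetch the next set of parameters before the current gradient computation.
    "backward_prefetch": "backward_pre",
    # Do not prefetch parameters during the forward pass.
    "forward_prefetch": False,
}

# Hypothetical file name; pass this path as fsdp_config in TrainingArguments.
config_path = Path("fsdp_config.json")
config_path.write_text(json.dumps(fsdp_settings, indent=2))

# Read the file back to confirm it is valid JSON with the expected keys.
loaded = json.loads(config_path.read_text())
print(sorted(loaded))
```

With a file like this in place, fsdp_config="fsdp_config.json" would replace the placeholder path in the example above. TrainingArguments also accepts the same settings directly as a dict, which avoids the separate file when the configuration is small.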
