RollPacker, HistoSpec, SPEC-RL, AReaL, StreamRL
From RollPacker: Mitigating Long-Tail Rollouts for Fast, Synchronous RL Post-Training.
RollPacker: dynamically re-packing batches
By excluding long responses from short rounds and rescheduling them into a few designated long rounds, tail batching effectively reduces GPU idle time during rollouts and significantly accelerates RL training without sacrificing accuracy. The core idea is to oversample and then re-pack the batches.
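The re-packing idea can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not RollPacker's implementation: the function name `tail_batching`, the `oversample` parameter, and the response lengths (known in advance here, which a real system cannot assume) are all hypothetical.

```python
from collections import deque

def tail_batching(response_lengths, batch_size, oversample):
    """Toy tail-batching scheduler (illustrative assumption, not the paper's code).

    Each short round launches batch_size + oversample prompts, keeps the
    batch_size responses that finish first, and defers the rest. Deferred
    prompts are packed into a long round once batch_size of them pile up.
    """
    pending = deque(response_lengths)
    deferred = []          # long responses pushed out of short rounds
    rounds = []            # list of (kind, lengths_in_batch)
    launch = batch_size + oversample
    while len(pending) >= launch:
        launched = sorted(pending.popleft() for _ in range(launch))
        rounds.append(("short", launched[:batch_size]))   # shortest finish first
        deferred.extend(launched[batch_size:])
        if len(deferred) >= batch_size:
            rounds.append(("long", deferred[:batch_size]))
            deferred = deferred[batch_size:]
    leftover = list(pending) + deferred
    return rounds, leftover

lengths = [10, 900, 12, 11, 950, 9, 13, 14, 8, 15, 980, 16]
rounds, leftover = tail_batching(lengths, batch_size=3, oversample=1)
# A round takes as long as its slowest kept response.
packed_time = sum(max(batch) for _, batch in rounds)
naive_time = sum(max(lengths[i:i + 3]) for i in range(0, len(lengths), 3))
```

In this toy run the three long responses end up together in one long round, so they stall a single round instead of three.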
HistoSpec and SPEC-RL both speed up rollouts with speculative decoding
HistoSpec
From https://arxiv.org/pdf/2508.18588
HistoSpec, a speculative decoding inference engine that utilizes the similarity of historical rollout token sequences to obtain accurate drafts. In each decoding iteration, HistoSpec uses the last few generated tokens as the prefix to search for matches within the prompt’s historical responses. Upon matching, it extracts a certain number of subsequent tokens following the prefix as drafts.
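The prefix lookup described above can be sketched as follows. This is a minimal illustration, not HistoSpec's code: `propose_draft`, `prefix_len`, and `draft_len` are hypothetical names, and the linear scan stands in for whatever index a real engine would use.

```python
def propose_draft(generated, history, prefix_len=3, draft_len=4):
    """Look up the last prefix_len generated tokens in past responses.

    generated: token ids produced so far in the current rollout.
    history:   list of earlier responses (token id lists) for the same prompt.
    Returns up to draft_len tokens that followed the first match, or [].
    """
    if len(generated) < prefix_len:
        return []
    prefix = generated[-prefix_len:]
    for past in history:
        for i in range(len(past) - prefix_len + 1):
            if past[i:i + prefix_len] == prefix:
                draft = past[i + prefix_len:i + prefix_len + draft_len]
                if draft:          # a match at the very end yields no draft
                    return draft
    return []

history = [[1, 2, 3, 4, 5, 6, 7, 8]]
hit = propose_draft([9, 2, 3, 4], history)    # prefix [2, 3, 4] is in history
miss = propose_draft([9, 9, 9], history)      # no match, no draft
```

The draft tokens would then be verified by the target model in a single forward pass, as in ordinary speculative decoding.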
SPEC-RL
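From https://arxiv.org/pdf/2509.23232. In short, it uses the rollouts from epoch t-1 to speed up epoch t. If a single epoch contains a lot of data, prompts are rarely revisited, so there is no speedup.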
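A rough sketch of reusing a cached rollout as a speculative prefix is shown below. The acceptance rule is the standard speculative-sampling test (accept a token with probability min(1, p_new / p_old)); whether SPEC-RL uses exactly this rule is an assumption here, and `accepted_prefix_len`, `speculative_rollout`, and `continue_from` are hypothetical names.

```python
import random

def accepted_prefix_len(p_old, p_new, rng=random):
    """Count how many cached tokens survive verification.

    p_old[i]: probability the previous policy gave cached token i (must be > 0).
    p_new[i]: probability the current policy gives the same token.
    """
    for i, (q, p) in enumerate(zip(p_old, p_new)):
        if rng.random() >= min(1.0, p / q):
            return i               # first rejection: regenerate from here
    return len(p_old)

def speculative_rollout(cached_tokens, p_old, p_new, continue_from, rng=random):
    """Keep the verified prefix of last epoch's rollout, then keep generating."""
    keep = accepted_prefix_len(p_old, p_new, rng)
    prefix = cached_tokens[:keep]
    return prefix + continue_from(prefix)   # continue_from: current-policy decoder

cached = ["a", "b", "c", "d"]
p_old = [0.5, 0.5, 0.5, 0.5]
p_new = [0.6, 0.5, 0.0, 0.9]       # the current policy never emits token "c"
result = speculative_rollout(cached, p_old, p_new, lambda prefix: ["x"])
```

Only the tokens after the first rejection need to be decoded again, which is where any saving would come from.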
Some ideas behind Fully Async training
https://verl.readthedocs.io/en/latest/advance/fully_async.html#introduction gives an introduction to Fully Async; the key points are listed below
AReaL
From AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning https://arxiv.org/abs/2505.24298
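The partial rollout idea first appeared in https://arxiv.org/pdf/2501.12599, and AReaL adopts it as well. Partial rollout means that the current $\theta_{old}$ generates the first part of a sequence and the remainder is completed by $\theta$; the figure in the red box shows this most clearly.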
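A toy sketch of partial rollout follows, assuming a hypothetical `sample_token(version, position)` callback and a set of positions at which new weights arrive. It only tracks which policy version produced each token, and it is not AReaL's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Rollout:
    tokens: list = field(default_factory=list)
    versions: list = field(default_factory=list)   # policy version per token

def partial_rollout(sample_token, target_len, weight_updates):
    """Generate one sequence while the policy may change mid-sequence.

    sample_token(version, position) -> token  (hypothetical decoder hook)
    weight_updates: positions at which new weights arrive; the prefix
    generated so far is kept and decoding continues with the new weights.
    """
    rollout = Rollout()
    version = 0                                    # 0 plays the role of theta_old
    for pos in range(target_len):
        if pos in weight_updates:
            version += 1                           # switch to the updated theta
        rollout.tokens.append(sample_token(version, pos))
        rollout.versions.append(version)
    return rollout

r = partial_rollout(lambda v, p: f"v{v}-t{p}", target_len=6, weight_updates={3})
```

Recording the version per token matters because the trainer has to account for the fact that one sequence mixes samples from two policies.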
StreamRL
From https://arxiv.org/pdf/2504.15930. It targets two kinds of problems:
- pipeline bubbles: StreamRL breaks the traditional stage boundary in synchronous RL algorithms through stream generation, and achieves fully overlapping in asynchronous RL. In practice, this step corresponds to panel (c) of the figure above, which makes use of some stale samples.
- skewness bubbles: To address skewness bubbles, SGS utilizes an output length ranker to identify long-tail samples. Based on the predictions, it dispatches prompts to specific generation instances and decides scheduling order accordingly, effectively mitigating the bottleneck caused by long-tail samples. In practice, this dynamically schedules the prompts that are predicted to be long and gives them more resources.
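The dispatching idea for skewness bubbles can be sketched with a greedy longest-first heuristic. This is a stand-in chosen for illustration, not StreamRL's actual scheduler; `dispatch` and the predicted lengths are hypothetical, and the ranker itself is replaced by a plain dictionary.

```python
import heapq

def dispatch(predicted_lengths, num_instances):
    """Assign prompts to generation instances, longest predicted output first.

    predicted_lengths: {prompt_id: predicted output length} from a ranker.
    Each prompt goes to the instance with the smallest accumulated load.
    """
    order = sorted(predicted_lengths, key=predicted_lengths.get, reverse=True)
    heap = [(0, inst) for inst in range(num_instances)]   # (load, instance)
    assignment = {inst: [] for inst in range(num_instances)}
    for prompt in order:
        load, inst = heapq.heappop(heap)
        assignment[inst].append(prompt)
        heapq.heappush(heap, (load + predicted_lengths[prompt], inst))
    return assignment

predicted = {"a": 900, "b": 100, "c": 120, "d": 80, "e": 90}
plan = dispatch(predicted, num_instances=2)
```

In this example the long-tail prompt "a" gets an instance to itself while the short prompts share the other one, which is the effect the quoted passage describes.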