Deep models: performance analysis with record_function + (torch.profiler, torch.utils.tensorboard, schedule)


weixin_43881088

Published 2024-12-18 14:22:16


Model parameter count and floating-point operations (FLOPs)

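import torch
import torchvision.models as models
from thop import profile  # pip install thop; reports the model's parameter count and operation count

model = models.resnet18()
dummy_input = torch.randn(1, 3, 224, 224)  # a batch containing one 224x224 RGB image
# thop counts multiply-accumulate operations (MACs); 1 MAC is roughly 2 FLOPs
flops, params = profile(model, inputs=(dummy_input,))
print("FLOPs =", flops/1000000000, "params =", params)

#result
#FLOPs = 92.3504633856G params = 885867.0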
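You can reproduce numbers like these by hand for a small model. The sketch below uses a hypothetical two-layer MLP (not ResNet-18) and counts only its nn.Linear layers. It assumes two common conventions: a fully connected layer costs in_features × out_features multiply-accumulates (MACs) per sample, and one MAC counts as two FLOPs (one multiply plus one add).

```python
import torch.nn as nn

# Hypothetical two-layer MLP, used only to illustrate the arithmetic
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Parameter count: weights + biases of every layer
params = sum(p.numel() for p in model.parameters())

# MACs of a Linear layer for one sample: in_features * out_features
macs = sum(m.in_features * m.out_features
           for m in model.modules() if isinstance(m, nn.Linear))
flops = 2 * macs  # convention: one multiply + one add per MAC

print(params, macs, flops)  # 203530 203264 406528
```

Here the first layer has 784 × 256 + 256 = 200,960 parameters and the second has 256 × 10 + 10 = 2,570. ReLU adds no parameters and is ignored in the MAC count.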

`record_function` + (`torch.profiler`, `torch.utils.tensorboard`, `schedule`)

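record_function is a PyTorch tool for performance tracing. It attaches a label to a block of code so that you can later inspect that block's execution time, memory usage, operator durations, and similar information. It is one of the building blocks of PyTorch's performance-analysis tooling: labeled blocks appear in the Profiler's reports and in profiler traces viewed in TensorBoard.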

1. What `record_function` does

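record_function is a context manager that labels the operations executed inside its with block. It is typically used for performance analysis and optimization, or to debug how long particular operations take and how many resources they consume. The label is only recorded while a profiler is running. Outside a profiler, the block simply executes as usual.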

2. Syntax

from torch.profiler import record_function

with record_function("YourCustomLabel"):
    # code in this block is recorded under "YourCustomLabel"
    pass

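"YourCustomLabel": a custom label for the block, used to identify it in logs and profiler output.
with statement: delimits the code that record_function labels. While a profiler is running, PyTorch tracks the block's execution time and the operations it runs, from the moment the with block is entered until it exits.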
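The sketch below illustrates two points from the syntax description. First, record_function can also be applied as a function decorator. Second, the label only shows up when a profiler is running. The function name `preprocess` and the labels are hypothetical.

```python
import torch
from torch.profiler import profile, record_function, ProfilerActivity

# record_function also works as a decorator: every call is labeled "preprocess"
@record_function("preprocess")
def preprocess(x):
    return (x - x.mean()) / (x.std() + 1e-6)

x = torch.randn(64, 128)

# Outside a profiler the label has no visible effect; the function just runs
y = preprocess(x)

# Inside a profiler, the labeled ranges are recorded as events
with profile(activities=[ProfilerActivity.CPU]) as prof:
    with record_function("forward"):
        y = preprocess(x)
        z = y @ y.T

labels = {evt.key for evt in prof.key_averages()}
print("forward" in labels, "preprocess" in labels)
```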

3. Features in detail

record_function provides the following:

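Performance analysis:
- When a profiler is active, record_function records the execution time of the labeled block. This shows you which parts of the code are slow and which operations take the most time.
- This information is essential for optimizing deep learning models and improving code performance.
Profiler integration:
- record_function is meant to be used together with the PyTorch Profiler for a deeper analysis of the training process.
- Each labeled block appears in the Profiler's results as its own entry, alongside the operators executed inside it, which makes the performance report more detailed.
TensorBoard integration:
- Profiler traces that contain record_function labels can be exported and viewed in TensorBoard. This makes it easier to inspect where time goes during training.
Logging:
- record_function can serve as a lightweight marker that delimits the start and end of a specific operation. This is useful for debugging or monitoring different parts of a model.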
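For the logging and timeline use case, you can export a profiled run as a Chrome trace, in which each record_function label appears as a named span. The sketch below assumes a CPU-only run and writes to a temporary path of our choosing. You can open the resulting JSON in chrome://tracing or Perfetto.

```python
import json
import os
import tempfile

import torch
from torch.profiler import profile, record_function, ProfilerActivity

with profile(activities=[ProfilerActivity.CPU]) as prof:
    with record_function("data_loading"):
        batch = torch.randn(32, 3, 32, 32)
    with record_function("compute"):
        out = batch.flatten(1) @ torch.randn(3 * 32 * 32, 10)

# Export the recorded events in Chrome trace (JSON) format
trace_path = os.path.join(tempfile.gettempdir(), "record_function_trace.json")
prof.export_chrome_trace(trace_path)

# Each labeled block appears as an event carrying its label as the name
with open(trace_path) as f:
    trace = json.load(f)
names = {event.get("name") for event in trace["traceEvents"]}
print("data_loading" in names, "compute" in names)
```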

4. Examples

Example 1: timing a block of code

import torch
from torch.profiler import record_function

# Simulate a simple matrix multiplication
with record_function("Matrix Multiplication"):
    # Everything in this block is labeled "Matrix Multiplication";
    # its time is only recorded if a profiler is running
    A = torch.randn(1000, 1000)
    B = torch.randn(1000, 1000)
    C = torch.matmul(A, B)

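In this example, record_function("Matrix Multiplication") labels the matrix multiplication. The label only takes effect while the PyTorch Profiler is running. If you run the block inside torch.profiler.profile, the profiler reports how long it took. On its own, the with block just executes normally.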
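To read the labeled block's time programmatically, run the same block under a profiler and look up the label in `key_averages()`. The sketch below is CPU only. Note that `cpu_time_total` is reported in microseconds.

```python
import torch
from torch.profiler import profile, record_function, ProfilerActivity

with profile(activities=[ProfilerActivity.CPU]) as prof:
    with record_function("Matrix Multiplication"):
        A = torch.randn(1000, 1000)
        B = torch.randn(1000, 1000)
        C = torch.matmul(A, B)

# key_averages() groups events by name; pick out the labeled block
stats = next(e for e in prof.key_averages() if e.key == "Matrix Multiplication")
# cpu_time_total is in microseconds, so divide by 1000 for milliseconds
print(f"{stats.key}: {stats.cpu_time_total / 1000:.3f} ms over {stats.count} call(s)")
```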

Example 2: integrating with the Profiler

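import torch
from torch.profiler import record_function
import torch.profiler

def my_function():
    with record_function("Matrix Multiplication"):
        A = torch.randn(1000, 1000)
        B = torch.randn(1000, 1000)
        C = torch.matmul(A, B)
    return C

# Start the Profiler. The schedule skips 1 step (wait), warms up for 1 step (warmup),
# then records 2 steps (active), so the loop must run at least 1 + 1 + 2 = 4 steps
# and call prof.step() after each one; without prof.step() nothing gets recorded.
with torch.profiler.profile(
        activities=[torch.profiler.ProfilerActivity.CPU],
        schedule=torch.profiler.schedule(wait=1, warmup=1, active=2)) as prof:
    for _ in range(4):
        my_function()
        prof.step()  # mark the end of one step

# Print the Profiler results
print(prof.key_averages().table(sort_by="cpu_time_total"))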



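Sample output (illustrative only: operators such as aten::conv2d are not produced by the matrix multiplication above, and the exact columns depend on the PyTorch version; CPU-only tables in recent versions print Self CPU %, Self CPU, CPU total %, CPU total, CPU time avg and # of Calls):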
----------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------ 
  Name                                    CPU time      CPU time %    Self CPU time  Self CPU %    CPU time total   CPU time total %  # of Calls 
----------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------ 
  aten::conv2d                            0.250 ms      30.0 %        0.250 ms      30.0 %        0.250 ms         30.0 %           2 
  aten::addmm                             0.200 ms      24.0 %        0.150 ms      18.0 %        0.200 ms         24.0 %           4 
  aten::relu                              0.100 ms      12.0 %        0.100 ms      12.0 %        0.100 ms         12.0 %           2 
  aten::dropout                           0.050 ms      6.0 %         0.050 ms      6.0 %         0.050 ms          6.0 %           2 
  aten::mul                               0.030 ms      3.0 %         0.030 ms      3.0 %         0.030 ms          3.0 %           4 
  aten::mean                              0.020 ms      2.4 %         0.020 ms      2.4 %         0.020 ms          2.4 %           1 
  aten::matmul                            0.020 ms      2.4 %         0.020 ms      2.4 %         0.020 ms          2.4 %           1 
  aten::softmax                           0.010 ms      1.2 %         0.010 ms      1.2 %         0.010 ms          1.2 %           1 
  ...                                      ...          ...           ...           ...          ...               ...              ...
----------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------ 


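Output fields explained:
Name: the name of the operation; for example, aten::conv2d is the convolution operator.
CPU time: the total CPU time spent in this operation (in milliseconds).
CPU time %: this operation's share of the total CPU time.
Self CPU time: CPU time spent in the operation itself, excluding its child operations.
Self CPU %: the operation's self CPU time as a percentage of the total CPU time.
CPU time total: the total CPU time consumed by the operation and all of its child operations (for example, the matrix multiplications inside a convolution).
CPU time total %: the share of total CPU time taken by the operation together with its children.
# of Calls: the number of times the operation was called.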
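Nested labels make the difference between self time and total time easy to see. In this sketch (the labels `outer` and `inner` are arbitrary), the matrix multiplication runs inside `inner`, which in turn runs inside `outer`. As a result, `outer`'s total time includes `inner`'s, but its self time does not.

```python
import torch
from torch.profiler import profile, record_function, ProfilerActivity

with profile(activities=[ProfilerActivity.CPU]) as prof:
    with record_function("outer"):
        x = torch.randn(500, 500)
        with record_function("inner"):
            y = x @ x  # the matmul is a child of "inner", which is a child of "outer"

events = {e.key: e for e in prof.key_averages()}
outer, inner = events["outer"], events["inner"]
# Total time includes child operations; self time excludes them (both in microseconds)
print(f"outer: total {outer.cpu_time_total:.0f} us, self {outer.self_cpu_time_total:.0f} us")
print(f"inner: total {inner.cpu_time_total:.0f} us, self {inner.self_cpu_time_total:.0f} us")
```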
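Interpreting the example:
aten::conv2d took 0.250 ms of CPU time, 30% of the total, across 2 calls.
aten::addmm took 0.200 ms of CPU time, 24% of the total, across 4 calls.
aten::relu took 0.100 ms of CPU time, 12% of the total, across 2 calls.
These statistics point to the performance bottlenecks. For example, if aten::conv2d accounts for the largest share of CPU time, convolution is likely the bottleneck, and you may want to consider a more efficient implementation or other optimizations.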
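To find the bottleneck without reading the whole table, sort by self CPU time and take the top entry. The small CNN below is a hypothetical stand-in used only for illustration. The ranking you get will depend on your hardware and PyTorch build.

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

# Hypothetical small CNN, for demonstration only
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),
)
x = torch.randn(8, 3, 32, 32)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    with torch.no_grad():
        model(x)

# Rank operators by their own (self) CPU time to locate hot spots
averages = prof.key_averages()
print(averages.table(sort_by="self_cpu_time_total", row_limit=5))
top = max(averages, key=lambda e: e.self_cpu_time_total)
print("Most expensive op (self CPU time):", top.key)
```

Self time is usually the better sort key for spotting hot spots. Total time counts a parent operator together with everything it calls, so wrapper operators tend to float to the top.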

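torch.profiler.profile starts the profiling session, and record_function labels code blocks so that their execution time is recorded.
The Profiler reports performance metrics for each operation. Which metrics you get depends on the options passed: CPU time with ProfilerActivity.CPU, GPU time with ProfilerActivity.CUDA, and memory usage when profile_memory=True is set.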
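The schedule passed in Example 2 is simply a function that maps a step index to a profiler action. The profiler only advances it when prof.step() is called. This sketch calls the schedule directly to show which steps are skipped, warmed up, and recorded. It adds repeat=1, which is not in the original example, so that the schedule stops after one cycle.

```python
from torch.profiler import schedule, ProfilerAction

# Same wait/warmup/active values as Example 2, plus repeat=1
sched = schedule(wait=1, warmup=1, active=2, repeat=1)
actions = [sched(step) for step in range(6)]
for step, action in enumerate(actions):
    print(step, action)
# Steps 0-5 map to: NONE, WARMUP, RECORD, RECORD_AND_SAVE, NONE, NONE
```

Results are collected during RECORD steps and handed to `on_trace_ready` after the RECORD_AND_SAVE step. That is why a profiling loop needs at least wait + warmup + active steps.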

Example 3: visualization with TensorBoard

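import torch
import torch.profiler

# Use the GPU if one is available, and only profile CUDA activity in that case
device = "cuda" if torch.cuda.is_available() else "cpu"
activities = [torch.profiler.ProfilerActivity.CPU]
if device == "cuda":
    activities.append(torch.profiler.ProfilerActivity.CUDA)

# Write the trace to ./log, where TensorBoard can load it
with torch.profiler.profile(
        activities=activities,
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./log')) as prof:
    with torch.profiler.record_function("Matrix Multiplication"):
        A = torch.randn(1000, 1000, device=device)
        B = torch.randn(1000, 1000, device=device)
        C = torch.matmul(A, B)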
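In a real training loop, tensorboard_trace_handler is usually combined with a schedule so that only a few steps are traced. This is a sketch under assumptions: the nn.Linear model, the SGD optimizer, and the temporary log directory are placeholders (in practice you would pass something like './log'), and only CPU activity is profiled.

```python
import os
import tempfile

import torch
import torch.nn as nn
from torch.profiler import (ProfilerActivity, profile, record_function,
                            schedule, tensorboard_trace_handler)

model = nn.Linear(512, 512)  # hypothetical stand-in for a real model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
log_dir = tempfile.mkdtemp()  # placeholder for a directory such as './log'

with profile(
    activities=[ProfilerActivity.CPU],
    schedule=schedule(wait=1, warmup=1, active=2, repeat=1),
    on_trace_ready=tensorboard_trace_handler(log_dir),
) as prof:
    for step in range(5):
        with record_function("train_step"):
            x = torch.randn(64, 512)
            loss = model(x).pow(2).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        prof.step()  # advance the schedule after every training step

# The handler writes one trace file per recorded cycle
trace_files = [f for f in os.listdir(log_dir) if f.endswith(".pt.trace.json")]
print(len(trace_files))
```

To view the traces in TensorBoard, the PyTorch tutorial installs the torch_tb_profiler plugin and runs `tensorboard --logdir=./log`. Recent PyTorch documentation marks this plugin as deprecated and suggests opening the trace JSON files in Perfetto or chrome://tracing instead.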