  • How to estimate the T(FL)OPS efficiency of model training

    Naive method

    We use Torch Vision ResNet50-v1.5 as the example.

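    • Step 1: Get the theoretical forward-pass MACs (Multiply-ACcumulate operations) of the model
      thop can be used to obtain the forward MACs of a model. The code below reports 4.112G forward MACs for Torch Vision ResNet50-v1.5.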

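      import torch
      from thop import clever_format, profile
      from torchvision.models import resnet50

      model = resnet50()
      # One dummy image: batch size 1, 3 channels, 224x224 pixels.
      dummy_input = torch.randn(1, 3, 224, 224)
      # profile() returns the forward-pass MACs and the parameter count.
      macs, params = profile(model, inputs=(dummy_input,))
      print(clever_format([macs, params], "%.3f"))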
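
As a sanity check on what a MAC count means, the MACs of a single convolution can be worked out by hand. The following is a minimal sketch, assuming the standard ResNet50 stem convolution (7x7 kernel, 3 input channels, 64 output channels, stride 2, padding 3, no bias) on a 224x224 input. The helper name `conv2d_macs` is made up for illustration and is not part of thop.

```python
def conv2d_macs(in_ch, out_ch, kernel, stride, padding, in_size):
    """MACs of a square 2D convolution (no bias, groups=1) on a square input."""
    # Spatial size of the output feature map.
    out_size = (in_size + 2 * padding - kernel) // stride + 1
    # Each output element needs one MAC per input channel per kernel position.
    macs_per_output = in_ch * kernel * kernel
    return out_ch * out_size * out_size * macs_per_output


# Assumed layer shape: the stem convolution of ResNet50 on a 224x224 image.
stem_macs = conv2d_macs(in_ch=3, out_ch=64, kernel=7, stride=2, padding=3, in_size=224)
print(f"stem conv MACs: {stem_macs / 1e6:.1f}M")
```

Under these assumptions the stem alone accounts for roughly 3% of the 4.112G total; the rest comes from the remaining convolution and fully connected layers.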
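    • Step 2: Estimate the T(FL)OPS the model requires per second at a given measured throughput
      The estimate is based on the formula from OpenAI's AI and Compute:

      required_T(FL)OPS = (MACs per forward pass) * (2 (FL)OPs/MAC) * (3 for forward and backward pass) * (number of examples per second)

      The factor 3 counts the forward pass once and the backward pass as roughly twice the cost of the forward pass.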

      Then take the measured performance data:

      accelerator  data type  batch size  IPS (images/s)
      V100         FP16       256         1325
      V100         FP32       128         303.1


      Taking V100 FP16 training as an example:
      MACs per forward pass = 4.112G
      number of examples per second = 1325
      required_T(FL)OPS = 4.112G * 2 * 3 * 1325 = 32.69 T
      Summary of results:

      accelerator  data type  batch size  IPS (images/s)  required T(FL)OPS
      V100         FP16       256         1325            32.69
      V100         FP32       128         303.1           7.478
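
The Step 2 arithmetic can be reproduced with a few lines of Python. This is a minimal sketch using only the numbers quoted in this step; the function and variable names are illustrative choices, not part of any library.

```python
def required_tflops(forward_macs, examples_per_second):
    """Estimated compute demand of training, in T(FL)OPS."""
    flops_per_mac = 2    # one multiply plus one add
    fwd_bwd_factor = 3   # forward pass plus a backward pass of about 2x its cost
    flops = forward_macs * flops_per_mac * fwd_bwd_factor * examples_per_second
    return flops / 1e12


RESNET50_FORWARD_MACS = 4.112e9  # value reported by thop in Step 1

# Measured throughput in images per second, taken from the table in Step 2.
measured_ips = {"V100 FP16": 1325, "V100 FP32": 303.1}

results = {
    name: required_tflops(RESNET50_FORWARD_MACS, ips)
    for name, ips in measured_ips.items()
}
for name, tflops in results.items():
    print(f"{name}: {tflops:.3f} T(FL)OPS")
```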
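    • Step 3: Estimate the model's utilization of the theoretical peak compute

      • Theoretical peak compute

      • Utilization of theoretical peak compute

        required_T(FL)OPS / peak_T(FL)OPS

        accelerator  data type  batch size  IPS (images/s)  required T(FL)OPS  peak utilization
        V100         FP16       256         1325            32.69              29.2%
        V100         FP32       128         303.1           7.478              53%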
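
The utilization in Step 3 is a single division, sketched below. The peak values of 112 and 14 T(FL)OPS are assumptions: the text does not state the peaks it used, but these values reproduce the ratios in the table and appear to match NVIDIA's published figures for the PCIe V100. The names are illustrative only.

```python
def peak_utilization(required_tflops, peak_tflops):
    """Fraction of the theoretical peak that the training run actually needs."""
    return required_tflops / peak_tflops


# ASSUMPTION: peak T(FL)OPS are not given in the text; these values were
# chosen because they reproduce the utilization column of the table.
ASSUMED_PEAK_TFLOPS = {"FP16": 112.0, "FP32": 14.0}

# Required T(FL)OPS from the Step 2 summary table.
required = {"FP16": 32.69, "FP32": 7.478}

utilization = {
    dtype: peak_utilization(required[dtype], ASSUMED_PEAK_TFLOPS[dtype])
    for dtype in required
}
for dtype, ratio in utilization.items():
    print(f"V100 {dtype}: {ratio:.1%} of assumed peak")
```

Substituting the peak figure for a different accelerator or form factor changes the ratio proportionally, so the peak value should always be reported next to the utilization.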

    References

    1. NV Training Performance Benchmark

    2. thop

    3. OpenAI AI and Compute

  • Original article: https://www.cnblogs.com/Matrix_Yao/p/15747398.html