  • DiDi Cloud A100 40G + TensorFlow 1.15.2 + Ubuntu 18.04 Performance Test

    I got access to the DiDi Cloud A100 closed beta today and ran the TensorFlow benchmarks on it. The results are recorded below.

    Environment

    Platform: DiDi Cloud

    OS: Ubuntu 18.04

    GPU: A100-SXM4-40GB

    Python version: 3.6

    TensorFlow version: 1.15.2 (NVIDIA build)

    System environment:

    Test Method

    The tests use the TensorFlow benchmarks suite (tf_cnn_benchmarks):

    https://github.com/tensorflow/benchmarks
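
Every run in this post uses the same four flags (`--num_gpus`, `--batch_size`, `--model`, and optionally `--use_fp16`), so the command lines can be generated rather than typed by hand. The following is a minimal sketch, not part of the original test setup: the helper names `build_command` and `all_commands` are hypothetical, and the model and batch-size pairs are simply copied from the runs reported in this post.

```python
# Model / batch-size pairs as used in the runs reported in this post.
RUNS = [
    ("resnet50_v1.5", 64),
    ("resnet50", 64),
    ("alexnet", 512),
    ("inception3", 64),
    ("vgg16", 64),
    ("googlenet", 128),
    ("resnet152", 32),
]


def build_command(model, batch_size, use_fp16=False, num_gpus=1):
    """Return the argument list for one tf_cnn_benchmarks.py run (hypothetical helper)."""
    args = [
        "python",
        "tf_cnn_benchmarks.py",
        f"--num_gpus={num_gpus}",
        f"--batch_size={batch_size}",
        f"--model={model}",
    ]
    if use_fp16:
        args.append("--use_fp16")
    return args


def all_commands():
    """Return the FP32 command followed by the FP16 command for every model."""
    commands = []
    for model, batch_size in RUNS:
        commands.append(build_command(model, batch_size, use_fp16=False))
        commands.append(build_command(model, batch_size, use_fp16=True))
    return commands


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run(...) from the
    # directory containing tf_cnn_benchmarks.py to actually execute them.
    for cmd in all_commands():
        print(" ".join(cmd))
```

Returning argument lists rather than strings means each command can be passed to `subprocess.run` without shell quoting.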

    ResNet50 v1.5 BS64

    python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50_v1.5
    Step    Img/sec total_loss
    1       images/sec: 602.4 +/- 0.0 (jitter = 0.0)        7.847
    10      images/sec: 606.8 +/- 1.2 (jitter = 5.4)        8.053
    20      images/sec: 606.3 +/- 0.8 (jitter = 4.4)        8.102
    30      images/sec: 605.8 +/- 0.8 (jitter = 3.8)        8.117
    40      images/sec: 606.2 +/- 0.7 (jitter = 3.8)        7.893
    50      images/sec: 606.1 +/- 0.5 (jitter = 3.0)        7.919
    60      images/sec: 606.2 +/- 0.5 (jitter = 2.9)        8.104
    70      images/sec: 606.6 +/- 0.5 (jitter = 2.9)        7.985
    80      images/sec: 606.6 +/- 0.4 (jitter = 2.8)        7.805
    90      images/sec: 606.6 +/- 0.4 (jitter = 2.8)        7.973
    100     images/sec: 606.7 +/- 0.4 (jitter = 2.8)        7.644
    ----------------------------------------------------------------
    total images/sec: 606.23
    ----------------------------------------------------------------

    --use_fp16

    python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50_v1.5 --use_fp16
    Step    Img/sec total_loss
    1 images/sec: 1327.1 +/- 0.0 (jitter = 0.0) 7.972
    10 images/sec: 1321.2 +/- 5.7 (jitter = 27.6) 7.885
    20 images/sec: 1323.5 +/- 4.4 (jitter = 25.9) 8.073
    30 images/sec: 1323.6 +/- 3.7 (jitter = 27.3) 7.934
    40 images/sec: 1322.1 +/- 3.3 (jitter = 32.9) 8.102
    50 images/sec: 1321.4 +/- 3.0 (jitter = 27.7) 7.876
    60 images/sec: 1322.2 +/- 2.8 (jitter = 32.3) 7.883
    70 images/sec: 1322.3 +/- 2.5 (jitter = 32.6) 7.962
    80 images/sec: 1324.0 +/- 2.4 (jitter = 32.2) 8.049
    90 images/sec: 1324.2 +/- 2.2 (jitter = 31.2) 7.909
    100 images/sec: 1325.1 +/- 2.1 (jitter = 29.6) 7.874
    ----------------------------------------------------------------
    total images/sec: 1322.76
    ----------------------------------------------------------------
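
The log format shown above is regular enough to parse mechanically when collecting results from many runs. Below is a rough sketch that assumes the output looks exactly like the logs in this post (a step number, `images/sec:`, a `+/-` value, a jitter value, and a loss column, followed by a `total images/sec:` line). The function name `parse_benchmark_log` and the dictionary keys are made up for illustration.

```python
import re

# One per-step line, e.g.
# "10      images/sec: 606.8 +/- 1.2 (jitter = 5.4)        8.053"
STEP_RE = re.compile(
    r"^\s*(\d+)\s+images/sec:\s+([\d.]+)\s+\+/-\s+([\d.]+)"
    r"\s+\(jitter = ([\d.]+)\)\s+(\S+)\s*$"
)
# The summary line, e.g. "total images/sec: 606.23"
TOTAL_RE = re.compile(r"^\s*total images/sec:\s+([\d.]+)\s*$")


def parse_benchmark_log(text):
    """Return (steps, total) parsed from tf_cnn_benchmarks-style output.

    steps is a list of dicts, one per step line; total is the value of the
    "total images/sec" line, or None if that line is absent.
    """
    steps = []
    total = None
    for line in text.splitlines():
        match = STEP_RE.match(line)
        if match:
            steps.append({
                "step": int(match.group(1)),
                "images_per_sec": float(match.group(2)),
                "plus_minus": float(match.group(3)),
                "jitter": float(match.group(4)),
                # float() also accepts "nan", which appears in the
                # AlexNet FP32 log later in this post.
                "total_loss": float(match.group(5)),
            })
            continue
        match = TOTAL_RE.match(line)
        if match:
            total = float(match.group(1))
    return steps, total
```

Header and separator lines match neither pattern and are skipped, so a captured log can be passed in whole.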

    ResNet50 BS64

    python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50
    Step    Img/sec total_loss
    1 images/sec: 653.5 +/- 0.0 (jitter = 0.0) 8.219
    10 images/sec: 646.2 +/- 2.0 (jitter = 6.0) 7.879
    20 images/sec: 646.1 +/- 1.4 (jitter = 7.2) 7.909
    30 images/sec: 646.0 +/- 1.2 (jitter = 6.0) 7.820
    40 images/sec: 646.2 +/- 1.0 (jitter = 6.3) 8.006
    50 images/sec: 646.0 +/- 1.0 (jitter = 8.6) 7.769
    60 images/sec: 646.0 +/- 0.9 (jitter = 8.6) 8.114
    70 images/sec: 645.7 +/- 0.9 (jitter = 9.5) 7.811
    80 images/sec: 645.8 +/- 0.8 (jitter = 9.5) 7.979
    90 images/sec: 645.8 +/- 0.8 (jitter = 8.0) 8.095
    100 images/sec: 645.8 +/- 0.7 (jitter = 6.4) 8.038
    ----------------------------------------------------------------
    total images/sec: 645.26
    ----------------------------------------------------------------

    --use_fp16

    python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50 --use_fp16
    Step    Img/sec total_loss
    1 images/sec: 1300.1 +/- 0.0 (jitter = 0.0) 8.101
    10 images/sec: 1310.1 +/- 7.5 (jitter = 7.4) 7.758
    20 images/sec: 1309.7 +/- 8.0 (jitter = 42.3) 7.912
    30 images/sec: 1315.0 +/- 5.9 (jitter = 32.1) 7.776
    40 images/sec: 1315.5 +/- 4.7 (jitter = 28.2) 7.918
    50 images/sec: 1317.5 +/- 3.9 (jitter = 27.7) 7.895
    60 images/sec: 1316.5 +/- 3.4 (jitter = 18.6) 7.711
    70 images/sec: 1317.3 +/- 3.1 (jitter = 16.1) 8.008
    80 images/sec: 1316.9 +/- 2.8 (jitter = 11.4) 7.777
    90 images/sec: 1317.7 +/- 2.6 (jitter = 11.8) 7.808
    100 images/sec: 1317.1 +/- 2.4 (jitter = 9.9) 8.036
    ----------------------------------------------------------------
    total images/sec: 1315.11
    ----------------------------------------------------------------

    AlexNet BS512

    python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=512 --model=alexnet
    Step    Img/sec total_loss
    1 images/sec: 8294.2 +/- 0.0 (jitter = 0.0) nan
    10 images/sec: 8290.2 +/- 1.6 (jitter = 5.3) nan
    20 images/sec: 8290.6 +/- 1.0 (jitter = 3.7) nan
    30 images/sec: 8290.8 +/- 0.7 (jitter = 2.8) nan
    40 images/sec: 8291.3 +/- 0.6 (jitter = 2.7) nan
    50 images/sec: 8289.8 +/- 1.4 (jitter = 2.9) nan
    60 images/sec: 8290.2 +/- 1.2 (jitter = 2.9) nan
    70 images/sec: 8290.4 +/- 1.3 (jitter = 3.6) nan
    80 images/sec: 8291.1 +/- 1.1 (jitter = 3.5) nan
    90 images/sec: 8291.9 +/- 1.0 (jitter = 4.4) nan
    100 images/sec: 8291.9 +/- 1.1 (jitter = 5.2) nan
    ----------------------------------------------------------------
    total images/sec: 8282.46
    ----------------------------------------------------------------

    --use_fp16

    python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=512 --model=alexnet --use_fp16
    Step    Img/sec total_loss
    1 images/sec: 10618.6 +/- 0.0 (jitter = 0.0) 7.250
    10 images/sec: 10607.7 +/- 4.4 (jitter = 16.3) 7.251
    20 images/sec: 10602.5 +/- 3.0 (jitter = 13.1) 7.251
    30 images/sec: 10604.1 +/- 2.3 (jitter = 11.2) 7.251
    40 images/sec: 10601.0 +/- 2.5 (jitter = 13.4) 7.251
    50 images/sec: 10601.7 +/- 2.5 (jitter = 13.8) 7.251
    60 images/sec: 10603.0 +/- 2.2 (jitter = 14.0) 7.250
    70 images/sec: 10605.1 +/- 2.1 (jitter = 12.5) 7.251
    80 images/sec: 10605.4 +/- 1.9 (jitter = 12.2) 7.251
    90 images/sec: 10605.4 +/- 1.7 (jitter = 12.1) 7.251
    100 images/sec: 10605.8 +/- 1.7 (jitter = 12.3) 7.251
    ----------------------------------------------------------------
    total images/sec: 10587.67
    ----------------------------------------------------------------

    Inception v3 BS64

    python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=inception3
    Step    Img/sec total_loss
    1 images/sec: 436.8 +/- 0.0 (jitter = 0.0) 7.276
    10 images/sec: 437.9 +/- 1.2 (jitter = 0.8) 7.337
    20 images/sec: 437.8 +/- 1.0 (jitter = 2.2) 7.269
    30 images/sec: 437.9 +/- 0.8 (jitter = 2.2) 7.422
    40 images/sec: 437.9 +/- 0.6 (jitter = 3.5) 7.299
    50 images/sec: 438.6 +/- 0.6 (jitter = 4.1) 7.277
    60 images/sec: 439.2 +/- 0.5 (jitter = 3.7) 7.363
    70 images/sec: 439.5 +/- 0.5 (jitter = 4.8) 7.347
    80 images/sec: 440.3 +/- 0.5 (jitter = 5.3) 7.410
    90 images/sec: 440.3 +/- 0.5 (jitter = 5.2) 7.325
    100 images/sec: 440.3 +/- 0.4 (jitter = 5.0) 7.346
    ----------------------------------------------------------------
    total images/sec: 440.01
    ----------------------------------------------------------------

    --use_fp16

    python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=inception3 --use_fp16
    Step    Img/sec total_loss
    1 images/sec: 901.5 +/- 0.0 (jitter = 0.0) 7.305
    10 images/sec: 945.5 +/- 7.0 (jitter = 5.0) 7.354
    20 images/sec: 945.6 +/- 4.9 (jitter = 7.1) 7.330
    30 images/sec: 945.3 +/- 3.9 (jitter = 6.9) 7.382
    40 images/sec: 946.3 +/- 3.2 (jitter = 7.3) 7.278
    50 images/sec: 946.6 +/- 2.8 (jitter = 7.5) 7.373
    60 images/sec: 946.3 +/- 2.5 (jitter = 7.6) 7.299
    70 images/sec: 946.8 +/- 2.3 (jitter = 7.5) 7.323
    80 images/sec: 946.5 +/- 2.1 (jitter = 7.6) 7.317
    90 images/sec: 946.6 +/- 2.0 (jitter = 7.6) 7.357
    100 images/sec: 947.2 +/- 1.8 (jitter = 7.3) 7.327
    ----------------------------------------------------------------
    total images/sec: 946.03
    ----------------------------------------------------------------

    VGG16 BS64

    python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=vgg16
    Step    Img/sec total_loss
    1 images/sec: 442.1 +/- 0.0 (jitter = 0.0) 7.321
    10 images/sec: 442.4 +/- 0.1 (jitter = 0.4) 7.315
    20 images/sec: 442.4 +/- 0.1 (jitter = 0.3) 7.269
    30 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.271
    40 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.282
    50 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.291
    60 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.250
    70 images/sec: 442.4 +/- 0.1 (jitter = 0.2) 7.278
    80 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.274
    90 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.286
    100 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.283
    ----------------------------------------------------------------
    total images/sec: 442.20
    ----------------------------------------------------------------

    --use_fp16

    python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=vgg16 --use_fp16
    Step    Img/sec total_loss
    1 images/sec: 687.4 +/- 0.0 (jitter = 0.0) 7.279
    10 images/sec: 688.2 +/- 0.2 (jitter = 0.5) 7.255
    20 images/sec: 688.0 +/- 0.1 (jitter = 0.5) 7.283
    30 images/sec: 688.0 +/- 0.1 (jitter = 0.7) 7.254
    40 images/sec: 687.9 +/- 0.1 (jitter = 0.7) 7.283
    50 images/sec: 687.8 +/- 0.1 (jitter = 0.7) 7.249
    60 images/sec: 687.7 +/- 0.1 (jitter = 0.8) 7.294
    70 images/sec: 687.6 +/- 0.1 (jitter = 0.9) 7.278
    80 images/sec: 687.6 +/- 0.1 (jitter = 0.9) 7.268
    90 images/sec: 687.7 +/- 0.1 (jitter = 0.9) 7.264
    100 images/sec: 687.6 +/- 0.1 (jitter = 0.9) 7.268
    ----------------------------------------------------------------
    total images/sec: 687.07
    ----------------------------------------------------------------

    GoogLeNet BS128

    python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=128 --model=googlenet
    Step    Img/sec total_loss
    1 images/sec: 1577.4 +/- 0.0 (jitter = 0.0) 7.104
    10 images/sec: 1565.9 +/- 4.1 (jitter = 12.5) 7.105
    20 images/sec: 1561.7 +/- 3.1 (jitter = 20.4) 7.094
    30 images/sec: 1562.3 +/- 2.5 (jitter = 15.1) 7.087
    40 images/sec: 1561.5 +/- 2.2 (jitter = 16.1) 7.067
    50 images/sec: 1561.6 +/- 2.0 (jitter = 15.6) 7.091
    60 images/sec: 1561.5 +/- 1.8 (jitter = 15.7) 7.049
    70 images/sec: 1560.3 +/- 1.9 (jitter = 15.3) 7.074
    80 images/sec: 1558.8 +/- 1.9 (jitter = 17.2) 7.077
    90 images/sec: 1558.2 +/- 1.8 (jitter = 17.2) 7.079
    100 images/sec: 1557.5 +/- 1.8 (jitter = 17.6) 7.066
    ----------------------------------------------------------------
    total images/sec: 1556.06
    ----------------------------------------------------------------

    --use_fp16

    python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=128 --model=googlenet --use_fp16
    Step    Img/sec total_loss
    1 images/sec: 2690.1 +/- 0.0 (jitter = 0.0) 7.173
    10 images/sec: 2675.3 +/- 13.9 (jitter = 35.5) 7.068
    20 images/sec: 2682.4 +/- 9.9 (jitter = 55.4) 7.086
    30 images/sec: 2686.6 +/- 8.3 (jitter = 36.6) 7.075
    40 images/sec: 2687.8 +/- 6.9 (jitter = 30.6) 7.084
    50 images/sec: 2686.7 +/- 6.0 (jitter = 36.4) 7.076
    60 images/sec: 2687.5 +/- 5.4 (jitter = 36.4) 7.075
    70 images/sec: 2681.0 +/- 6.8 (jitter = 41.6) 7.075
    80 images/sec: 2683.2 +/- 6.1 (jitter = 34.0) 7.065
    90 images/sec: 2684.1 +/- 5.6 (jitter = 35.6) 7.092
    100 images/sec: 2683.9 +/- 5.2 (jitter = 36.1) 7.052
    ----------------------------------------------------------------
    total images/sec: 2680.27
    ----------------------------------------------------------------

    ResNet152 BS32

    python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=resnet152
    Step    Img/sec total_loss
    1 images/sec: 225.6 +/- 0.0 (jitter = 0.0) 9.060
    10 images/sec: 228.3 +/- 1.0 (jitter = 2.0) 8.594
    20 images/sec: 228.3 +/- 0.6 (jitter = 2.0) 8.635
    30 images/sec: 228.2 +/- 0.5 (jitter = 2.5) 8.719
    40 images/sec: 227.9 +/- 0.5 (jitter = 2.8) 8.599
    50 images/sec: 228.1 +/- 0.5 (jitter = 2.9) 8.791
    60 images/sec: 228.3 +/- 0.4 (jitter = 3.6) 8.668
    70 images/sec: 228.3 +/- 0.4 (jitter = 3.3) 9.072
    80 images/sec: 228.3 +/- 0.4 (jitter = 3.5) 8.874
    90 images/sec: 228.4 +/- 0.3 (jitter = 3.7) 9.030
    100 images/sec: 228.4 +/- 0.3 (jitter = 3.7) 8.839
    ----------------------------------------------------------------
    total images/sec: 228.29
    ----------------------------------------------------------------

    --use_fp16

    python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=resnet152 --use_fp16
    Step    Img/sec total_loss
    1 images/sec: 392.9 +/- 0.0 (jitter = 0.0) 9.147
    10 images/sec: 397.9 +/- 2.8 (jitter = 6.0) 9.000
    20 images/sec: 399.0 +/- 2.1 (jitter = 8.6) 8.842
    30 images/sec: 393.7 +/- 2.9 (jitter = 14.7) 8.813
    40 images/sec: 394.4 +/- 2.3 (jitter = 15.2) 8.984
    50 images/sec: 394.9 +/- 2.0 (jitter = 13.9) 8.647
    60 images/sec: 395.7 +/- 1.8 (jitter = 13.9) 8.838
    70 images/sec: 396.5 +/- 1.6 (jitter = 15.3) 8.941
    80 images/sec: 395.9 +/- 1.4 (jitter = 13.4) 8.913
    90 images/sec: 396.2 +/- 1.3 (jitter = 14.1) 8.807
    100 images/sec: 395.7 +/- 1.3 (jitter = 14.5) 8.729
    ----------------------------------------------------------------
    total images/sec: 395.34
    ----------------------------------------------------------------
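
To make the FP32 and FP16 numbers easier to compare, the speedup from `--use_fp16` can be computed directly from the `total images/sec` values reported above. This is a small illustrative sketch: the `TOTALS` table is transcribed from the logs in this post, and the helper name `fp16_speedups` is an assumption, not something from the benchmark suite.

```python
# total images/sec from the runs above: model -> (FP32, FP16)
TOTALS = {
    "resnet50_v1.5": (606.23, 1322.76),
    "resnet50": (645.26, 1315.11),
    "alexnet": (8282.46, 10587.67),
    "inception3": (440.01, 946.03),
    "vgg16": (442.20, 687.07),
    "googlenet": (1556.06, 2680.27),
    "resnet152": (228.29, 395.34),
}


def fp16_speedups(totals):
    """Return model -> FP16 throughput divided by FP32 throughput."""
    return {model: fp16 / fp32 for model, (fp32, fp16) in totals.items()}


if __name__ == "__main__":
    print(f"{'model':<15}{'FP32':>10}{'FP16':>10}{'speedup':>9}")
    for model, ratio in fp16_speedups(TOTALS).items():
        fp32, fp16 = TOTALS[model]
        print(f"{model:<15}{fp32:>10.2f}{fp16:>10.2f}{ratio:>8.2f}x")
```

By this arithmetic, FP16 roughly doubles throughput for the ResNet50 variants and Inception v3 (about 2.0x to 2.2x), while AlexNet gains the least (about 1.28x).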

    Performance Comparison

    A performance comparison of the A100, V100, and 2080 Ti:

    https://www.tonyisstark.com/383.html

  • Original article: https://www.cnblogs.com/wangpg/p/13689536.html