  • 【BlockSwap】2020-ICLR-BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget (paper reading notes)

    BlockSwap

    2020-ICLR-BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget

    Source: ChenBong, cnblogs (博客园)


    Introduction

    image-20201013184149297

    • backbone network (consisting of standard blocks)
      1. cheap-block pool ==> (swap the backbone's blocks) ==> space of candidate block-swap networks
      2. sample under the parameter constraint ==> compute Fisher score ==> rank networks
      3. Distillation (T: backbone network, S: block-swap network)

    Contribution

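    • Block-wise substitution: compared with NAS (a bottom-up method), it shrinks the search space and is faster; compared with methods that change network depth/width, such as pruning (top-down methods), it searches over more dimensions (pruning can only change the number of filters in each layer, while block swap can also change the layer type).
    • A fast evaluation algorithm for candidate networks, based on Fisher information.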

    Method

    Fisher Information

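    • Estimating filter importance with a Taylor expansion is equivalent to computing Fisher information
    • \(\Delta_{c}=\frac{1}{2N}\sum_{n}^{N}\left(\sum_{i}^{W}\sum_{j}^{H} a_{nij}\, g_{nij}\right)^{2}\)
      • The feature map has size W×H; \(\sum a_{ij} g_{ij}\) (activation times the gradient of the loss with respect to it) measures the importance \(\Delta_{c}\) of one channel (the output of one filter)
    • \(\Delta_{b}=\sum_{c}^{C}\Delta_{c}\)
      • C is the total number of channels in a block; the importance of a block is \(\Delta_{b}\)
    • \(\sum_{B}\Delta_{b}\)
      • B is the number of blocks in a block-swap network; the importance of a block-swap network is \(\sum_{B}\Delta_{b}\)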
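
    The three sums above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names and the nested-list layout (`[example][row][column]` per channel) are assumptions made for this sketch, and in practice the activations and gradients would come from one forward/backward pass of a deep-learning framework.

```python
def channel_fisher(acts, grads):
    """Delta_c for one channel.

    acts, grads: nested lists indexed [n][i][j] (example, row, column);
    this layout is an assumption of the sketch.
    """
    num_examples = len(acts)
    total = 0.0
    for a_n, g_n in zip(acts, grads):
        # Inner double sum over the W x H feature map of one example.
        s = sum(a * g
                for a_row, g_row in zip(a_n, g_n)
                for a, g in zip(a_row, g_row))
        total += s * s
    return total / (2 * num_examples)


def block_fisher(block_acts, block_grads):
    """Delta_b: sum of Delta_c over the channels of one block ([c][n][i][j])."""
    return sum(channel_fisher(a_c, g_c)
               for a_c, g_c in zip(block_acts, block_grads))


def network_fisher(blocks):
    """Sum of Delta_b over all blocks; blocks is a list of (acts, grads) pairs."""
    return sum(block_fisher(a, g) for a, g in blocks)
```

    A candidate network's score is then `network_fisher(...)` evaluated on a single minibatch, which is what makes ranking many candidates cheap.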

    Substitute Blocks

    image-20201013191958267

    • Standard Block

      • Parameter count: \(2N^{2}k^{2}\)
    • Grouped+Pointwise Block – G(g)

      • Parameter count: \(2\left(N^{2}k^{2}/g+N^{2}\right)\)
    • Bottleneck Block – B(b)

      • Parameter count: \((N/b)^{2}k^{2}+2N^{2}/b\)
    • Bottleneck Grouped+Pointwise Block – BG(b, g)

      • Parameter count: \((N/bg)^{2}k^{2}+2N^{2}/b\)
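
    To get a feel for how much each substitute block saves, the formulas above can be evaluated directly. The sketch below transcribes them as written in these notes; the function names are made up for illustration, and reading `N/bg` as `N/(b*g)` is an assumption.

```python
def standard_params(n, k):
    """Standard block: 2 * N^2 * k^2."""
    return 2 * n**2 * k**2


def grouped_pointwise_params(n, k, g):
    """G(g): 2 * (N^2 * k^2 / g + N^2)."""
    return 2 * (n**2 * k**2 / g + n**2)


def bottleneck_params(n, k, b):
    """B(b): (N/b)^2 * k^2 + 2 * N^2 / b."""
    return (n / b)**2 * k**2 + 2 * n**2 / b


def bottleneck_grouped_params(n, k, b, g):
    """BG(b, g) as written in the notes, reading N/bg as N/(b*g) (assumption)."""
    return (n / (b * g))**2 * k**2 + 2 * n**2 / b


if __name__ == "__main__":
    n, k = 64, 3  # example channel count and kernel size, chosen arbitrarily
    print("Standard:", standard_params(n, k))
    print("G(4):    ", grouped_pointwise_params(n, k, g=4))
    print("B(2):    ", bottleneck_params(n, k, b=2))
    print("BG(2,4): ", bottleneck_grouped_params(n, k, b=2, g=4))
```

    For N = 64 and k = 3, the standard block has 73,728 parameters, while G(4) has 26,624 and B(2) has 13,312.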

    Distillation

    \(\mathcal{L}_{AT}=\mathcal{L}_{CE}+\beta \sum_{i=1}^{L}\left\|\frac{\mathbf{f}\left(A_{i}^{t}\right)}{\left\|\mathbf{f}\left(A_{i}^{t}\right)\right\|_{2}}-\frac{\mathbf{f}\left(A_{i}^{s}\right)}{\left\|\mathbf{f}\left(A_{i}^{s}\right)\right\|_{2}}\right\|_{2} \qquad (1)\)

    \(\mathbf{f}\left(A_{i}\right)=\left(1 / N_{A_{i}}\right) \sum_{j=1}^{N_{A_{i}}} \mathbf{a}_{ij}^{2}\), where \(i=1,2,\dots,L\) and \(N_{A_i}\) is the number of channels at layer i.
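
    Equation (1) can be sketched in plain Python as follows. This is an illustrative sketch only: the function names and the `[layer][channel][position]` list layout (spatial positions flattened) are assumptions, and it assumes no attention map is all zeros.

```python
import math


def attention_map(acts):
    """f(A_i): mean over channels of the squared activations.

    acts: list indexed [channel][position]; layout assumed for this sketch.
    """
    num_channels = len(acts)
    num_positions = len(acts[0])
    return [sum(acts[c][p] ** 2 for c in range(num_channels)) / num_channels
            for p in range(num_positions)]


def l2_normalize(vec):
    """Divide a vector by its L2 norm (assumes the norm is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def attention_transfer_loss(ce_loss, teacher_acts, student_acts, beta):
    """L_AT = L_CE + beta * sum_i || f(A_i^t)/||.|| - f(A_i^s)/||.|| ||_2."""
    penalty = 0.0
    for a_t, a_s in zip(teacher_acts, student_acts):
        f_t = l2_normalize(attention_map(a_t))
        f_s = l2_normalize(attention_map(a_s))
        penalty += math.sqrt(sum((t - s) ** 2 for t, s in zip(f_t, f_s)))
    return ce_loss + beta * penalty
```

    Because the attention map averages over channels, the teacher and student may have different channel counts at a given layer; only the spatial sizes need to match.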


    image-20201013184149297

    • blocks pool ==> candidate block-swap networks space ==>
    • under constraint sample ==> compute fisher score ==> ranking network
    • Distillation (T: backbone network, S: block-swap network)
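
    The sample-then-rank part of this pipeline can be sketched as a simple rejection-sampling loop. Everything here is a hypothetical illustration: the function names, the per-block `param_count` lookup, and the `score_fn` callable (standing in for the Fisher score) are assumptions, and the sketch assumes that a higher score is preferred.

```python
import random


def sample_candidates(num_blocks, block_pool, param_count, budget,
                      num_samples, rng, max_attempts=None):
    """Randomly assign a block type to each position; keep assignments within budget."""
    if max_attempts is None:
        max_attempts = 100 * num_samples  # guard against an unreachable budget
    candidates = []
    attempts = 0
    while len(candidates) < num_samples and attempts < max_attempts:
        attempts += 1
        arch = tuple(rng.choice(block_pool) for _ in range(num_blocks))
        if sum(param_count[b] for b in arch) <= budget:
            candidates.append(arch)
    return candidates


def pick_best(candidates, score_fn):
    """Rank candidates by score_fn (a stand-in for the Fisher score); return the top one."""
    if not candidates:
        raise ValueError("no candidate satisfied the parameter budget")
    return max(candidates, key=score_fn)


if __name__ == "__main__":
    rng = random.Random(0)
    pool = ["standard", "G(4)", "B(2)"]                    # hypothetical block pool
    cost = {"standard": 73728, "G(4)": 26624, "B(2)": 13312}
    cands = sample_candidates(18, pool, cost, 600_000, 10, rng)
    # A dummy score is used here; the real pipeline would score each network on a minibatch.
    print(pick_best(cands, score_fn=lambda arch: arch.count("standard")))
```

    The selected architecture would then be trained as the student, with the original backbone as the teacher.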

    Experiments

    CIFAR-10

    Setup

    • momentum: 0.9
    • lr: initial 0.1, cosine schedule
    • minibatch size: 128
    • weight decay: 5e-4
    • β: 1000
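
    For reference, a cosine learning-rate schedule starting at 0.1 is commonly written as below. This is a generic sketch, not taken from the paper: the total number of epochs is not stated in these notes, so `total_epochs` is left as a parameter, and annealing down to `min_lr = 0` is an assumption.

```python
import math


def cosine_lr(epoch, total_epochs, base_lr=0.1, min_lr=0.0):
    """Cosine annealing from base_lr (epoch 0) down to min_lr (epoch total_epochs)."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```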

    Teacher Network:

    • 3 × WRN-40-2 (depth 40, width multiplier 2, 18 blocks, 2.2M params)

    Student Network:

    params constraint: 200K, 400K, 600K, 800K

    • WRN-16-2 / WRN-40-1 / WRN-16-1
    • WRN-40-2 + mixed swap
    • WRN-40-2 + Single swap (MBConv6 / DARTS / DenseNet)
    • WRN-40-2 + SNIP pruning
    • WRN-40-2 + \(\ell_1\) pruning

    image-20201013195136252


    image-20201013195156023


    ImageNet

    Setup

    • momentum: 0.9
    • lr: initial 0.1, stepped at epochs 30, 60, 90
    • minibatch size: 256
    • weight decay: 1e-4
    • β: 750
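
    A step schedule with these milestones can be sketched as follows. The decay factor is not given in these notes; `gamma = 0.1` is an assumption (a common default), and the function name is made up for illustration.

```python
def step_lr(epoch, base_lr=0.1, milestones=(30, 60, 90), gamma=0.1):
    """Multiply the learning rate by gamma at each milestone epoch (gamma is assumed)."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops
```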

    Teacher Network:

    • 1 × ResNet34 (16 blocks, 21.8M params)

    Student Network:

    params constraint: 8M, 3M

    • ResNet18 / ResNet18-0.5 (the channel width in the last 3 sections has been halved)
    • ResNet34 + mixed swap
    • ResNet34 + Single swap (G(4) / G(N))

    image-20201013195415645


    Ablation Study

    mixed block vs. single block

    image-20201013192337263

    There is always a mixed-swap architecture that beats the single-swap ones.


    One minibatch VS. N minibatch && Ranking correlation

    image-20201013192636430

    Correlation between the final error and each of the following metrics, measured after different numbers of minibatches:

    • acc
    • weight l2 norm
    • grad l1 norm
    • fisher score
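
    One way to quantify such a ranking correlation is Spearman's rank correlation, sketched below in plain Python. Which correlation measure the paper uses is not stated in these notes, so the choice of Spearman is an assumption, as are the function names.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the ranks of x and y (assumes neither is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

    Here `x` would hold one metric (for example the Fisher score) per candidate network and `y` the corresponding final errors; a metric that ranks networks well gives a correlation of large magnitude.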

    Sample Num?

    BlockSwap finds networks with final test errors of 4.85%, 4.54%, and 4.21% after 10, 100, and 1000 samples respectively.

    The authors settle on 1000 samples, a number chosen empirically.


    Conclusion


    Summary


    To Read


    Reference

    https://blog.csdn.net/xbinworld/article/details/104591706

    https://www.zhihu.com/question/266846405

    https://openreview.net/forum?id=SklkDkSFPB

  • Original post: https://www.cnblogs.com/chenbong/p/13810902.html