BlockSwap
2020-ICLR-BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget
Source: ChenBong's blog on cnblogs (博客园)
- Institute: University of Edinburgh
- Author: Jack Turner, Michael O'Boyle
- GitHub: https://github.com/BayesWatch/pytorch-blockswap (20+ stars)
- Citation: 1
Introduction
- backbone network (consisting of standard blocks)
- cheap-blocks pool ==> (swap the backbone's blocks) candidate block-swap network space
- sample under constraint ==> compute Fisher score ==> rank networks
- Distillation (T: backbone network, S: block-swap network)
Contribution
- Block-wise substitution: compared with NAS (a bottom-up method), it shrinks the search space and is faster; compared with methods that change the network's depth/width, such as pruning (top-down methods), it searches in a higher-dimensional space (pruning can only change the number of filters per layer, while block swap can also change the type of a layer)
- A fast candidate-network evaluation method based on Fisher information
Method
Fisher Information
- The Taylor-expansion method for estimating filter importance is equivalent to computing Fisher information
- \(\Delta_{c}=\frac{1}{2N}\sum_{n}^{N}\left(\sum_{i}^{W}\sum_{j}^{H} a_{nij}\, g_{nij}\right)^{2}\)
- For a feature map of size W×H, \(\sum_{i,j} a_{ij} g_{ij}\) (activation times gradient of the loss) measures the importance \(\Delta_{c}\) of one channel (the output of one filter)
- \(\Delta_{b}=\sum_{c}^{C} \Delta_{c}\)
- C is the total number of channels in a block; the importance of a block is \(\Delta_{b}\)
- \(\sum_{b}^{B} \Delta_{b}\)
- B is the number of blocks in a block-swap network; the importance of a block-swap network is \(\sum_{b}^{B} \Delta_{b}\)
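The three scores above can be sketched in plain Python. This is a minimal sketch of the formulas, not the authors' implementation; in practice the activations and their loss gradients would be collected with backward hooks during a single minibatch:

```python
def fisher_channel_score(acts, grads):
    """Delta_c = 1/(2N) * sum_n ( sum_{i,j} a_nij * g_nij )^2.

    acts, grads: per-example activation maps and loss gradients for ONE
    channel, as nested lists of shape [N][W][H].
    """
    n = len(acts)
    total = 0.0
    for a_map, g_map in zip(acts, grads):
        # inner sum over the W x H spatial positions
        inner = sum(a * g
                    for a_row, g_row in zip(a_map, g_map)
                    for a, g in zip(a_row, g_row))
        total += inner ** 2
    return total / (2 * n)


def fisher_block_score(channel_scores):
    """Delta_b: a block's importance is the sum over its C channel scores."""
    return sum(channel_scores)
```

A candidate network's score is then the sum of `fisher_block_score` over its B blocks.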
Substitute Blocks
Standard Block
- params: \(2N^2k^2\)
Grouped+Pointwise Block – G(g)
- params: \(2\left(\frac{N^2k^2}{g}+N^2\right)\)
Bottleneck Block – B(b)
- params: \(\left(\frac{N}{b}\right)^2 k^2+\frac{2N^2}{b}\)
Bottleneck Grouped+Pointwise Block – BG(b, g)
- params: \(\frac{(N/b)^2 k^2}{g}+\frac{2N^2}{b}\)
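These counts can be checked with a few lines of Python. A sketch under the same assumptions as the formulas (bias and batch-norm parameters ignored; `n` is the channel width, `k` the kernel size, `g` the number of groups, `b` the bottleneck ratio):

```python
def params_standard(n, k):
    # two k x k convs with n input/output channels: 2 * n^2 * k^2
    return 2 * n * n * k * k

def params_grouped_pointwise(n, k, g):
    # two (grouped k x k conv + 1x1 pointwise) pairs: 2 * (n^2 k^2 / g + n^2)
    return 2 * (n * n * k * k // g + n * n)

def params_bottleneck(n, k, b):
    # 1x1 reduce (n * n/b) + k x k conv ((n/b)^2 k^2) + 1x1 expand (n/b * n)
    m = n // b
    return m * m * k * k + 2 * n * m

def params_bottleneck_grouped(n, k, b, g):
    # as bottleneck, but the k x k conv is grouped: (n/b)^2 k^2 / g + 2 n^2 / b
    m = n // b
    return m * m * k * k // g + 2 * n * m
```

For example, at N=64 and k=3 the standard block has 73,728 parameters, while BG(2, 2) has 8,704, showing how much cheaper the substituted blocks are.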
Distillation
\(\mathcal{L}_{AT}=\mathcal{L}_{CE}+\beta \sum_{i=1}^{L}\left\|\frac{\mathbf{f}\left(A_{i}^{t}\right)}{\left\|\mathbf{f}\left(A_{i}^{t}\right)\right\|_{2}}-\frac{\mathbf{f}\left(A_{i}^{s}\right)}{\left\|\mathbf{f}\left(A_{i}^{s}\right)\right\|_{2}}\right\|_{2} \qquad (1)\)
\(\mathbf{f}\left(A_{i}\right)=\frac{1}{N_{A_{i}}} \sum_{j=1}^{N_{A_{i}}} \mathbf{a}_{ij}^{2}\), where \(i=1,2,\dots,L\) and \(N_{A_i}\) is the number of channels at layer \(i\).
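The per-layer term of equation (1) can be sketched as follows. A plain-Python sketch only: feature maps are passed as `[C][HW]` nested lists, and the cross-entropy term and the sum over layers are omitted:

```python
import math

def attention_map(feature):
    """f(A): spatial attention map = channel-wise mean of squared activations.
    feature: [C][HW] nested list (C channels, flattened spatial positions)."""
    c = len(feature)
    hw = len(feature[0])
    return [sum(feature[ch][p] ** 2 for ch in range(c)) / c for p in range(hw)]

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def at_loss_term(teacher_feat, student_feat):
    """|| f(A^t)/||f(A^t)||_2 - f(A^s)/||f(A^s)||_2 ||_2 for one layer pair."""
    ft = l2_normalize(attention_map(teacher_feat))
    fs = l2_normalize(attention_map(student_feat))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ft, fs)))
```

The full loss sums this term over the L matched layer pairs, scales by β, and adds the student's cross-entropy loss.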
- blocks pool ==> candidate block-swap network space ==>
- under constraint sample ==> compute fisher score ==> ranking network
- Distillation (T: backbone network, S: block-swap network)
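Putting the pipeline together, a hypothetical sketch of the sample-score-rank loop. The block names and parameter counts here are made up for illustration, and scoring a real candidate would require a forward/backward pass on one minibatch:

```python
import random


def sample_candidates(num_blocks, block_params, budget, n_samples, seed=0):
    """Uniformly sample block-type assignments, keeping only configurations
    under the parameter budget (rejection sampling)."""
    rng = random.Random(seed)
    types = list(block_params)
    kept = []
    while len(kept) < n_samples:
        cfg = tuple(rng.choice(types) for _ in range(num_blocks))
        if sum(block_params[t] for t in cfg) <= budget:
            kept.append(cfg)
    return kept


def rank_candidates(candidates, score_fn):
    """Rank candidates by their Fisher score, best (highest) first."""
    return sorted(candidates, key=score_fn, reverse=True)
```

The top-ranked candidate is then trained from scratch with attention-transfer distillation from the backbone.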
Experiments
CIFAR-10
Setup
- momentum:0.9
- lr:init 0.1,cosine
- minibatch size:128
- weight decay:5e-4
- β:1000
Teacher Network:
- 3 × WRN-40-2 (depth 40, width multiplier 2, 18 blocks, 2.2M params)
Student Network:
params constraint:200K, 400K, 600K, 800K
- WRN-16-2 / WRN-40-1 / WRN-16-1
- WRN-40-2 + mixed swap
- WRN-40-2 + Single swap (MBConv6 / DARTS / DenseNet)
- WRN-40-2 + SNIP pruning
- WRN-40-2 + \(\ell_1\) pruning
ImageNet
Setup
- momentum:0.9
- lr:init 0.1,step:30,60,90
- minibatch size:256
- weight decay:1e-4
- β:750
Teacher Network:
- 1 × ResNet-34 (16 blocks, 21.8M params)
Student Network:
params constraint:8M,3M
- ResNet18 / ResNet18-0.5 (the channel width in the last 3 sections has been halved)
- ResNet34 + mixed swap
- ResNet34 + Single swap (G(4) / G(N))
Ablation Study
Mixed block vs. single block
There is always a mixed-swap architecture that performs better than the best single-swap one
One minibatch vs. N minibatches, and ranking correlation
Correlation between final error and the following metrics, measured over different numbers of batches:
- acc
- weight \(\ell_2\) norm
- grad \(\ell_1\) norm
- Fisher score
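The ranking correlation can be computed with Spearman's rho. A minimal tie-free sketch (in the paper's setting this would compare, e.g., early-batch Fisher scores against final test errors across candidates):

```python
def spearman(xs, ys):
    """Spearman rank correlation, assuming no ties:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

A rho near 1 between an early metric and final error means one minibatch is enough to rank candidates reliably.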
Number of Samples
BlockSwap finds networks with final test errors of 4.85%, 4.54%, and 4.21% after 10, 100, and 1000 samples respectively.
The authors empirically settled on 1000 samples.
Conclusion
Summary
To Read