[NeuralScale] 2020-CVPR-NeuralScale: Efficient Scaling of Neurons for Resource-Constrained Deep Neural Networks (Paper Reading)

NeuralScale

2020-CVPR-NeuralScale: Efficient Scaling of Neurons for Resource-Constrained Deep Neural Networks

Source: ChenBong, cnblogs (博客园)
- Institute: National Chiao Tung University
- Author: Eugene Lee, Chen-Yi Lee (H40)
- GitHub: https://github.com/eugenelet/NeuralScale
- Citation: 3
Introduction

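The paper proposes scaling each layer according to its own sensitivity (layer-wise scaling) until the network reaches a target parameter count, in contrast to uniform scaling.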

Motivation

Contribution

Method

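Pre-train the model for \(P\) epochs, then start iterative pruning from the pre-trained model.

After each pruning iteration, every layer yields one data point \(\xi_{l}=\left\{\tau, \phi_{l}\right\}\), where \(\tau\) is the total parameter count of the model and \(\phi_{l}\) is the number of filters in layer \(l\).

After \(N\) iterations, every layer has \(N\) data points: \(\boldsymbol{\xi}_{l}=\left\{\left\{\tau^{(n)}, \phi_{l}^{(n)}\right\}_{n=1}^{N}\right\}\)

Filter pruning is repeated until the total number of filters falls below a fraction \(\epsilon=0.05\) of the original total, at which point pruning stops.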
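The data-collection loop described above can be sketched in Python. This is only an illustration under stated assumptions: the random importance score, the `prune_per_iter` batch size, the plain chain of 3x3 convolutions behind `param_count`, and all function names are made up here and are not the pruning criterion or code used by the authors.

```python
import random

def param_count(filters, in_channels=3, kernel=3):
    """Weights of a plain chain of kernel x kernel conv layers (biases ignored)."""
    total, c_in = 0, in_channels
    for c_out in filters:
        total += kernel * kernel * c_in * c_out
        c_in = c_out
    return total

def collect_points(filters, epsilon=0.05, prune_per_iter=8, seed=0):
    """Prune iteratively and record one (tau, phi_l) point per layer per iteration."""
    rng = random.Random(seed)
    filters = list(filters)
    initial_total = sum(filters)
    points = [[] for _ in filters]
    # Stop once the total filter count drops below epsilon * original total
    # (or when every layer is already down to a single filter).
    while sum(filters) >= epsilon * initial_total and sum(filters) > len(filters):
        # Placeholder importance: one random score per remaining filter.
        scored = sorted((rng.random(), l) for l, n in enumerate(filters) for _ in range(n))
        for _, l in scored[:prune_per_iter]:
            if filters[l] > 1:  # keep at least one filter in every layer
                filters[l] -= 1
        tau = param_count(filters)
        for l, n in enumerate(filters):
            points[l].append((tau, n))
    return points

points = collect_points([32, 64, 128])
print(len(points[0]), "data points per layer")
```

Each `points[l]` plays the role of \(\boldsymbol{\xi}_{l}\): a list of (total parameters, filters in layer \(l\)) pairs, one per pruning iteration.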

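Plotting the data points \(\boldsymbol{\xi}_{l}\) of each layer gives that layer's sensitivity curve: its filter count as a function of the total parameter count.

Each curve is then fitted with a function:

\(\phi_{l}\left(\tau \mid \alpha_{l}, \beta_{l}\right)=\alpha_{l} \tau^{\beta_{l}}\) ,

\(\ln \phi_{l}\left(\tau \mid \alpha_{l}, \beta_{l}\right)=\ln \alpha_{l}+\beta_{l} \ln \tau\)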
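Because the second form is linear in \(\ln \tau\), one way to estimate \(\alpha_{l}\) and \(\beta_{l}\) is ordinary least squares in log space. The sketch below assumes that approach and uses synthetic data; `fit_power_law` and the sample values are hypothetical and not taken from the paper's code.

```python
import math

def fit_power_law(taus, phis):
    """Least-squares fit of ln(phi) = ln(alpha) + beta * ln(tau)."""
    xs = [math.log(t) for t in taus]
    ys = [math.log(p) for p in phis]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                           # slope of the log-log line
    alpha = math.exp(mean_y - beta * mean_x)   # exp of the intercept
    return alpha, beta

# Synthetic data points (tau, phi_l) generated from a known power law.
taus = [1e4, 5e4, 1e5, 5e5, 1e6]
phis = [2.0 * t ** 0.3 for t in taus]
alpha, beta = fit_power_law(taus, phis)
print(alpha, beta)
```

On noise-free data the fit recovers the generating parameters up to floating-point error; with real pruning data the points only approximately follow the line.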

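For a network with \(L\) layers, the layer-wise filter counts are written as \(\Phi(\tau \mid \Theta)=\{\phi_1, \phi_2, \ldots, \phi_L\}\), with \(\Theta=\{\alpha_1, \beta_1, \alpha_2, \beta_2, \ldots, \alpha_L, \beta_L\}\)

Once the fitted functions \(\Phi(\tau \mid \Theta)\) are available, the layer-wise filter counts for a target parameter count \(\hat\tau\) are obtained by simply evaluating \(\Phi(\hat\tau \mid \Theta)\).

However, the actual total parameter count of the resulting model, \(h(f(\boldsymbol{x} \mid \boldsymbol{W}, \boldsymbol{\Phi}(\hat{\tau} \mid \boldsymbol{\Theta})))\), differs from the target \(\hat\tau\). The authors therefore start from \(\tau=\hat\tau\) and run gradient descent on \(\tau\) to find a value for which the actual parameter count \(h(f)\) equals \(\hat\tau\) exactly. They call this process Architecture Descent.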
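The search over \(\tau\) can be illustrated with a toy example. The sketch below is not the authors' update rule: it uses a Newton-style first-order step with a finite-difference slope in place of whatever optimizer the paper uses, and the 3-layer chain of 3x3 convolutions, the values in `theta`, and all function names are assumptions for illustration.

```python
def filters_at(tau, theta):
    """Phi(tau | Theta): real-valued filter counts alpha_l * tau**beta_l."""
    return [alpha * tau ** beta for alpha, beta in theta]

def param_count(filters, in_channels=3, kernel=3):
    """h(.): weights of a plain chain of kernel x kernel conv layers (biases ignored)."""
    total, c_in = 0.0, in_channels
    for c_out in filters:
        total += kernel * kernel * c_in * c_out
        c_in = c_out
    return total

def solve_tau(target, theta, steps=50, tol=1e-6):
    """Adjust tau until h(Phi(tau)) matches the target parameter count."""
    tau = float(target)  # initialise tau = tau_hat, as in the text
    for _ in range(steps):
        gap = param_count(filters_at(tau, theta)) - target
        if abs(gap) <= tol * target:
            break
        eps = 1e-4 * tau  # central finite difference for dh/dtau
        slope = (param_count(filters_at(tau + eps, theta))
                 - param_count(filters_at(tau - eps, theta))) / (2 * eps)
        tau -= gap / slope  # first-order (Newton-style) correction
    return tau

theta = [(0.1, 0.5), (0.2, 0.5), (0.4, 0.5)]  # hypothetical (alpha_l, beta_l)
target = 100_000
tau = solve_tau(target, theta)
# Real layers need integer widths, so the rounded model matches only approximately.
widths = [round(phi) for phi in filters_at(tau, theta)]
print(tau, widths, param_count(widths))
```

In this toy setting evaluating \(\Phi\) at \(\tau=\hat\tau\) gives fewer parameters than the target, so the search moves \(\tau\) above \(\hat\tau\) until the gap closes.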

Experiments

Setup
- GPU: single 1080ti
- CIFAR10 / CIFAR100
  
  pre-train: 10 epochs
  
  iterative pruning
  
  fine-tune?
  
  300 epochs
  
  lr=0.1, divided by 10 at epochs 100, 200, 250
  
  weight decay=\(5^{-4}\) , ≈0.0016
- TinyImageNet
  
  pre-train: 10 epochs
  
  iterative pruning
  
  fine-tune?
  
  150 epochs
  
  lr=0.1, divided by 10 at epochs 50, 100
  
  weight decay=\(5^{-4}\) , ≈0.0016
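The step learning-rate schedules listed above can be written as a small helper. This is a sketch: `step_lr` is a hypothetical name, and it assumes 0-indexed epochs with the drop taking effect at the milestone epoch itself, which the notes do not specify.

```python
def step_lr(epoch, base_lr=0.1, milestones=(100, 200, 250), factor=10.0):
    """Learning rate after dividing by `factor` at every milestone already reached."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr / factor ** drops

# CIFAR10 / CIFAR100 schedule (300 epochs).
cifar_lrs = [step_lr(e) for e in (0, 99, 100, 200, 250)]
# TinyImageNet schedule (150 epochs).
tiny_lrs = [step_lr(e, milestones=(50, 100)) for e in (0, 50, 100)]
# The weight decay as written in the notes: 5^{-4} = 1/625.
weight_decay = 5 ** -4
print(cifar_lrs, tiny_lrs, weight_decay)
```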
Importance of Architecture Descent

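In the figure, the horizontal axis is the number of SGD iterations on \(\tau\), the vertical axis is the layer index, and the color encodes the number of convolution filters in that layer.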

Benchmarking of NeuralScale

param vs acc

latency vs acc

main result

Conclusion

Summary

Reference

Original article: https://www.cnblogs.com/chenbong/p/14801135.html
