[MnasNet] 2019-CVPR-MnasNet: Platform-Aware Neural Architecture Search for Mobile (paper reading notes)

MnasNet

2019-CVPR-MnasNet: Platform-Aware Neural Architecture Search for Mobile

Source: ChenBong, cnblogs
- Institute: Google Brain, Google
- Authors: Mingxing Tan, Quoc V. Le, et al.
- GitHub: https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet
- Citations: 740+
Introduction

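The search objective is the trade-off between real measured latency and accuracy.

Instead of the usual search space, where a single cell is searched and then stacked, the paper uses a new search space that lets each block use different op types, which increases layer diversity.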

Motivation
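- Network design for mobile has several optimization objectives at once, such as few parameters, high speed, and high accuracy.
- When previous NAS methods consider latency, they often use a proxy such as FLOPs for the actual latency, but real-world latency and FLOPs can differ substantially.
- Many previous NAS methods search for a cell and then stack it. This shrinks the search space but loses layer diversity, so some better models can never be found.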
Contribution
- A multi-objective soft reward that considers several objectives at once, namely the latency-accuracy trade-off.
- Real-world latency is used as an optimization objective.
- A layer diversity search space, which allows different layers to use different structures.
Method

Pipeline

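5 epochs per sampled model × 8k sampled models = 40k epochs

With this many samples, the choice of search algorithm probably makes little difference; random search would likely do reasonably well too.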

Multi-objective Soft Reward

Hard Constraint

only maximizes a single metric and does not provide multiple Pareto optimal solutions

Soft Constraint

Weighted sum method:

$\text{maximize} \quad ACC(m) + \lambda |LAT(m) - T|$

where $\lambda = \left\{\begin{array}{ll}\alpha, & \text{if } LAT(m) \leq T \\ \beta, & \text{otherwise}\end{array}\right.$

We pick the weighted product method because it is easy to customize, but we expect methods like weighted sum should be also fine.

Weighted product method:
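
The formula itself is missing here. Judging from the reward expression in the worked example that follows, it presumably has the form below; the exponent name $w$ is an assumption introduced for this sketch:

$$\text{maximize} \quad ACC(m) \times \left[\frac{LAT(m)}{T}\right]^{w}, \qquad w = \left\{\begin{array}{ll}\alpha, & \text{if } LAT(m) \leq T \\ \beta, & \text{otherwise}\end{array}\right.$$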

An empirical rule for picking α and β is to ensure Pareto-optimal solutions have similar reward under different accuracy-latency trade-offs.

For instance, we empirically observed doubling the latency usually brings about 5% relative accuracy gain.

Given two models:

(1) M1 has latency $l$ and accuracy $a$;

(2) M2 has latency $2l$ and 5% higher accuracy $a \cdot (1 + 5\%)$,

they should have similar reward:

$Reward(M2) = a \cdot (1 + 5\%) \cdot (2l/T)^{\beta} \approx Reward(M1) = a \cdot (l/T)^{\beta}$.

Solving this gives β ≈ −0.07. Therefore, we use α = β = −0.07 in our experiments unless explicitly stated.
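
The value β ≈ −0.07 can be checked numerically. Setting the two rewards equal cancels $a$ and $(l/T)^{\beta}$, leaving $(1 + 5\%) \cdot 2^{\beta} = 1$. The sketch below solves that equation and evaluates the weighted-product reward; the function names `soft_reward` and `solve_beta` and the example numbers are assumptions made for illustration, not code from the paper.

```python
import math


def soft_reward(acc, latency_ms, target_ms, alpha=-0.07, beta=-0.07):
    """Weighted-product reward: acc * (latency / target) ** w.

    w is alpha when the latency target is met and beta otherwise.
    """
    w = alpha if latency_ms <= target_ms else beta
    return acc * (latency_ms / target_ms) ** w


def solve_beta(relative_gain=0.05, latency_factor=2.0):
    """Solve (1 + gain) * factor ** beta = 1 for beta."""
    return -math.log(1.0 + relative_gain) / math.log(latency_factor)


beta = solve_beta()
print(round(beta, 4))  # -0.0704

# Illustrative numbers only: M1 at 50 ms, M2 at 100 ms with 5% higher accuracy.
m1 = soft_reward(0.70, 50.0, 75.0, alpha=beta, beta=beta)
m2 = soft_reward(0.70 * 1.05, 100.0, 75.0, alpha=beta, beta=beta)
print(math.isclose(m1, m2))  # True: the two models get the same reward
```

With α = β the reward is a smooth function of latency, so a model slightly over the target is only mildly penalized instead of being rejected.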

Hard Constraint vs. Soft Constraint

Layer Diversity Search Space

For #layers in each block, we search for {0, +1, -1} based on MobileNetV2;

for filter size per layer, we search for its relative size in {0.75, 1.0, 1.25} to MobileNetV2.

The search space is still built on the hand-designed MobileNetV2, so in effect the search is still looking for a MobileNetV2-like structure.
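
To make the two choices quoted above concrete, here is a minimal sketch that samples one candidate. Only the choice sets {0, +1, -1} and {0.75, 1.0, 1.25} come from the notes. The stage table (the commonly listed MobileNetV2 bottleneck stages), the one-block-per-stage mapping, the clamp to at least one layer, and all names are assumptions for illustration, and the other dimensions of the real search space are left out.

```python
import random

# Assumption: (output_channels, num_layers) of the seven MobileNetV2
# bottleneck stages, used here as the reference the search perturbs.
MOBILENET_V2_STAGES = [(16, 1), (24, 2), (32, 3), (64, 4), (96, 3), (160, 3), (320, 1)]

LAYER_DELTAS = (0, +1, -1)         # change in #layers per block
FILTER_SCALES = (0.75, 1.0, 1.25)  # filter size relative to MobileNetV2


def sample_architecture(rng):
    """Pick one layer delta and one filter scale independently per block."""
    blocks = []
    for channels, layers in MOBILENET_V2_STAGES:
        delta = rng.choice(LAYER_DELTAS)
        scale = rng.choice(FILTER_SCALES)
        blocks.append({
            # Assumption: never drop a block entirely, keep at least one layer.
            "num_layers": max(1, layers + delta),
            "filters": int(round(channels * scale)),
        })
    return blocks


def num_choice_combinations():
    """Number of (delta, scale) combinations over all blocks, before clamping."""
    return (len(LAYER_DELTAS) * len(FILTER_SCALES)) ** len(MOBILENET_V2_STAGES)


print(sample_architecture(random.Random(0)))
print(num_choice_combinations())  # 4782969, i.e. 9 ** 7
```

Because every block is sampled independently, different blocks can end up with different depths and widths, which is the layer diversity a stacked-cell space cannot express.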

Search Algorithm

The search uses a reinforcement learning approach.

A sample-eval-update loop is used to train the controller.
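
The sample-eval-update loop can be sketched on a toy problem. This is not the paper's controller or its update rule: the sketch swaps in independent categorical distributions updated with plain REINFORCE and a moving-average baseline, and `toy_reward` is a hypothetical stand-in for training a sampled model and measuring its latency. All names and constants are assumptions.

```python
import math
import random

NUM_DECISIONS = 4  # hypothetical number of architecture decisions
NUM_CHOICES = 3    # hypothetical number of options per decision


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(arch):
    # Hypothetical stand-in for "train and measure the sampled model":
    # larger choice indices raise both accuracy and latency.
    acc = 0.60 + 0.03 * sum(arch)
    latency_ms = 40.0 + 10.0 * sum(arch)
    return acc * (latency_ms / 75.0) ** -0.07


def search(steps=500, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = [[0.0] * NUM_CHOICES for _ in range(NUM_DECISIONS)]
    baseline = None
    history = []
    for _ in range(steps):
        # 1) sample: draw one choice per decision from the current policy
        probs = [softmax(row) for row in logits]
        arch = [rng.choices(range(NUM_CHOICES), weights=p)[0] for p in probs]
        # 2) eval: score the sampled architecture
        reward = toy_reward(arch)
        history.append(reward)
        # 3) update: REINFORCE step with a moving-average baseline
        if baseline is None:
            baseline = reward
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        for d, choice in enumerate(arch):
            for c in range(NUM_CHOICES):
                # d log p(choice) / d logit_c = 1[c == choice] - p_c
                grad = (1.0 if c == choice else 0.0) - probs[d][c]
                logits[d][c] += lr * advantage * grad
    return logits, history


logits, history = search()
print([max(range(NUM_CHOICES), key=lambda c: row[c]) for row in logits])
```

The printed list is the most likely choice per decision after training; on a real search each `toy_reward` call would be a short training run plus an on-device latency measurement.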

Experiments

Setup
- Optimizer: RMSProp, decay=0.9, momentum=0.9
- Batch norm momentum: 0.99
- Weight decay: 1e-5
- Batch size: 4K
- lr:
  
  warm up: 0 to 0.256
  
  decayed by 0.97 every 2.4 epochs
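
The learning-rate schedule above can be written as a small function. The peak value 0.256 and the 0.97-per-2.4-epochs decay come from the list; the 5-epoch linear warm-up, the staircase (rather than continuous) decay, and counting decay from the end of warm-up are assumptions of this sketch, as is the function name.

```python
import math


def learning_rate(epoch, peak_lr=0.256, warmup_epochs=5.0,
                  decay_rate=0.97, decay_every=2.4):
    """Linear warm-up to peak_lr, then staircase exponential decay.

    Assumptions: warm-up length, staircase decay, and decay counted
    from the end of warm-up are not stated in the notes.
    """
    if epoch < warmup_epochs:
        return peak_lr * epoch / warmup_epochs
    num_decays = math.floor((epoch - warmup_epochs) / decay_every)
    return peak_lr * decay_rate ** num_decays


for e in (0, 2.5, 5, 7.5, 10):
    print(e, learning_rate(e))
```

`epoch` may be fractional, so the same function can be called once per training step with `step / steps_per_epoch`.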
Model Scaling Performance

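Figure 5 shows that whether the multiplier or the input size is varied, the searched architecture keeps a good accuracy-latency trade-off.

Question: is this a coincidence, or does it mean that an architecture with a good accuracy-latency trade-off performs consistently well at every scale? In other words, does scaling leave the ranking of different architectures unchanged?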

Ablation Study

Soft vs. Hard Latency Constraint

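Figure 6 shows that under the hard constraint the sampled architectures mostly fall below T = 75 ms, whereas the soft constraint is more likely to explore models farther from the T = 75 ms target, which gives a better picture of the accuracy-latency Pareto-optimal curve.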
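
For reference, the Pareto-optimal curve mentioned here is the set of sampled models that no other model beats on both latency and accuracy. A minimal sketch of extracting it follows; the function name and the (latency, accuracy) points are made up for illustration and are not results from the paper.

```python
def pareto_front(models):
    """Return the non-dominated (latency_ms, accuracy) pairs, sorted by latency.

    A model is kept only if it is more accurate than every model
    that is at least as fast.
    """
    front = []
    best_acc = float("-inf")
    # Sort by latency; among equal latencies put the most accurate first.
    for latency, acc in sorted(models, key=lambda m: (m[0], -m[1])):
        if acc > best_acc:
            front.append((latency, acc))
            best_acc = acc
    return front


# Made-up sample points, for illustration only.
samples = [(60, 0.72), (70, 0.74), (75, 0.735), (80, 0.75), (90, 0.745), (110, 0.765)]
print(pareto_front(samples))
# [(60, 0.72), (70, 0.74), (80, 0.75), (110, 0.765)]
```

A search that samples models on both sides of T covers more of this curve in a single run, which is the advantage attributed to the soft constraint above.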

Multi-objective Soft Reward and Layer Diversity Search Space

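Effect of each single factor

The paper makes two main changes: A, the multi-objective soft reward, and B, the layer diversity search space.

To show that both A and B are effective, four experiments are needed:

A+B, A, B, baseline

The results should take the following form: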
- A+B > A > baseline
  
  A+B > A
  
  A > baseline
  
  A+B > baseline
- A+B > B > baseline
  
  A+B > B
  
  B > baseline
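A > baseline together with A+B > B shows that factor A is effective; B > baseline together with A+B > A shows that factor B is effective.

The paper only runs A+B, A, and baseline; B is missing.
- A+B > A ?> baseline
  
  A+B > A: with A (multi-objective) used in both, adding the larger search space B (layer diversity) is better.
  
  A ?> baseline: the results do not show that using A (multi-objective) alone is better.
  
  A+B > baseline: using A (multi-objective) and B (layer diversity) together is better than the baseline.
- A+B > B (not run) > baseline
  
  A+B > B: not run, so the effectiveness of A when B (layer diversity) is used in both cannot be shown.
  
  B > baseline: not run, so it cannot be shown that using B (layer diversity) alone is better.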
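Replacing ops in the searched architecture

The different op types in the searched MnasNet-A1 are all replaced with a single op type to produce several variants, with the aim of showing that layer diversity matters for the accuracy-latency trade-off.

Question: these variants were not obtained by running a fresh search with the same number of samples in the new space. A single-op space may also contain models with a good accuracy-latency trade-off, so this experiment likewise does not show that a layer diversity search space is better.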

Conclusion

Summary
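- The search algorithm is not the main contribution of this paper.
- Multi-objective soft constraint
  
  Combining multiple objectives as a weighted product is fairly uncommon. It explores the space around the constraint better and is worth trying when a Pareto-optimal curve is needed.
- Layer diversity search space:
  
  Question: neither the single-factor experiments nor the op-replacement experiment adequately demonstrates the effectiveness of the layer diversity search space.
- The search costs about 40k training epochs on ImageNet, which is computationally expensive.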

Original post: https://www.cnblogs.com/chenbong/p/14133167.html
