  • [MCUNet] 2020-NIPS-MCUNet Tiny Deep Learning on IoT Devices: paper reading notes

    MCUNet

    2020-NIPS-MCUNet Tiny Deep Learning on IoT Devices

    Source: ChenBong, cnblogs (博客园)

    • Institute: MIT, NTU, MIT-IBM Watson AI Lab
    • Author: Ji Lin, Song Han
    • GitHub: /
    • Citation: 1

    Introduction

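    • Networks on MCUs (microcontrollers)
      • Extremely small memory (SRAM) and storage (Flash, read only)
      • No operating system
    • Current lightweight networks are mainly designed for mobile devices such as smartphones. A microcontroller ($5) costs about two orders of magnitude less than a smartphone ($500) and is deployed far more widely, but its performance is also several orders of magnitude lower, so deploying neural networks on MCUs is a major challenge.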

    image-20201006153110294

    • We propose MCUNet, a method designed specifically for MCUs that co-designs the model design (TinyNAS) and the inference library (TinyEngine), enabling ImageNet-scale inference on MCUs.
    • First to reach 70.2% ImageNet top-1 accuracy on an MCU

    DL in MCU

    Existing frameworks:

    • TF Lite Micro
    • CMSIS-NN
    • CMix-NN
    • MicroTVM

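    Drawbacks:

    • They interpret the network graph at runtime, which consumes a lot of SRAM and Flash
    • Optimization is layer-level only and does not use whole-network information to further reduce memory usage (for example, some networks never use 5*5 conv, yet the library still keeps that functionality for the sake of generality)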

    Efficient Neural Network Design

    • Model Compression
      • Pruning
      • Quantization
      • Tensor decomposition
    • Efficient Network Design
      • MobileNet,EfficientNet
      • NAS (currently dominant)

    Method

    TinyNAS: Two-Stage NAS for Tiny Memory Constraints

    • first optimizes the search space
    • then performs neural architecture search within the optimized space

    Optimize Search Space

    R = {48, 64, 80, ..., 192, 208, 224}

    W = {0.2, 0.3, 0.4, ..., 1.0}

    This leads to 12×9 = 108 possible search space configurations S = W×R

    Each search space configuration contains 3.3 × 10^25 possible sub-networks
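
    As a quick check of the count above, the configurations can be enumerated directly. This is a minimal sketch that assumes the elided values follow the visible pattern (resolution in steps of 16, width multiplier in steps of 0.1), which agrees with the stated sizes 12 and 9. The variable names are illustrative only.

```python
# Resolution choices: 48, 64, ..., 224 (assumed step of 16, matching the set R).
resolutions = list(range(48, 224 + 1, 16))
# Width multiplier choices: 0.2, 0.3, ..., 1.0 (assumed step of 0.1, matching the set W).
width_multipliers = [round(0.1 * i, 1) for i in range(2, 11)]

# Each (width multiplier, resolution) pair is one search space configuration.
search_spaces = [(w, r) for w in width_multipliers for r in resolutions]

print(len(resolutions), len(width_multipliers), len(search_spaces))
```

    Each of these 108 pairs defines its own space of sub-networks, so the pair itself is what gets selected in the first stage.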

    Our goal is to find the best search space configuration S* that contains the model with the highest accuracy while satisfying the resource constraints.

    How do we find S*?

    • Perform NAS on each of the search spaces and compare the final results
      • Search Space ==> (under memory constraint) Searching ==> Compare Best Acc?
    • Evaluate the quality of the search space by randomly sampling m networks from the search space and comparing the distribution of satisfying networks
      • Search Space ==> (under memory constraint) Sample ==> Training ==> Compare Acc?
        • (RegNet: sample 500 models from one search space and train each for 10 epochs; the EDF of their accuracy is enough to characterize the quality of the search space)
        • image-20201006165611402
      • The evaluation strategy we use: Search Space ==> (under memory constraint) Sample ==> Compare FLOPs (No training!)

    Assumption: A model with larger computation has a larger capacity, which is more likely to achieve higher accuracy.

    We only collect the CDF of FLOPs:

    image-20201006160245965
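
    The training-free evaluation above can be sketched roughly as follows. Everything here is a toy stand-in: `sample_network`, its FLOPs and memory formulas, the candidate configurations, and the use of mean FLOPs as a summary of the CDF are assumptions for illustration, not the cost model or code of TinyNAS. The 320kB limit is one of the SRAM budgets listed in the experiment setup.

```python
import random

def sample_network(width, resolution, rng):
    """Hypothetical sampler returning (flops, peak_memory_bytes) for one sub-network.

    Both formulas are toy stand-ins, not the cost model used by TinyNAS.
    """
    depth = rng.randint(8, 20)                             # random layer count
    channels = max(1, int(64 * width * rng.uniform(0.5, 1.5)))
    feature_hw = (resolution // 2) ** 2                    # assume one early stride-2 layer
    flops = depth * channels * channels * 9 * feature_hw   # 3x3 convs, rough estimate
    peak_memory = channels * feature_hw                    # int8 activations, 1 byte each
    return flops, peak_memory

def satisfying_flops(width, resolution, memory_limit, m=1000, seed=0):
    """Sample m networks; return the sorted FLOPs of those within the memory limit."""
    rng = random.Random(seed)
    samples = [sample_network(width, resolution, rng) for _ in range(m)]
    return sorted(f for f, mem in samples if mem <= memory_limit)

def cdf_at(sorted_flops, threshold):
    """Empirical CDF: fraction of satisfying networks with FLOPs <= threshold."""
    if not sorted_flops:
        return 0.0
    return sum(f <= threshold for f in sorted_flops) / len(sorted_flops)

memory_limit = 320 * 1024  # 320kB SRAM
candidates = [(0.3, 160), (0.5, 144), (1.0, 144)]
for width, resolution in candidates:
    flops = satisfying_flops(width, resolution, memory_limit)
    mean_flops = sum(flops) / len(flops) if flops else 0.0
    print(f"w={width} r={resolution}: {len(flops)} fit, mean FLOPs={mean_flops:.3g}")
```

    Under the assumption stated earlier (more computation means more capacity), a configuration whose satisfying networks sit at higher FLOPs, i.e. whose CDF is shifted to the right, would be preferred. No network is trained at any point.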


    TinyEngine: A Memory-Efficient Inference Library

    compilation vs. interpretation


    memory scheduling

    layer-wise vs. model-wise


    kernel specialization

    The inner-loop unrolling is also specialized for different kernel sizes (e.g., 9 repeated code segments for a 3×3 kernel, and 25 for 5×5) to eliminate branch instruction overheads
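
    To make the "9 segments for 3×3, 25 for 5×5" point concrete, here is a small code-generation sketch that emits one multiply-accumulate statement per kernel tap instead of a loop. The emitted C-style names (`acc`, `in`, `w`, `stride`, `x`, `y`) are hypothetical; this is not TinyEngine's actual generated code.

```python
def unrolled_mac_lines(kernel_size):
    """Emit one C-style multiply-accumulate statement per kernel tap.

    Fully unrolling the kernel loops removes the loop counters and the
    branch at the end of every iteration.
    """
    lines = []
    for ky in range(kernel_size):
        for kx in range(kernel_size):
            tap = ky * kernel_size + kx
            lines.append(
                f"acc += in[(y + {ky}) * stride + (x + {kx})] * w[{tap}];"
            )
    return lines

for k in (3, 5):
    print(f"{k}x{k}: {len(unrolled_mac_lines(k))} statements")
print("\n".join(unrolled_mac_lines(3)[:2]))
```

    Because the generator knows the kernel size ahead of time, only the kernel sizes a given model actually uses need to be emitted.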

    Operation fusion is performed for Conv+Padding+ReLU+BN layers.
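
    One common form of such fusion is folding batch-norm statistics into the preceding convolution at inference time. The sketch below checks that identity for a single output value, assuming BN is applied directly to the convolution output. It illustrates the general technique and is not taken from TinyEngine; all names and numbers are made up for the example.

```python
import math

def conv_bn(x, w, b, gamma, beta, mean, var, eps=1e-5):
    """Unfused path: one conv output (a dot product) followed by batch norm."""
    y = sum(xi * wi for xi, wi in zip(x, w)) + b
    return gamma * (y - mean) / math.sqrt(var + eps) + beta

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold inference-time BN statistics into the conv weights and bias."""
    scale = gamma / math.sqrt(var + eps)
    return [wi * scale for wi in w], (b - mean) * scale + beta

# Illustrative values for one output channel.
x = [0.5, -1.0, 2.0]
w = [0.2, 0.4, -0.1]
b, gamma, beta, mean, var = 0.1, 1.5, -0.3, 0.05, 0.8

w_fused, b_fused = fold_bn(w, b, gamma, beta, mean, var)
fused = sum(xi * wi for xi, wi in zip(x, w_fused)) + b_fused
unfused = conv_bn(x, w, b, gamma, beta, mean, var)
print(math.isclose(fused, unfused))
```

    After folding, the BN layer needs neither its own pass over the feature map nor its own stored parameters.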


    Experiments

    Setup

    • Datasets
      • ImageNet
      • Visual Wake Words (VWW)
      • Speech Commands (V2), used for audio wake words
      • (CIFAR was not used)
    • Deployment
      • 320kB SRAM / 1MB Flash
      • 512kB SRAM / 2MB Flash

    Large-Scale Image Recognition

    Co-design

    image-20201006172439311

    Lower bit precision

    Under the same memory constraints, 4-bit MCUNet outperforms 8-bit by 2.2% by fitting a larger model in the memory

    image-20201006171022164
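
    The "larger model in the same memory" argument is simple arithmetic: halving the bits per weight doubles how many weights fit in a fixed budget. A rough sketch for the 1MB Flash setting, assuming 1MB = 1024 × 1024 bytes and ignoring code and metadata overhead (`max_parameters` is a helper made up for this example):

```python
def max_parameters(flash_bytes, bits_per_weight):
    """Number of weights that fit in a Flash budget, ignoring all overhead."""
    return flash_bytes * 8 // bits_per_weight

flash = 1 * 1024 * 1024  # 1MB Flash
for bits in (8, 4):
    print(f"{bits}-bit: {max_parameters(flash, bits):,} weights")
```

    Whether the extra capacity outweighs the precision loss is an empirical question. The 2.2% figure above is the notes' reported answer for this setting.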


    Visual & Audio Wake Words

    https://www.youtube.com/watch?v=YvioBgtec4U&feature=youtu.be


    Analysis

    Search space optimization

    image-20201006160323562

    Sensitivity analysis on search space optimization

    image-20201006155951809

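    • x-axis: Flash (storage, holds the model), 512kB~2048kB
    • y-axis: SRAM (working memory, comparable to RAM/VRAM; holds feature maps during inference), 192kB~512kB

    1-2: As SRAM grows, the input resolution increases, but because of the Flash limit the number of model parameters cannot grow, so the width does not increase.

    1-3: As Flash grows, the model can hold more parameters, so the width increases, but the input resolution actually decreases. (A wider model has more convolution kernels, so each layer's feature map has more channels; since SRAM is unchanged, the feature map resolution has to shrink.)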
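
    The two trends can be reproduced with a toy scaling model. The assumptions, which are not from the paper: parameters scale with width squared and must fit in Flash, activations scale with width times resolution squared and must fit in SRAM, and every value takes one byte. The constants `base_params` and `base_channels` are arbitrary.

```python
import math

def max_width(flash_bytes, base_params=4_000_000):
    """Largest width multiplier whose toy parameter count fits in Flash.

    Toy model (an assumption): parameters ~ base_params * width^2, 1 byte each.
    """
    return math.sqrt(flash_bytes / base_params)

def max_resolution(sram_bytes, width, base_channels=64):
    """Largest input side length whose toy activation footprint fits in SRAM.

    Toy model (an assumption): activations ~ channels * resolution^2 bytes.
    """
    channels = base_channels * width
    return math.isqrt(int(sram_bytes / channels))

kB = 1024
for flash_kb, sram_kb in [(512, 192), (512, 512), (2048, 192)]:
    w = max_width(flash_kb * kB)
    r = max_resolution(sram_kb * kB, w)
    print(f"Flash={flash_kb}kB SRAM={sram_kb}kB -> width={w:.2f}, resolution={r}")
```

    In this toy model, raising SRAM alone leaves the width unchanged and raises the resolution, while raising Flash alone raises the width and lowers the resolution, matching the two observations above.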


    Conclusion


    Summary


    To Read

    Reference

    https://mp.weixin.qq.com/s/v7fjLWqV4fqJqoPewlKgbA

  • Original article: https://www.cnblogs.com/chenbong/p/13773984.html