  • [MCUNet] 2020-NIPS-MCUNet Tiny Deep Learning on IoT Devices (paper reading notes)

    MCUNet

    2020-NIPS-MCUNet Tiny Deep Learning on IoT Devices

    Source: ChenBong, 博客园 (cnblogs)

    • Institute: MIT, NTU, MIT-IBM Watson AI Lab
    • Author: Ji Lin, Song Han
    • GitHub: /
    • Citation: 1

    Introduction

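    • Networks on MCUs (microcontrollers)
      • Extremely little memory (SRAM) and storage (Flash, read-only)
      • No operating system
    • Current lightweight networks are designed mainly for mobile devices such as smartphones. A microcontroller ($5) is about two orders of magnitude cheaper than a smartphone ($500) and is deployed far more widely, but it is also several orders of magnitude less capable, so deploying neural networks on MCUs is a major challenge.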


    • We propose MCUNet, an approach built specifically for MCUs that co-designs the model design (TinyNAS) and the inference library (TinyEngine), enabling ImageNet-scale inference on MCUs.
    • First to reach 70.2% ImageNet top-1 accuracy on an MCU

    DL in MCU

    Existing frameworks:

    • TF Lite Micro
    • CMSIS-NN
    • CMix-NN
    • MicroTVM

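    Drawbacks:

    • They interpret the network graph at runtime, which consumes a lot of SRAM and Flash
    • They optimize at the layer level only and do not use whole-network information to further reduce memory usage (for example, a network may never use a 5×5 conv, yet the library still keeps that functionality for the sake of generality)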

    Efficient Neural Network Design

    • Model Compression
      • Pruning
      • Quantization
      • Tensor decomposition
    • Efficient Network Design
      • MobileNet, EfficientNet
      • NAS (dominant)

    Method

    TinyNAS: Two-Stage NAS for Tiny Memory Constraints

    • first optimizes the search space
    • then performs neural architecture search within the optimized space

    Optimize Search Space

    R = {48, 64, 80, ..., 192, 208, 224}

    W = {0.2, 0.3, 0.4, ..., 1.0}

    This leads to |W| × |R| = 9 × 12 = 108 possible search space configurations S = W × R.

    Each search space configuration contains 3.3 × 10^25 possible sub-networks.
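
    As a quick check of the count above, the sketch below enumerates the configurations. It assumes the elided values in R and W follow the visible step sizes (16 for resolution, 0.1 for width multiplier); the variable names are illustrative only.

```python
from itertools import product

# Input resolutions 48, 64, ..., 224 (step of 16 assumed from the listed values).
resolutions = list(range(48, 224 + 1, 16))

# Width multipliers 0.2, 0.3, ..., 1.0 (step of 0.1 assumed from the listed values).
width_multipliers = [round(0.1 * k, 1) for k in range(2, 11)]

# Each (width multiplier, resolution) pair is one search space configuration.
configurations = list(product(width_multipliers, resolutions))

print(len(width_multipliers), len(resolutions), len(configurations))  # prints: 9 12 108
```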

    Our goal is to find the best search space configuration S* that contains the model with the highest accuracy while satisfying the resource constraints.

    How do we find S*?

    • Perform NAS on each of the search spaces and compare the final results
      • Search Space ==> (under memory constraint) Searching ==> Compare Best Acc?
    • Evaluate the quality of the search space by randomly sampling m networks from the search space and comparing the distribution of satisfying networks
      • Search Space ==> (under memory constraint) Sample ==> Training ==> Compare Acc?
        • (RegNet: sample 500 models from a search space and train each for 10 epochs; the EDF of their accuracies is enough to characterize the quality of the search space)
      • The evaluation strategy we use: Search Space ==> (under memory constraint) Sample ==> Compare FLOPs (no training!)

    Assumption: A model with larger computation has a larger capacity, which is more likely to achieve higher accuracy.

    We only collect the CDF of FLOPs of the sampled networks that satisfy the constraints.
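
    The training-free evaluation can be sketched roughly as follows. Everything in this block is a toy assumption for illustration: the backbone in `BASE_CHANNELS`, the sampling rule, and the FLOPs and SRAM proxies are hypothetical and are not the cost models used in the paper. Only the overall flow (sample, filter by memory, compare the FLOPs distribution) follows the notes above.

```python
import random

BASE_CHANNELS = [16, 24, 40, 80, 160]  # hypothetical backbone, not from the paper


def sample_network(width_mult, resolution, rng):
    """Sample a toy sub-network: each block keeps 50%, 75% or 100% of its scaled channels."""
    channels = [
        max(8, int(c * width_mult * rng.choice([0.5, 0.75, 1.0])))
        for c in BASE_CHANNELS
    ]
    return {"channels": channels, "resolution": resolution}


def toy_flops(net):
    """Toy compute proxy: one 3x3 conv per block, spatial size halves at each block."""
    flops, size, c_in = 0, net["resolution"], 3
    for c_out in net["channels"]:
        size = max(1, size // 2)
        flops += size * size * c_in * c_out * 9
        c_in = c_out
    return flops


def toy_peak_activation_bytes(net):
    """Toy SRAM proxy: largest input-plus-output activation pair at 1 byte per value."""
    peak, size, c_in = 0, net["resolution"], 3
    for c_out in net["channels"]:
        in_bytes = size * size * c_in
        size = max(1, size // 2)
        peak = max(peak, in_bytes + size * size * c_out)
        c_in = c_out
    return peak


def flops_of_satisfying(width_mult, resolution, sram_limit, m=1000, seed=0):
    """Sample m networks and return the sorted FLOPs of those that fit in SRAM."""
    rng = random.Random(seed)
    nets = [sample_network(width_mult, resolution, rng) for _ in range(m)]
    return sorted(toy_flops(n) for n in nets if toy_peak_activation_bytes(n) <= sram_limit)


def empirical_cdf(sorted_values, threshold):
    """Fraction of values that are <= threshold (0.0 for an empty sample)."""
    if not sorted_values:
        return 0.0
    return sum(v <= threshold for v in sorted_values) / len(sorted_values)


sram_limit = 320 * 1024  # 320kB SRAM, one of the deployment targets in these notes
for width_mult, resolution in [(0.3, 160), (0.5, 144), (1.0, 224)]:
    flops = flops_of_satisfying(width_mult, resolution, sram_limit)
    mean = sum(flops) / len(flops) if flops else 0.0
    print(f"w={width_mult}, r={resolution}: {len(flops)} satisfying, mean FLOPs {mean:.3g}")
```

    Under the assumption stated above, a configuration whose satisfying networks have a FLOPs distribution shifted toward larger values would be preferred.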


    TinyEngine: A Memory-Efficient Inference Library

    compilation vs. interpretation


    memory scheduling

    layer-wise vs. model-wise


    kernel specialization

    The inner loop unrolling is also specialized for different kernel sizes (e.g., 9 repeated code segments for a 3×3 kernel and 25 for a 5×5 kernel) to eliminate branch instruction overhead.
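
    To make the "9 vs. 25 repeated segments" concrete, here is a minimal sketch of a code generator that emits one multiply-accumulate statement per kernel tap instead of a loop. This is not TinyEngine's actual generator; the function name and the C-like identifiers in the emitted text (`acc`, `in`, `w`, `W`, `x`, `y`) are hypothetical.

```python
def unrolled_mac_lines(kernel_size):
    """Return one C-like multiply-accumulate statement per kernel tap (k*k in total)."""
    return [
        f"acc += in[(y + {dy}) * W + (x + {dx})] * w[{dy * kernel_size + dx}];"
        for dy in range(kernel_size)
        for dx in range(kernel_size)
    ]


print(len(unrolled_mac_lines(3)))  # prints: 9
print(len(unrolled_mac_lines(5)))  # prints: 25
print(unrolled_mac_lines(3)[0])
```

    Because every tap is written out explicitly, the generated inner body contains no loop counter and no loop-exit branch.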

    Operation fusion is performed for Conv+Padding+ReLU+BN layers.
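
    One common ingredient of such fusion is folding BatchNorm into the preceding convolution so that only a single fused operator runs at inference time. The sketch below shows that folding for a toy per-channel dot product standing in for a convolution. It assumes a Conv → BN → ReLU ordering, omits padding, and is not taken from TinyEngine; all function names are illustrative.

```python
import math


def conv_channel(inputs, w_c, b_c):
    """Toy 'convolution' for one output channel: a dot product plus bias."""
    return sum(x * w for x, w in zip(inputs, w_c)) + b_c


def conv_bn_relu(inputs, weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Reference path: conv, then BatchNorm, then ReLU as three separate steps."""
    out = []
    for w_c, b_c, g, bt, mu, v in zip(weights, bias, gamma, beta, mean, var):
        y = conv_channel(inputs, w_c, b_c)
        y = g * (y - mu) / math.sqrt(v + eps) + bt
        out.append(max(0.0, y))
    return out


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-output-channel BN statistics into the conv weights and bias."""
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, mu, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        folded_w.append([w * scale for w in w_c])
        folded_b.append((b_c - mu) * scale + bt)
    return folded_w, folded_b


def fused_conv_relu(inputs, folded_w, folded_b):
    """Fused path: a single pass computes conv with folded BN, then ReLU."""
    return [max(0.0, conv_channel(inputs, w_c, b_c)) for w_c, b_c in zip(folded_w, folded_b)]
```

    A usage check would compare `conv_bn_relu(...)` with `fused_conv_relu(x, *fold_batchnorm(...))` on the same inputs; the two should agree up to floating-point rounding.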


    Experiments

    Setup

    • Datasets
      • ImageNet
      • Visual Wake Words (VWW)
      • Speech Commands (V2), audio wake words
      • (CIFAR was not used)
    • Deployment
      • 320kB SRAM / 1MB Flash
      • 512kB SRAM / 2MB Flash

    Large-Scale Image Recognition

    Co-design


    Lower bit precision

    Under the same memory constraints, 4-bit MCUNet outperforms 8-bit MCUNet by 2.2% by fitting a larger model in the memory.
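
    The capacity side of this trade-off is simple arithmetic: halving the bits per weight doubles the number of weights that fit in a fixed budget. The sketch below is an upper bound only. It assumes the whole Flash budget holds weights and ignores code, metadata and activation memory; `max_parameters` is an illustrative helper, not part of any library.

```python
def max_parameters(flash_bytes, bits_per_weight):
    """Upper bound on weights that fit in a Flash budget (ignores code and metadata)."""
    return flash_bytes * 8 // bits_per_weight


flash_budget = 1 * 1024 * 1024  # 1MB Flash, one of the deployment targets in these notes
for bits in (8, 4):
    print(bits, max_parameters(flash_budget, bits))
# prints:
# 8 1048576
# 4 2097152
```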



    Visual & Audio Wake Words

    https://www.youtube.com/watch?v=YvioBgtec4U&feature=youtu.be


    Analysis

    Search space optimization


    Sensitivity analysis on search space optimization

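    Figure (image not preserved):

    • x-axis: Flash (storage, holds the model), 512kB to 2048kB
    • y-axis: SRAM (working memory, holds feature maps during inference), 192kB to 512kB

    1-2: As SRAM grows, the input resolution increases, but because Flash is limited the number of model parameters cannot grow, so the width does not increase.

    1-3: As Flash grows, the model can hold more parameters, so the width increases, but the input resolution actually decreases. A wider model has more convolution kernels, so each layer's feature maps have more channels; since SRAM is unchanged, the feature map resolution has to shrink.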
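
    The 1-3 trade-off can be illustrated with a deliberately crude model: if a single square feature map with `channels` channels must fit in SRAM at one byte per value, then doubling the channel count shrinks the largest feasible side length by roughly a factor of √2. The helper below is a hypothetical simplification for intuition only and not the paper's memory model.

```python
import math


def max_feature_map_side(sram_bytes, channels, bytes_per_value=1):
    """Largest side s such that s * s * channels * bytes_per_value <= sram_bytes."""
    return math.isqrt(sram_bytes // (channels * bytes_per_value))


sram = 320 * 1024  # 320kB SRAM, one of the deployment targets in these notes
print(max_feature_map_side(sram, 16))  # prints: 143
print(max_feature_map_side(sram, 32))  # prints: 101
```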


    Conclusion


    Summary


    To Read

    Reference

    https://mp.weixin.qq.com/s/v7fjLWqV4fqJqoPewlKgbA

  • Original post: https://www.cnblogs.com/chenbong/p/13773984.html