  • MobileNet V1

    Reference blog: https://cuijiahua.com/blog/2018/02/dl_6.html

    1. Depthwise Separable Convolution

    A standard convolution both filters and combines inputs into a new set of outputs in one step. The depthwise separable convolution splits this into two layers, a separate layer for filtering and a separate layer for combining.


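    With depthwise convolutional filters, where each of the M input channels is filtered by its own \(D_K \times D_K\) kernel, the computational cost is:

    \(D_K \cdot D_K \cdot M \cdot D_F \cdot D_F\)

    • M is the number of input channels

    • \(D_K\) is the width and height of the convolution kernel

    • \(D_F\) is the width and height of the input feature map

    A standard convolution with N output channels (that is, N kernels) costs:

    \(D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F\)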








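    By comparison, a depthwise separable convolution follows the depthwise filtering step with a \(1 \times 1\) pointwise convolution that combines the M filtered channels into N outputs, so its total cost is:

    \(D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F\)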
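
    Dividing the depthwise separable cost by the standard convolution cost shows where the saving comes from. The short derivation below is a sketch that reuses the symbols from this section, with N taken to be the number of output channels:

    \[
    \frac{D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F}{D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F} = \frac{1}{N} + \frac{1}{D_K^2}
    \]

    With \(D_K = 3\) the second term is 1/9, so for layers with many output channels the cost drops to roughly one eighth to one ninth of a standard convolution.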

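    As a concrete example, take a 3-channel 224x224 input image. The third convolutional layer of VGG16, conv2_1, receives a 112x112 feature map with 64 channels, uses a kernel size of 3, and has 128 kernels, so the cost of the standard convolution is:

    \(3 \cdot 3 \cdot 64 \cdot 128 \cdot 112 \cdot 112 = 924{,}844{,}032\)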
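
    The arithmetic for this layer can be checked with a few lines of Python and extended to the depthwise separable case. This is only an illustrative sketch: the function names `standard_conv_cost` and `separable_conv_cost` are made up for this example, and the formulas are the cost expressions from this section.

```python
def standard_conv_cost(d_k, m, n, d_f):
    """Multiply-adds of a standard convolution: D_K * D_K * M * N * D_F * D_F."""
    return d_k * d_k * m * n * d_f * d_f


def separable_conv_cost(d_k, m, n, d_f):
    """Multiply-adds of a depthwise step plus a 1x1 pointwise step."""
    depthwise = d_k * d_k * m * d_f * d_f   # one D_K x D_K filter per input channel
    pointwise = m * n * d_f * d_f           # 1x1 convolution combining M channels into N
    return depthwise + pointwise


# VGG16 conv2_1 from the example: 112x112 input, 64 input channels, 128 kernels of size 3.
standard = standard_conv_cost(d_k=3, m=64, n=128, d_f=112)
separable = separable_conv_cost(d_k=3, m=64, n=128, d_f=112)

print(standard)               # 924844032
print(separable)              # 109985792
print(separable / standard)   # about 0.119, which equals 1/128 + 1/9
```

    For this layer the depthwise separable version needs about 12% of the multiply-adds of the standard convolution.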



    2. Network Structure


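    • The depthwise convolution and the 1x1 convolution that follows it are treated as two separate modules; each one is followed by Batch Normalization and a nonlinear activation.

    Replacing a standard convolution with a depthwise convolution plus a 1x1 convolution is not only more efficient in theory. Because 1x1 convolutions are used so heavily, they can also be computed directly with highly optimized math libraries. Taking Caffe as an example, using these libraries normally requires first rearranging the data with im2col so that it matches the input format they expect, but a 1x1 convolution needs no such preprocessing.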
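
    The reason a 1x1 convolution can skip im2col is that it is already a plain matrix product over the channel dimension. The pure-Python sketch below illustrates this under stated assumptions: the input is stored as nested lists in (H, W, M) order, the weights as an (M, N) matrix, and the helper names `conv1x1_direct` and `conv1x1_as_matmul` are invented for this example rather than taken from Caffe or Keras.

```python
def conv1x1_direct(x, w):
    """1x1 convolution on an (H, W, M) input with an (M, N) weight matrix, as explicit loops."""
    h, wd, m = len(x), len(x[0]), len(x[0][0])
    n = len(w[0])
    return [[[sum(x[i][j][c] * w[c][k] for c in range(m)) for k in range(n)]
             for j in range(wd)] for i in range(h)]


def conv1x1_as_matmul(x, w):
    """Same result as one matrix product: view (H, W, M) as (H*W, M) and multiply by (M, N)."""
    h, wd = len(x), len(x[0])
    rows = [pixel for row in x for pixel in row]   # (H*W, M): a flat view, no patch copying
    cols = list(zip(*w))                           # the N columns of the weight matrix
    out = [[sum(a * b for a, b in zip(r, col)) for col in cols] for r in rows]
    return [out[i * wd:(i + 1) * wd] for i in range(h)]   # back to (H, W, N)


x = [[[1, 2], [3, 4]],
     [[5, 6], [7, 8]]]   # H=2, W=2, M=2
w = [[1, 0, 1],
     [0, 1, 1]]          # M=2, N=3

assert conv1x1_direct(x, w) == conv1x1_as_matmul(x, w)
print(conv1x1_as_matmul(x, w)[0][0])   # [1, 2, 3]
```

    A 3x3 convolution, by contrast, needs every 3x3 patch copied into its own row before the same matrix product can be used, which is the rearrangement im2col performs.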


    3. Width Multiplier and Resolution Multiplier





    4. Code Implementation


    """MobileNet v1 models for Keras.
    MobileNet is a general architecture and can be used for multiple use cases.
    Depending on the use case, it can use different input layer size and
    different width factors. This allows different width models to reduce
    the number of multiply-adds and thereby
    reduce inference cost on mobile devices.
    MobileNets support any input size greater than 32 x 32, with larger image sizes
    offering better performance.
    The number of parameters and number of multiply-adds
    can be modified by using the `alpha` parameter,
    which increases/decreases the number of filters in each layer.
    By altering the image size and `alpha` parameter,
    all 16 models from the paper can be built, with ImageNet weights provided.
    The paper demonstrates the performance of MobileNets using `alpha` values of
    1.0 (also called 100 % MobileNet), 0.75, 0.5 and 0.25.
    For each of these `alpha` values, weights for 4 different input image sizes
    are provided (224, 192, 160, 128).
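    The following table describes the size and accuracy of MobileNet
    at different width multipliers on size 224 x 224:
    ----------------------------------------------------------------------------
    Width Multiplier (alpha) | ImageNet Acc |  Multiply-Adds (M) |  Params (M)
    ----------------------------------------------------------------------------
    |   1.0 MobileNet-224    |    70.6 %     |        569        |     4.2     |
    |   0.75 MobileNet-224   |    68.4 %     |        325        |     2.6     |
    |   0.50 MobileNet-224   |    63.7 %     |        149        |     1.3     |
    |   0.25 MobileNet-224   |    50.6 %     |        41         |     0.5     |
    ----------------------------------------------------------------------------
    The following table describes the performance of
    the 100 % MobileNet on various input sizes:
    ----------------------------------------------------------------------------
          Resolution      | ImageNet Acc | Multiply-Adds (M) | Params (M)
    ----------------------------------------------------------------------------
    |  1.0 MobileNet-224  |    70.6 %    |        569        |     4.2     |
    |  1.0 MobileNet-192  |    69.1 %    |        418        |     4.2     |
    |  1.0 MobileNet-160  |    67.2 %    |        290        |     4.2     |
    |  1.0 MobileNet-128  |    64.4 %    |        186        |     4.2     |
    ----------------------------------------------------------------------------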
    The weights for all 16 models are obtained and translated
    from Tensorflow checkpoints found at
    # Reference
    - [MobileNets: Efficient Convolutional Neural Networks for
       Mobile Vision Applications](https://arxiv.org/pdf/1704.04861.pdf)
    """
    from keras.models import Model
    from keras.layers import Input, Activation, Dropout, Reshape, BatchNormalization, GlobalAveragePooling2D
    from keras.layers import Conv2D, DepthwiseConv2D
    from keras.utils import plot_model
    from keras import backend as K


    def relu6(x):
        """ReLU activation capped at 6."""
        return K.relu(x, max_value=6)


    def _make_divisiable(v, divisor=8, min_value=8):
        """Rounds `v` to the nearest multiple of `divisor`, never going below `min_value`."""
        if min_value is None:
            min_value = divisor
        new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
        # Make sure that round down does not go down by more than 10%.
        if new_v < 0.9 * v:
            new_v += divisor
        return new_v


    def _conv_bolck(inputs, filters, alpha, kernel=(3, 3), strides=(1, 1), bn_epsilon=1e-3,
                    bn_momentum=0.99, block_id=1):
        """Adds an initial convolution layer (with batch normalization and relu6).

        # Arguments
            inputs: Input tensor of shape `(rows, cols, 3)` (with `channels_last` data format)
                    or (3, rows, cols) (with `channels_first` data format).
                    It should have exactly 3 inputs channels, and width and height should be no smaller than 32.
                    E.g. `(224, 224, 3)` would be one valid value.
            filters: Integer, the dimensionality of the output space.
                    (i.e. the number output of filters in the convolution).
            alpha: controls the width of the network.
                    - If `alpha` < 1.0, proportionally decreases the number of filters in each layer.
                    - If `alpha` > 1.0, proportionally increases the number of filters in each layer.
                    - If `alpha` = 1, default number of filters from the paper are used at each layer.
            kernel: An integer or tuple/list of 2 integers, specifying the width and height of the 2D convolution window.
                    Can be a single integer to specify the same value for all spatial dimensions.
            strides: An integer or tuple/list of 2 integers, specifying the strides of the convolution along the width and height.
                     Can be a single integer to specify the same value for all spatial dimensions.
                     Specifying any stride value != 1 is incompatible with specifying any `dilation_rate` value != 1.
            bn_epsilon: Epsilon value for BatchNormalization
            bn_momentum: Momentum value for BatchNormalization
            block_id: Integer, a unique identification designating the block number.

        # Returns
            Output tensor of block.

        # Input shape
            4D tensor with shape: `(samples, channels, rows, cols)` if data_format='channels_first'
            or `(samples, rows, cols, channels)` if data_format='channels_last'.

        # Output shape
            4D tensor with shape: `(samples, filters, new_rows, new_cols)` if data_format='channels_first'
            or `(samples, new_rows, new_cols, filters)` if data_format='channels_last'.
            `rows` and `cols` values might have changed due to stride.
        """
        channel_axis = 1 if K.image_data_format() == 'channels_first' else -1
        # After scaling by the width multiplier, the filter count may no longer be divisible by divisor=8, so round it.
        filters = _make_divisiable(filters * alpha)
        x = Conv2D(filters, kernel, use_bias=False, strides=strides, name='conv{}'.format(block_id))(inputs)
        x = BatchNormalization(axis=channel_axis, momentum=bn_momentum, epsilon=bn_epsilon, name='conv{}_bn'.format(block_id))(x)
        return Activation(relu6, name='conv{}_relu'.format(block_id))(x)


    def _depthwise_conv_block(inputs, pointwise_conv_filters, alpha, depth_multiplier=1,
                              strides=(1, 1), bn_epsilon=1e-3, block_id=1):
        """Adds a depthwise convolution block.

        A depthwise convolution block consists of
        a depthwise conv, batch normalization, relu6,
        pointwise convolution, batch normalization and relu6.

        # Arguments
            inputs: Input tensor of shape `(rows, cols, channels)`(with `channels_last` data format)
                    or (channels, rows, cols)(with `channels_first` data format)
            pointwise_conv_filters: Integer, the dimensionality of the output space
                                    (i.e. the number output of filters in the pointwise convolution).
            alpha: controls the width of the network.
                - If `alpha` < 1.0, proportionally decreases the number of filters in each layer.
                - If `alpha` > 1.0, proportionally increases the number of filters in each layer.
                - If `alpha` = 1, default number of filters from the paper are used at each layer.
            depth_multiplier: The number of depthwise convolution output channels for each input channel.
                            The total number of depthwise convolution output channels
                            will be equal to `filters_in * depth_multiplier`.
            strides:  An integer or tuple/list of 2 integers,
                    specifying the strides of the convolution along the width and height.
                    Can be a single integer to specify the same value for all spatial dimensions.
                    Specifying any stride value != 1 is incompatible with specifying any `dilation_rate` value != 1.
            bn_epsilon: Epsilon value for BatchNormalization
            block_id: Integer, a unique identification designating the block number.

        # Returns
            Output tensor of block.

        # Input shape
            4D tensor with shape: `(batch, channels, rows, cols)` if data_format='channels_first'
            or `(batch, rows, cols, channels)` if data_format='channels_last'.

        # Output shape
            4D tensor with shape: `(batch, filters, new_rows, new_cols)` if data_format='channels_first'
            or `(batch, new_rows, new_cols, filters)` if data_format='channels_last'.
            `rows` and `cols` values might have changed due to stride.
        """
        channel_axis = 1 if K.image_data_format() == 'channels_first' else -1
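        pointwise_conv_filters = _make_divisiable(pointwise_conv_filters * alpha)
        # Depthwise Conv2D
        # Each input channel is convolved with its own depth_multiplier filters, so the kernel
        # shape is effectively 3 x 3 x input_channels x depth_multiplier and the output tensor
        # has shape (batch, rows, cols, input_channels * depth_multiplier).
        # NOTE: the arguments after kernel_size were lost in the source; padding='same',
        # use_bias=False and the layer name are assumptions that follow the BN/ReLU layer names.
        x = DepthwiseConv2D(kernel_size=(3, 3),
                            padding='same',
                            depth_multiplier=depth_multiplier,
                            strides=strides,
                            use_bias=False,
                            name='conv_dw_{}'.format(block_id))(inputs)
        x = BatchNormalization(axis=channel_axis, epsilon=bn_epsilon, name='conv_dw_{}_bn'.format(block_id))(x)
        x = Activation(relu6, name='conv_dw_{}_relu'.format(block_id))(x)
        # Pointwise Conv2D: pointwise_conv_filters sets the final number of output channels.
        # NOTE: padding, use_bias and name are assumed here as well.
        x = Conv2D(pointwise_conv_filters,
                   kernel_size=(1, 1),
                   strides=(1, 1),
                   padding='same',
                   use_bias=False,
                   name='conv_pw_{}'.format(block_id))(x)
        x = BatchNormalization(axis=channel_axis, epsilon=bn_epsilon, name='conv_pw_{}_bn'.format(block_id))(x)
        return Activation(relu6, name='conv_pw_{}_relu'.format(block_id))(x)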


    # NOTE: everything after `input_shape` in the signature was lost in the source. The parameter
    # names come from the docstring and the function body; the default values are assumptions.
    def mobilenetv1(input_shape,
                    alpha=1.0,
                    depth_multiplier=1,
                    dropout=1e-3,
                    classes=1000):
        """Instantiates the MobileNet architecture.

        # Arguments
            input_shape: optional shape tuple, only to be specified if `include_top` is False
                        (otherwise the input shape has to be `(224, 224, 3)` (with `channels_last` data format)
                        or (3, 224, 224) (with `channels_first` data format)).
                        It should have exactly 3 input channels, and width and height should be no smaller than 32.
                        E.g. `(200, 200, 3)` would be one valid value.
            alpha: controls the width of the network.
                    - If `alpha` < 1.0, proportionally decreases the number of filters in each layer.
                    - If `alpha` > 1.0, proportionally increases the number of filters in each layer.
                    - If `alpha` = 1, default number of filters from the paper are used at each layer.
            depth_multiplier: depth multiplier for depthwise convolution
            dropout: dropout rate
            classes: optional number of classes to classify images into

        # Returns
            A Keras model instance.

        # Raises
            ValueError: in case of invalid argument for `weights`, or invalid input shape.
            RuntimeError: If attempting to run this model with a backend that does not support separable convolutions.
        """
        x_input = Input(shape=input_shape)
        x = _conv_bolck(x_input, 32, alpha, strides=(2, 2))
        x = _depthwise_conv_block(x, 64, alpha, depth_multiplier, block_id=1)
        x = _depthwise_conv_block(x, 128, alpha, depth_multiplier,
                                  strides=(2, 2), block_id=2)
        x = _depthwise_conv_block(x, 128, alpha, depth_multiplier, block_id=3)
        x = _depthwise_conv_block(x, 256, alpha, depth_multiplier,
                                  strides=(2, 2), block_id=4)
        x = _depthwise_conv_block(x, 256, alpha, depth_multiplier, block_id=5)
        x = _depthwise_conv_block(x, 512, alpha, depth_multiplier,
                                  strides=(2, 2), block_id=6)
        x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=7)
        x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=8)
        x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=9)
        x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=10)
        x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=11)
        x = _depthwise_conv_block(x, 1024, alpha, depth_multiplier,
                                  strides=(2, 2), block_id=12)
        x = _depthwise_conv_block(x, 1024, alpha, depth_multiplier, block_id=13)
        # Use the same rounding as the last block so the reshape matches its actual channel count.
        shape = (1, 1, _make_divisiable(1024 * alpha))
        x = GlobalAveragePooling2D()(x)
        x = Reshape(shape, name='reshape_1')(x)
        x = Dropout(dropout, name='dropout')(x)
        x = Conv2D(classes, (1, 1), padding='same', name='conv_preds')(x)
        x = Activation('softmax', name='act_softmax')(x)
        x = Reshape((classes,), name='reshape_2')(x)
        return Model(x_input, x)


    if __name__ == '__main__':
        alpha = 1
        depth_multiplier = 1
        mobilenet = mobilenetv1(input_shape=(224, 224, 3), alpha=alpha, depth_multiplier=depth_multiplier)
        # Save a diagram of the network with the tensor shapes at each layer.
        plot_model(mobilenet, show_shapes=True,
                   to_file='mobilenet_alpha{}_depth_multiplier_{}.png'.format(alpha, depth_multiplier))
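
    The effect of the width multiplier `alpha` on the channel counts can be inspected without building the Keras model. The sketch below is a standalone copy of the rounding logic used by the helper above, renamed `make_divisible` for this example only; `base_filters` is an illustrative list of the filter counts passed to the blocks.

```python
def make_divisible(v, divisor=8, min_value=8):
    """Round v to the nearest multiple of divisor, never below min_value."""
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Avoid shrinking the layer by more than 10% when rounding down.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


base_filters = [32, 64, 128, 256, 512, 1024]
for alpha in (1.0, 0.75, 0.5, 0.25):
    # For the four multipliers used in the paper, every scaled count is already a multiple of 8.
    print(alpha, [make_divisible(f * alpha) for f in base_filters])

# A non-standard multiplier shows the rounding at work.
print(make_divisible(32 * 0.3))     # 9.6 rounds down to 8, which loses more than 10%, so 16
print(make_divisible(1024 * 0.3))   # 304, whereas int(1024 * 0.3) would give 307
```

    The last line is why any reshape after the final block should use the rounded value rather than `int(1024 * alpha)`: for multipliers outside the four standard ones the two can differ.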
