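  • [NAS Toolbox] An Introduction to Drop Path + A Review of Dropout

    [Preface] Drop Path is a regularization method commonly used in NAS. Because the networks trained in NAS often change dynamically during training, Drop Path is a useful regularization tool, and it is widely used in FractalNet, NASNet, and similar work.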

    Dropout

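    Dropout is one of the earliest methods for combating overfitting and the predecessor of all drop-style methods. It was proposed by Hinton et al. in 2012 and was used in AlexNet (ImageNet Classification with Deep Convolutional Neural Networks).

    Principle: during the forward pass in training, each neuron's activation is switched off (set to zero) with probability 1 - keep_prob, where 0 < keep_prob < 1.

    Function: this makes the model generalize better, because it cannot rely too heavily on particular local nodes. During training each neuron is kept with probability keep_prob and switched off with probability 1 - keep_prob. At test time no neuron is switched off, but the outputs of neurons that had dropout applied during training must be multiplied by keep_prob. This is the original formulation; the PyTorch code later in this post instead rescales by 1 / keep_prob during training, so nothing has to change at test time.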

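    Specifically:

    Suppose a neuron's output activation is a. Without dropout, its expected output is a. With dropout, the neuron is either kept or switched off; treating this as a discrete random variable, it follows the 0-1 (Bernoulli) distribution with keep probability p. The expected output then becomes p*a + (1-p)*0 = pa, so to keep the expectation the same as without dropout, the output must be divided by p.
    Author: 种子_fe
    Link: https://www.imooc.com/article/30129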
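
    The expectation argument above can be checked numerically. The following is a minimal sketch in plain Python, not taken from the original post: the helper names `vanilla_dropout_train` and `inverted_dropout_train`, the activation value, and the sample count are all assumptions made for illustration.

```python
import random


def vanilla_dropout_train(a, keep_prob, rng):
    """Original dropout at training time: keep `a` with probability keep_prob, no rescaling."""
    return a if rng.random() < keep_prob else 0.0


def inverted_dropout_train(a, keep_prob, rng):
    """Inverted dropout at training time: keep `a` with probability keep_prob, then divide by keep_prob."""
    return a / keep_prob if rng.random() < keep_prob else 0.0


rng = random.Random(0)  # fixed seed so the run is repeatable
a, keep_prob, n = 2.0, 0.8, 200_000  # illustrative values only

# Average many independent draws to estimate the expected training-time output.
vanilla_mean = sum(vanilla_dropout_train(a, keep_prob, rng) for _ in range(n)) / n
inverted_mean = sum(inverted_dropout_train(a, keep_prob, rng) for _ in range(n)) / n

print(f"vanilla  training mean: {vanilla_mean:.3f} (theory: keep_prob * a = {keep_prob * a})")
print(f"inverted training mean: {inverted_mean:.3f} (theory: a = {a})")
```

    Without rescaling, the training-time mean settles near keep_prob * a, which is why the original formulation multiplies by keep_prob at test time. With the division by keep_prob, the training-time mean already matches a.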

    Implementation: the PyTorch implementation is as follows.

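    # Imports needed to run this excerpt on its own.
    import torch.nn.functional as F
    from torch import Tensor
    from torch.nn import Module

    class _DropoutNd(Module):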
        __constants__ = ['p', 'inplace']
        p: float
        inplace: bool
    
        def __init__(self, p: float = 0.5, inplace: bool = False) -> None:
            super(_DropoutNd, self).__init__()
            if p < 0 or p > 1:
                raise ValueError("dropout probability has to be between 0 and 1, "
                                 "but got {}".format(p))
            self.p = p
            self.inplace = inplace
    
        def extra_repr(self) -> str:
            return 'p={}, inplace={}'.format(self.p, self.inplace)
        
    class Dropout(_DropoutNd):
        def forward(self, input: Tensor) -> Tensor:
            return F.dropout(input, self.p, self.training, self.inplace)
    

    The dropout implementation in functional.py:

    def dropout(input: Tensor, p: float = 0.5, training: bool = True, inplace: bool = False) -> Tensor:
        r"""
        During training, randomly zeroes some of the elements of the input
        tensor with probability :attr:`p` using samples from a Bernoulli
        distribution.
        See :class:`~torch.nn.Dropout` for details.
        Args:
            p: probability of an element to be zeroed. Default: 0.5
            training: apply dropout if is ``True``. Default: ``True``
            inplace: If set to ``True``, will do this operation in-place. Default: ``False``
        """
        if has_torch_function_unary(input):
            return handle_torch_function(dropout, (input,), input, p=p, training=training, inplace=inplace)
        if p < 0.0 or p > 1.0:
            raise ValueError("dropout probability has to be between 0 and 1, " "but got {}".format(p))
        return _VF.dropout_(input, p, training) if inplace else _VF.dropout(input, p, training)
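
    To see what the `training` flag and `p` do in practice, here is a small usage sketch. It assumes PyTorch is installed; the tensor shape and seed are arbitrary choices for illustration, not part of the original post.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.ones(4, 5)
p = 0.5

# training=False: dropout is a no-op and the values come back unchanged.
eval_out = F.dropout(x, p=p, training=False)

# training=True: each element is zeroed with probability p,
# and the surviving elements are scaled by 1 / (1 - p).
train_out = F.dropout(x, p=p, training=True)
survivors = train_out[train_out != 0]

# p=1 drops every element.
all_dropped = F.dropout(x, p=1.0, training=True)

# The nn.Dropout module forwards its own self.training flag,
# so calling .eval() turns it into the identity.
layer = torch.nn.Dropout(p).eval()
module_eval_out = layer(x)

print(train_out)
```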
    

    The concrete implementation is finally found in Dropout.cpp:

    template<bool feature_dropout, bool alpha_dropout, bool inplace, typename T>
    Ctype<inplace> _dropout_impl(T& input, double p, bool train) {
      TORCH_CHECK(p >= 0 && p <= 1, "dropout probability has to be between 0 and 1, but got ", p);
      if (p == 0 || !train || input.numel() == 0) {
        return input;
      }
    
      if (p == 1) {
        return multiply<inplace>(input, at::zeros({}, input.options()));
      }
    
      at::Tensor b; // used for alpha_dropout only
      auto noise = feature_dropout ? make_feature_noise(input) : at::empty_like(input, LEGACY_CONTIGUOUS_MEMORY_FORMAT);
      noise.bernoulli_(1 - p);
      if (alpha_dropout) {
        constexpr double alpha = 1.7580993408473766;
        double a = 1. / std::sqrt((alpha * alpha * p + 1) * (1 - p));
        b = noise.add(-1).mul_(alpha * a).add_(alpha * a * p);
        noise.mul_(a);
      } else {
        noise.div_(1 - p);
      }  
    
      if (!alpha_dropout) {
        return multiply<inplace>(input, noise);
      } else {
        return multiply<inplace>(input, noise).add_(b);
      }
    }
    

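    Flow:

    • Check the range of p and the training state: if p == 0, the model is not training, or the input is empty, return the input unchanged; if p == 1, return all zeros
    • Sample a Bernoulli (0-1) mask in which each element is 1 with probability 1-p
    • Return input * mask / (1-p)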
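
    The steps above can be sketched in plain Python for the ordinary (non-alpha, non-feature) dropout path. This is an illustrative port, not PyTorch code: `dropout_list` is a hypothetical helper that works on a flat list of floats, and the `rng` parameter is an assumption added so the randomness can be seeded.

```python
import random


def dropout_list(values, p=0.5, train=True, rng=random):
    """Inverted dropout over a flat list of floats (hypothetical helper)."""
    # Step 1: validate p and handle the cases where nothing random happens.
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"dropout probability has to be between 0 and 1, but got {p}")
    if p == 0.0 or not train or len(values) == 0:
        return list(values)
    if p == 1.0:
        return [0.0 for _ in values]
    # Step 2: draw a Bernoulli(1 - p) keep mask, one entry per element.
    mask = [1.0 if rng.random() < 1.0 - p else 0.0 for _ in values]
    # Step 3: zero the dropped entries and scale the kept ones by 1 / (1 - p).
    return [v * m / (1.0 - p) for v, m in zip(values, mask)]


rng = random.Random(0)
x = [1.0, 2.0, 3.0, 4.0]
train_out = dropout_list(x, p=0.5, train=True, rng=rng)   # entries are 0 or 2 * x[i]
eval_out = dropout_list(x, p=0.5, train=False, rng=rng)   # same values as x
print(train_out)
print(eval_out)
```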

    Drop Path

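    Principle: as the name suggests, Drop Path randomly drops branches (paths) of the multi-branch structures in a deep network.

    Function: it is generally added to a network as a regularizer, but it makes training harder. In NAS in particular, if the drop prob is set too high, the model may even fail to converge.

    Implementation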

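    import torch
    import torch.nn as nn


    def drop_path(x, drop_prob: float = 0., training: bool = False):
        # No-op at inference time or when nothing should be dropped.
        if drop_prob == 0. or not training:
            return x
        keep_prob = 1 - drop_prob
        shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # work with diff dim tensors, not just 2D ConvNets
        # keep_prob + U[0, 1) lies in [keep_prob, 1 + keep_prob), so floor() is 1 with probability keep_prob.
        random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
        random_tensor.floor_()  # binarize
        # Rescale so the expected output equals x, then zero the dropped samples.
        output = x.div(keep_prob) * random_tensor
        return output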
    
    
    class DropPath(nn.Module):
        """Drop paths (Stochastic Depth) per sample  (when applied in main path of residual blocks).
        """
        def __init__(self, drop_prob: float = 0.):
            super(DropPath, self).__init__()
            self.drop_prob = drop_prob
    
        def forward(self, x):
            return drop_path(x, self.drop_prob, self.training)
    

    With the theory of Dropout as groundwork, the implementation here is fairly clear. In practice it is typically used like this:

    x = x + self.drop_path(self.conv(x))
    

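    Drop Path should not be applied directly like this, because there is no parallel path (such as the identity shortcut) left to carry the signal, so a dropped sample's entire output becomes zero: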

    x = self.drop_path(x)
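
    The difference between the two usages can be made concrete with a small sketch. It assumes PyTorch is installed and repeats the per-sample masking of the drop_path function above so that it runs on its own; `ResidualBlock`, the channel count, the input shape, and the drop probability are hypothetical choices for illustration.

```python
import torch
import torch.nn as nn


def drop_path(x, drop_prob: float = 0.0, training: bool = False):
    """Per-sample path dropping, same logic as the drop_path function in this post."""
    if drop_prob == 0.0 or not training:
        return x
    keep_prob = 1 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # one 0/1 draw per sample
    mask = (keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)).floor_()
    return x.div(keep_prob) * mask


class ResidualBlock(nn.Module):
    """Hypothetical block: identity shortcut plus a droppable conv branch."""

    def __init__(self, channels: int, drop_prob: float):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.drop_prob = drop_prob

    def forward(self, x):
        return x + drop_path(self.conv(x), self.drop_prob, self.training)


torch.manual_seed(0)
block = ResidualBlock(channels=3, drop_prob=0.5).train()
x = torch.randn(8, 3, 4, 4)

with torch.no_grad():
    residual_out = block(x)                                  # x + drop_path(conv(x))
    direct_out = drop_path(x, drop_prob=0.5, training=True)  # no shortcut

# Samples whose branch was dropped still carry the identity signal x ...
passthrough = [bool(torch.equal(residual_out[i], x[i])) for i in range(8)]
# ... while a drop applied directly wipes the whole sample out.
wiped = [bool((direct_out[i] == 0).all()) for i in range(8)]
print("branch dropped, input preserved:", passthrough)
print("sample zeroed by direct use:    ", wiped)
```

    In the residual form a dropped sample simply skips the branch, which is the stochastic-depth behaviour the class docstring describes. In the direct form the same draw leaves nothing for later layers to work with.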
    

    Reference

    https://www.cnblogs.com/dan-baishucaizi/p/14703263.html

    https://www.imooc.com/article/30129

    https://www.github.com/pytorch/pytorch

  • Original article: https://www.cnblogs.com/pprp/p/14815168.html