  • [MATLAB Deep Learning] Introduction to Deep Learning

    Introduction to Deep Learning

      Deep learning is a machine learning technique built on deep neural networks, and its breakthrough came from the accumulation of many small technical improvements. Deeper networks used to perform worse because they could not be trained effectively. When a deep neural network is trained with the backpropagation algorithm, three main problems arise: vanishing gradients, overfitting, and computational load.

    (1) Vanishing gradients

      With backpropagation, the vanishing gradient problem occurs when the output error fades away before it reaches the nodes farther from the output layer, so their weights are barely updated. The typical remedy is to use the rectified linear unit (ReLU) as the activation function in place of the sigmoid.
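
      To see why sigmoid layers starve deep networks of gradient: the sigmoid derivative never exceeds 0.25, so each additional layer attenuates the backpropagated error by at least a factor of four, whereas the ReLU derivative is exactly 1 for every positive input. A minimal sketch (my own illustration, not from the original text):

    x  = -5:0.1:5;
    s  = 1 ./ (1 + exp(-x));     % sigmoid
    ds = s .* (1 - s);           % sigmoid derivative, peaks at 0.25
    dr = double(x > 0);          % ReLU derivative, either 0 or 1
    fprintf('max sigmoid derivative: %.2f\n', max(ds));  % prints 0.25
    fprintf('max ReLU derivative:    %.2f\n', max(dr));  % prints 1.00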

    (2) Overfitting

      Deep neural networks are especially vulnerable to overfitting because their many hidden layers and weight values make the model more complex. The most representative remedy is dropout, which trains only randomly selected nodes rather than the whole network; suitable dropout ratios are roughly 50% for the hidden layers and 25% for the input layer. Another approach is to add a regularization term to the cost function, sketched below.
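
      For the regularization approach, here is a minimal sketch (my own illustration; lambda is a hypothetical decay coefficient, and the toy shapes stand in for values that backpropagation would supply) of adding an L2 weight-decay term to a single weight update:

    alpha  = 0.01;                 % learning rate
    lambda = 0.001;                % hypothetical regularization strength
    W     = 2*rand(5, 20) - 1;     % a weight matrix
    delta = rand(5, 1);            % output-layer delta
    y     = rand(20, 1);           % previous layer's output
    dW = alpha*delta*y' - alpha*lambda*W;  % gradient step plus L2 weight decay
    W  = W + dW;                   % large weights are steadily pulled toward zero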

    (3) Computational load

      This is addressed with hardware and algorithmic advances such as GPUs and batch normalization.
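
      As a minimal sketch (my own illustration, assuming MATLAB's Parallel Computing Toolbox is available), the matrix products that dominate training can be offloaded to the GPU with gpuArray:

    W1 = 2*rand(20, 25) - 1;   % same first-layer weights as in the examples below
    x  = rand(25, 1);          % a flattened 5x5 input
    W1g = gpuArray(W1);        % copy the weights to GPU memory
    xg  = gpuArray(x);
    v1g = W1g * xg;            % this multiplication now runs on the GPU
    v1  = gather(v1g);         % copy the result back to host memory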

    1. ReLU Example

      The input data consists of five 5×5 matrices representing the digits 1 through 5. The network has 25 input nodes, three hidden layers of 20 nodes each, and 5 output nodes. The code is as follows:

    function [W1, W2, W3, W4] = DeepReLU(W1, W2, W3, W4, X, D)
    % One training pass of the 25-20-20-20-5 network with ReLU hidden
    % layers, backpropagation, and SGD updates.
      alpha = 0.01;                       % learning rate
      
      N = 5;                              % number of training samples
      for k = 1:N
        x  = reshape(X(:, :, k), 25, 1);  % flatten the 5x5 image
        v1 = W1*x;                        % forward pass
        y1 = ReLU(v1);
        
        v2 = W2*y1;
        y2 = ReLU(v2);
        
        v3 = W3*y2;
        y3 = ReLU(v3);
        
        v  = W4*y3;
        y  = Softmax(v);
    
        d     = D(k, :)';                 % one-hot target
        
        e     = d - y;                    % output error
        delta = e;                        % softmax + cross-entropy delta
    
        e3     = W4'*delta;               % backpropagate; (v > 0) is the
        delta3 = (v3 > 0).*e3;            % derivative of ReLU
        
        e2     = W3'*delta3;
        delta2 = (v2 > 0).*e2;
        
        e1     = W2'*delta2;
        delta1 = (v1 > 0).*e1;
        
        dW4 = alpha*delta*y3';            % SGD weight updates
        W4  = W4 + dW4;
        
        dW3 = alpha*delta3*y2';
        W3  = W3 + dW3;
        
        dW2 = alpha*delta2*y1';
        W2  = W2 + dW2;
        
        dW1 = alpha*delta1*x';
        W1  = W1 + dW1;
      end
    end
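
      Note that the output-layer delta is taken directly as e = d - y with no extra derivative term; this is exactly what falls out when a Softmax output layer is paired with a cross-entropy cost, since the softmax and cross-entropy derivatives cancel.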
    

      ReLU is defined as follows:

    function y = ReLU(x)
      y = max(0, x);
    end
    

      The test code is as follows:

    clear all
           
    X  = zeros(5, 5, 5);
     
    X(:, :, 1) = [ 0 1 1 0 0;
                   0 0 1 0 0;
                   0 0 1 0 0;
                   0 0 1 0 0;
                   0 1 1 1 0
                 ];
     
    X(:, :, 2) = [ 1 1 1 1 0;
                   0 0 0 0 1;
                   0 1 1 1 0;
                   1 0 0 0 0;
                   1 1 1 1 1
                 ];
     
    X(:, :, 3) = [ 1 1 1 1 0;
                   0 0 0 0 1;
                   0 1 1 1 0;
                   0 0 0 0 1;
                   1 1 1 1 0
                 ];
    
    X(:, :, 4) = [ 0 0 0 1 0;
                   0 0 1 1 0;
                   0 1 0 1 0;
                   1 1 1 1 1;
                   0 0 0 1 0
                 ];
             
    X(:, :, 5) = [ 1 1 1 1 1;
                   1 0 0 0 0;
                   1 1 1 1 0;
                   0 0 0 0 1;
                   1 1 1 1 0
                 ];
    
    D = [ 1 0 0 0 0;
          0 1 0 0 0;
          0 0 1 0 0;
          0 0 0 1 0;
          0 0 0 0 1
        ];
          
    W1 = 2*rand(20, 25) - 1;
    W2 = 2*rand(20, 20) - 1;
    W3 = 2*rand(20, 20) - 1;
    W4 = 2*rand( 5, 20) - 1;
    
    for epoch = 1:10000           % train
      [W1, W2, W3, W4] = DeepReLU(W1, W2, W3, W4, X, D);
    end
    
    N = 5;                        % inference
    for k = 1:N
      x  = reshape(X(:, :, k), 25, 1);
      v1 = W1*x;
      y1 = ReLU(v1);
      
      v2 = W2*y1;
      y2 = ReLU(v2);
      
      v3 = W3*y2;
      y3 = ReLU(v3);
      
      v  = W4*y3;
      y  = Softmax(v)             % no semicolon: display the network's output
    end
    

      This code occasionally fails to train and produces wrong results; the ReLU network is more sensitive to the initial weight values than a sigmoid network.
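
      One common remedy (my own suggestion, not from the original text) is He initialization, which scales random Gaussian weights by sqrt(2/fan_in) so that ReLU activations keep a stable variance across layers:

    W1 = randn(20, 25) * sqrt(2/25);   % fan-in is the 25 input nodes
    W2 = randn(20, 20) * sqrt(2/20);
    W3 = randn(20, 20) * sqrt(2/20);
    W4 = randn( 5, 20) * sqrt(2/20);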

    2. Dropout Example

      The training function is defined as follows:

    function [W1, W2, W3, W4] = DeepDropout(W1, W2, W3, W4, X, D)
    % Same network as DeepReLU, but with sigmoid hidden layers and
    % dropout applied to every hidden layer's output during training.
      alpha = 0.01;                       % learning rate
      
      N = 5;                              % number of training samples
      for k = 1:N
        x  = reshape(X(:, :, k), 25, 1);  % flatten the 5x5 image
        v1 = W1*x;
        y1 = Sigmoid(v1);
        y1 = y1 .* Dropout(y1, 0.2);      % drop 20% of the first hidden layer's nodes
        
        v2 = W2*y1;
        y2 = Sigmoid(v2);
        y2 = y2 .* Dropout(y2, 0.2);      % drop 20% of the second hidden layer's nodes
        
        v3 = W3*y2;
        y3 = Sigmoid(v3);
        y3 = y3 .* Dropout(y3, 0.2);      % drop 20% of the third hidden layer's nodes
       
        v  = W4*y3;
        y  = Softmax(v);
    
        d     = D(k, :)';                 % one-hot target
        
        e     = d - y;                    % output error
        delta = e;                        % softmax + cross-entropy delta
    
        e3     = W4'*delta;               % y.*(1-y) is the sigmoid derivative;
        delta3 = y3.*(1-y3).*e3;          % dropped outputs are zero, so their
                                          % deltas vanish as well
        e2     = W3'*delta3;
        delta2 = y2.*(1-y2).*e2;
        
        e1     = W2'*delta2;
        delta1 = y1.*(1-y1).*e1;
        
        dW4 = alpha*delta*y3';            % SGD weight updates
        W4  = W4 + dW4;
        
        dW3 = alpha*delta3*y2';
        W3  = W3 + dW3;
        
        dW2 = alpha*delta2*y1';
        W2  = W2 + dW2;
        
        dW1 = alpha*delta1*x';
        W1  = W1 + dW1;
      end
    end
    

      Dropout is defined as follows:

    function ym = Dropout(y, ratio)
    % y is a layer's output vector; ratio is the fraction of elements to drop.
    % Returns a mask: dropped positions are 0 and surviving positions are
    % scaled to 1/(1-ratio) so the expected activation stays unchanged.
      [m, n] = size(y);  
      ym     = zeros(m, n);
      
      num     = round(m*n*(1-ratio)); % number of surviving elements
      idx     = randperm(m*n, num);   % indices of the surviving elements
      ym(idx) = 1 / (1-ratio);
    end
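
      A quick sanity check (my own example) that the 1/(1-ratio) scaling keeps the expected activation unchanged:

    y  = ones(1000, 1);
    ym = y .* Dropout(y, 0.2);   % 800 elements survive, each scaled to 1.25
    mean(ym)                     % displays 1 (800 survivors * 1.25 / 1000)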
    

      The Sigmoid function is defined as:

    function y = Sigmoid(x)
      y = 1 ./ (1 + exp(-x));
    end

      The test code is as follows:

    clear all
           
    X  = zeros(5, 5, 5);
     
    X(:, :, 1) = [ 0 1 1 0 0;
                   0 0 1 0 0;
                   0 0 1 0 0;
                   0 0 1 0 0;
                   0 1 1 1 0
                 ];
     
    X(:, :, 2) = [ 1 1 1 1 0;
                   0 0 0 0 1;
                   0 1 1 1 0;
                   1 0 0 0 0;
                   1 1 1 1 1
                 ];
     
    X(:, :, 3) = [ 1 1 1 1 0;
                   0 0 0 0 1;
                   0 1 1 1 0;
                   0 0 0 0 1;
                   1 1 1 1 0
                 ];
    
    X(:, :, 4) = [ 0 0 0 1 0;
                   0 0 1 1 0;
                   0 1 0 1 0;
                   1 1 1 1 1;
                   0 0 0 1 0
                 ];
             
    X(:, :, 5) = [ 1 1 1 1 1;
                   1 0 0 0 0;
                   1 1 1 1 0;
                   0 0 0 0 1;
                   1 1 1 1 0
                 ];
    
    D = [ 1 0 0 0 0;
          0 1 0 0 0;
          0 0 1 0 0;
          0 0 0 1 0;
          0 0 0 0 1
        ];
          
    W1 = 2*rand(20, 25) - 1;
    W2 = 2*rand(20, 20) - 1;
    W3 = 2*rand(20, 20) - 1;
    W4 = 2*rand( 5, 20) - 1;
    
    for epoch = 1:20000           % train
      [W1, W2, W3, W4] = DeepDropout(W1, W2, W3, W4, X, D);
    end
    
    N = 5;                        % inference
    for k = 1:N
      x  = reshape(X(:, :, k), 25, 1);
      v1 = W1*x;
      y1 = Sigmoid(v1);
      
      v2 = W2*y1;
      y2 = Sigmoid(v2);
      
      v3 = W3*y2;
      y3 = Sigmoid(v3);
      
      v  = W4*y3;
      y  = Softmax(v)             % no semicolon: display the network's output
    end
    

      The Softmax function used above is defined as:

    function y = Softmax(x)
      ex = exp(x);
      y  = ex / sum(ex);
    end
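
      For larger inputs, a numerically safer variant (a common refinement, not part of the original code) subtracts max(x) before exponentiating, which prevents overflow without changing the result:

    function y = Softmax(x)
      ex = exp(x - max(x));   % shifting by max(x) cancels out after normalization
      y  = ex / sum(ex);
    end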
    

      The final run outputs the correct classification for all five training images.

  • Original article: https://www.cnblogs.com/Negan-ZW/p/9613662.html