  • Deep neural networks

    1.Forward and backward propagation:


    • Forward propagation for layer L:
      • Z[L]=W[L]A[L-1]+b[L]
      • A[L]=g[L](Z[L])
    • Backward propagation for layer L:
      • dZ[L]=dA[L]*g'[L](Z[L])  (element-wise product)
      • dW[L]=(1.0/m)*np.dot(dZ[L],A[L-1].T)
      • db[L]=(1.0/m)*np.sum(dZ[L],axis=1,keepdims=True)
      • dA[L-1]=np.dot(W[L].T,dZ[L])

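    • Getting your matrix dimensions right
      • W has shape (size of the current layer, size of the previous layer); W[L]: (n[L], n[L-1])
      • b has shape (size of the current layer, 1); b[L]: (n[L], 1)
      • Z[L] and A[L] have shape (number of units, number of examples): (n[L], m)
      • Each gradient has the same shape as the quantity it belongs to: dW[L] matches W[L], db[L] matches b[L], dZ[L] matches Z[L], and dA[L] matches A[L]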
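
    The propagation equations and the dimension rules above can be checked together on a single layer. The snippet below is a minimal sketch: the layer sizes, the random inputs, and the choice of ReLU as g are illustrative assumptions, not values from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for illustration: 3 units in the previous layer,
# 4 units in the current layer, 5 training examples.
n_prev, n_cur, m = 3, 4, 5
A_prev = rng.standard_normal((n_prev, m))
W = rng.standard_normal((n_cur, n_prev)) * 0.01
b = np.zeros((n_cur, 1))

# Forward: Z = W A_prev + b, A = g(Z), with g = ReLU here
Z = np.dot(W, A_prev) + b            # b broadcasts across the m columns
A = np.maximum(0, Z)

# Backward, given an upstream gradient dA with the same shape as A
dA = rng.standard_normal(A.shape)
dZ = dA * (Z > 0)                    # ReLU derivative: 1 where Z > 0, else 0
dW = (1.0 / m) * np.dot(dZ, A_prev.T)
db = (1.0 / m) * np.sum(dZ, axis=1, keepdims=True)
dA_prev = np.dot(W.T, dZ)

# Every gradient has the shape of the quantity it belongs to
for name, grad, ref in [("dW", dW, W), ("db", db, b),
                        ("dZ", dZ, Z), ("dA_prev", dA_prev, A_prev)]:
    print(name, grad.shape, grad.shape == ref.shape)
```

    Using keepdims=True keeps db as an (n[L], 1) column rather than a flat vector, which is what lets it broadcast against Z in the next forward pass.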

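    2.Building blocks of deep neural networks:


    Forward function: takes A[L-1], W[L], and b[L] as input; outputs A[L] and caches Z[L].

    Backward function: takes dA[L], the cached Z[L], W[L], and b[L] as input; outputs dA[L-1], dW[L], and db[L]. Computing dW[L] also requires A[L-1], so the forward pass has to cache it as well.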

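    3.Parameters and hyperparameters


    • Examples of hyperparameters:
      • learning rate
      • number of iterations of gradient descent
      • L (number of hidden layers)
      • n[L] (number of hidden units in each layer)
      • choice of activation function
      • momentum, mini-batch size, regularization parameters, and so on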
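
    One way to keep the distinction visible in code is to group the hyperparameters in one place, separate from the learned W and b. The sketch below is only an illustration: the Hyperparameters class, its field names, and all default values are hypothetical and do not come from the original post.

```python
from dataclasses import dataclass

@dataclass
class Hyperparameters:
    # All names and defaults here are illustrative assumptions.
    learning_rate: float = 0.0075
    num_iterations: int = 2500
    layer_dims: tuple = (5, 4, 3, 1)   # input size, two hidden layers, output
    hidden_activation: str = "relu"

hp = Hyperparameters()

# Hyperparameters are chosen by hand; they determine how many
# parameters (entries of W and b) gradient descent has to learn.
num_layers = len(hp.layer_dims) - 1
num_params = sum(n_out * n_in + n_out
                 for n_in, n_out in zip(hp.layer_dims[:-1], hp.layer_dims[1:]))
print(num_layers, num_params)
```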

    4.Programming practice:


    Activation functions:

    import numpy as np
    
    def sigmoid(Z):
        """
        Implements the sigmoid activation in numpy
        
        Arguments:
        Z -- numpy array of any shape
        
        Returns:
        A -- output of sigmoid(z), same shape as Z
        cache -- returns Z as well, useful during backpropagation
        """
        
        A = 1/(1+np.exp(-Z))
        cache = Z
        
        return A, cache
    
    def relu(Z):
        """
        Implement the RELU function.
    
        Arguments:
        Z -- Output of the linear layer, of any shape
    
        Returns:
        A -- Post-activation parameter, of the same shape as Z
        cache -- returns Z as well; stored for computing the backward pass efficiently
        """
        
        A = np.maximum(0,Z)
        
        assert(A.shape == Z.shape)
        
        cache = Z 
        return A, cache
    
    
    def relu_backward(dA, cache):
        """
        Implement the backward propagation for a single RELU unit.
    
        Arguments:
        dA -- post-activation gradient, of any shape
        cache -- 'Z' where we store for computing backward propagation efficiently
    
        Returns:
        dZ -- Gradient of the cost with respect to Z
        """
        
        Z = cache
        dZ = np.array(dA, copy=True) # copy dA so that the caller's array is not modified
        
        # When z <= 0, you should set dz to 0 as well. 
        dZ[Z <= 0] = 0
        
        assert (dZ.shape == Z.shape)
        
        return dZ
    
    def sigmoid_backward(dA, cache):
        """
        Implement the backward propagation for a single SIGMOID unit.
    
        Arguments:
        dA -- post-activation gradient, of any shape
        cache -- 'Z' where we store for computing backward propagation efficiently
    
        Returns:
        dZ -- Gradient of the cost with respect to Z
        """
        
        Z = cache
        
        s = 1/(1+np.exp(-Z))
        dZ = dA * s * (1-s)
        
        assert (dZ.shape == Z.shape)
        
        return dZ
    
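    def tanh_backward(dA, cache):
        """Implement the backward propagation for a single TANH unit."""
        Z = cache
        # tanh'(Z) = 1 - tanh(Z)^2
        dZ = dA * (1 - np.power(np.tanh(Z), 2))
        assert (dZ.shape == Z.shape)
        return dZ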
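
    A quick way to gain confidence in the three backward functions is to compare them against a central finite difference of the corresponding activation. The sketch below restates the backward formulas in compact form so that it runs on its own; numerical_derivative, the test values in Z, and the step size eps are assumptions made for illustration.

```python
import numpy as np

def sigmoid_backward(dA, Z):
    s = 1 / (1 + np.exp(-Z))
    return dA * s * (1 - s)

def relu_backward(dA, Z):
    return dA * (Z > 0)          # same effect as setting dZ[Z <= 0] = 0

def tanh_backward(dA, Z):
    return dA * (1 - np.tanh(Z) ** 2)

def numerical_derivative(g, Z, eps=1e-6):
    """Central finite difference of an element-wise function g (hypothetical helper)."""
    return (g(Z + eps) - g(Z - eps)) / (2 * eps)

# Test points chosen away from 0, where ReLU is not differentiable
Z = np.array([[-2.0, -0.5, 0.3], [1.0, 2.5, -1.5]])
dA = np.ones_like(Z)             # with dA = 1, dZ is just g'(Z)

checks = {
    "sigmoid": (sigmoid_backward, lambda z: 1 / (1 + np.exp(-z))),
    "relu": (relu_backward, lambda z: np.maximum(0, z)),
    "tanh": (tanh_backward, np.tanh),
}
max_errors = {}
for name, (backward, g) in checks.items():
    analytic = backward(dA, Z)
    numeric = numerical_derivative(g, Z)
    max_errors[name] = float(np.max(np.abs(analytic - numeric)))
print(max_errors)
```

    All three errors should be tiny (far below 1e-6); a large value would point to a mistake in the corresponding derivative.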

    Implementing an L-layer neural network


    • Initialize Deep Neural Network parameters:
        def initialize_parameters_deep(layer_dims):
            """
            Arguments:
            layer_dims -- python array (list) containing the dimensions of each layer in our network

            Returns:
            parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                            Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
                            bl -- bias vector of shape (layer_dims[l], 1)
            """
            np.random.seed(3)  # fixed seed so that runs are reproducible
            parameters = {}
            L = len(layer_dims)  # number of layers, counting the input layer
            for l in range(1, L):
                # small random weights break symmetry; biases can start at zero
                parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
                parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
                assert(parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l-1]))
                assert(parameters['b' + str(l)].shape == (layer_dims[l], 1))
            return parameters
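
    As a usage sketch, the snippet below initializes a small network and prints the resulting shapes. It restates the initializer in compact form so that it runs on its own; the seed argument, the use of np.random.default_rng instead of the global seed, and the example layer_dims are assumptions of this sketch.

```python
import numpy as np

def initialize_parameters_deep(layer_dims, seed=3):
    # Compact restatement; a local generator replaces the global seed (assumption).
    rng = np.random.default_rng(seed)
    parameters = {}
    for l in range(1, len(layer_dims)):
        parameters["W" + str(l)] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters

# Assumed example: 5 input features, one hidden layer of 4 units, 3 output units
parameters = initialize_parameters_deep([5, 4, 3])
for name, value in parameters.items():
    print(name, value.shape)
```

    A list of three layer sizes yields two weight matrices and two bias vectors, because the input layer has no parameters of its own.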
    • Forward propagation module:
      • Linear Forward
      • Linear_activation_forward:
      • L-Layer Model
        def linear_forward(A, W, b):
            """
            Implement the linear part of a layer's forward propagation.

            Arguments:
            A -- activations from previous layer (or input data): (size of previous layer, number of examples)
            W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
            b -- bias vector, numpy array of shape (size of the current layer, 1)

            Returns:
            Z -- the input of the activation function, also called pre-activation parameter
            cache -- a tuple (A, W, b), stored for computing the backward pass efficiently
            """
            Z = np.dot(W, A) + b
            assert(Z.shape == (W.shape[0], A.shape[1]))
            cache = (A, W, b)
            return Z, cache

        def linear_activation_forward(A_prev, W, b, activation):
            """
            Implement the forward propagation for the LINEAR->ACTIVATION layer

            Arguments:
            A_prev -- activations from previous layer (or input data): (size of previous layer, number of examples)
            W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
            b -- bias vector, numpy array of shape (size of the current layer, 1)
            activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

            Returns:
            A -- the output of the activation function, also called the post-activation value
            cache -- a tuple (linear_cache, activation_cache),
                     stored for computing the backward pass efficiently
            """
            if activation == 'sigmoid':
                Z, linear_cache = linear_forward(A_prev, W, b)
                A, activation_cache = sigmoid(Z)
            elif activation == 'relu':
                Z, linear_cache = linear_forward(A_prev, W, b)
                A, activation_cache = relu(Z)
            else:
                raise ValueError("activation must be 'sigmoid' or 'relu'")
            assert(A.shape == (W.shape[0], A_prev.shape[1]))
            cache = (linear_cache, activation_cache)
            return A, cache

        def L_model_forward(X, parameters):
            """
            Implement forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID computation

            Arguments:
            X -- data, numpy array of shape (input size, number of examples)
            parameters -- output of initialize_parameters_deep()

            Returns:
            AL -- last post-activation value
            caches -- list of caches containing:
                        every cache of linear_activation_forward() (there are L of them, indexed from 0 to L-1)
            """
            caches = []
            A = X
            L = len(parameters) // 2  # each layer contributes one W and one b
            # Layers 1 .. L-1: LINEAR -> RELU
            for l in range(1, L):
                A_prev = A
                A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], 'relu')
                caches.append(cache)
            # Layer L: LINEAR -> SIGMOID
            A_prev = A
            AL, cache = linear_activation_forward(A_prev, parameters['W' + str(L)], parameters['b' + str(L)], 'sigmoid')
            caches.append(cache)  # L_model_backward reads this cache as caches[-1]
            assert(AL.shape == (1, X.shape[1]))
            return AL, caches
    • Cost function:
        def compute_cost(AL, Y):
            """
            Implement the cross-entropy cost function.

            Arguments:
            AL -- probability vector corresponding to your label predictions, shape (1, number of examples)
            Y -- true "label" vector (for example: containing 0 if non-cat, 1 if cat), shape (1, number of examples)

            Returns:
            cost -- cross-entropy cost
            """
            m = Y.shape[1]
            cost = (-1.0/m) * (np.dot(Y, np.log(AL).T) + np.dot(1-Y, np.log(1-AL).T))
            cost = np.squeeze(cost)  # turns the (1, 1) result into a scalar
            assert(cost.shape == ())
            return cost
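
    The cost above evaluates np.log(AL) and np.log(1-AL), which produce infinities or NaN if the sigmoid output saturates to exactly 0 or 1. A common guard is to clip AL first. The snippet below is a sketch of that idea, not part of the original post: the name compute_cost_clipped and the eps value are hypothetical, and the sample AL and Y are made up for illustration.

```python
import numpy as np

def compute_cost_clipped(AL, Y, eps=1e-12):
    """Cross-entropy cost with AL clipped away from 0 and 1 (hypothetical variant)."""
    m = Y.shape[1]
    AL = np.clip(AL, eps, 1 - eps)
    cost = -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m
    return float(cost)

# Ordinary case: matches the unclipped formula
AL = np.array([[0.8, 0.9, 0.4]])
Y = np.array([[1, 1, 0]])
cost = compute_cost_clipped(AL, Y)
print(cost)   # about 0.2798

# Saturated predictions: the unclipped formula would hit log(0) here
saturated_cost = compute_cost_clipped(np.array([[1.0, 0.0]]), np.array([[1, 0]]))
print(saturated_cost)
```

    Clipping changes the cost only when a prediction lies within eps of 0 or 1, so for typical values it gives the same result as the unclipped version.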
    • Backward propagation module
      • Linear backward:
        def linear_backward(dZ, cache):
            """
            Implement the linear portion of backward propagation for a single layer (layer l)

            Arguments:
            dZ -- Gradient of the cost with respect to the linear output (of current layer l)
            cache -- tuple of values (A_prev, W, b) coming from the forward propagation in the current layer

            Returns:
            dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
            dW -- Gradient of the cost with respect to W (current layer l), same shape as W
            db -- Gradient of the cost with respect to b (current layer l), same shape as b
            """
            A_prev, W, b = cache
            m = A_prev.shape[1]

            dW = (1.0/m) * np.dot(dZ, A_prev.T)
            db = (1.0/m) * np.sum(dZ, axis=1, keepdims=True)
            dA_prev = np.dot(W.T, dZ)

            assert (dA_prev.shape == A_prev.shape)
            assert (dW.shape == W.shape)
            assert (db.shape == b.shape)

            return dA_prev, dW, db
      • Linear-Activation backward:
        def linear_activation_backward(dA, cache, activation):
            """
            Implement the backward propagation for the LINEAR->ACTIVATION layer.

            Arguments:
            dA -- post-activation gradient for current layer l
            cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently
            activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

            Returns:
            dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
            dW -- Gradient of the cost with respect to W (current layer l), same shape as W
            db -- Gradient of the cost with respect to b (current layer l), same shape as b
            """
            linear_cache, activation_cache = cache

            if activation == "relu":
                dZ = relu_backward(dA, activation_cache)
                dA_prev, dW, db = linear_backward(dZ, linear_cache)
            elif activation == "sigmoid":
                dZ = sigmoid_backward(dA, activation_cache)
                dA_prev, dW, db = linear_backward(dZ, linear_cache)
            else:
                raise ValueError("activation must be 'sigmoid' or 'relu'")

            return dA_prev, dW, db
      • L-Model Backward:
        def L_model_backward(AL, Y, caches):
            """
            Implement the backward propagation for the [LINEAR->RELU] * (L-1) -> LINEAR -> SIGMOID group

            Arguments:
            AL -- probability vector, output of the forward propagation (L_model_forward())
            Y -- true "label" vector (containing 0 if non-cat, 1 if cat)
            caches -- list of caches containing:
                        every cache of linear_activation_forward() with "relu" (it's caches[l], for l in range(L-1) i.e l = 0...L-2)
                        the cache of linear_activation_forward() with "sigmoid" (it's caches[L-1])

            Returns:
            grads -- A dictionary with the gradients
                     grads["dA" + str(l)] = ...
                     grads["dW" + str(l)] = ...
                     grads["db" + str(l)] = ...
            """
            grads = {}
            L = len(caches) # the number of layers
            Y = Y.reshape(AL.shape) # after this line, Y is the same shape as AL

            # Initializing the backpropagation: derivative of the cross-entropy cost with respect to AL
            dAL = -(np.divide(Y, AL) - np.divide(1-Y, 1-AL))

            # Lth layer (SIGMOID -> LINEAR) gradients.
            # Inputs: dAL, current_cache. Outputs: grads["dA" + str(L-1)], grads["dW" + str(L)], grads["db" + str(L)]
            current_cache = caches[-1]
            grads["dA" + str(L-1)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(dAL, current_cache, "sigmoid")

            # Loop from l=L-2 to l=0
            for l in reversed(range(L-1)):
                # lth layer: (RELU -> LINEAR) gradients.
                # Inputs: grads["dA" + str(l + 1)], current_cache.
                # Outputs: grads["dA" + str(l)], grads["dW" + str(l + 1)], grads["db" + str(l + 1)]
                current_cache = caches[l]
                dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads["dA" + str(l+1)], current_cache, "relu")
                grads["dA" + str(l)] = dA_prev_temp
                grads["dW" + str(l + 1)] = dW_temp
                grads["db" + str(l + 1)] = db_temp

            return grads
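
    Backpropagation code is easy to get subtly wrong, so it is worth comparing its gradients against finite differences of the cost. The snippet below is a self-contained sketch of such a gradient check on a small LINEAR->RELU->LINEAR->SIGMOID network. It inlines the forward and backward steps instead of calling the functions above; the helper names, network sizes, and random data are all assumptions made for illustration.

```python
import numpy as np

def forward(X, params):
    Z1 = np.dot(params["W1"], X) + params["b1"]
    A1 = np.maximum(0, Z1)
    Z2 = np.dot(params["W2"], A1) + params["b2"]
    A2 = 1 / (1 + np.exp(-Z2))
    return A2, (Z1, A1)

def cost_fn(X, Y, params):
    A2, _ = forward(X, params)
    m = Y.shape[1]
    return float(-np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m)

def backward(X, Y, params):
    m = Y.shape[1]
    A2, (Z1, A1) = forward(X, params)
    dA2 = -(np.divide(Y, A2) - np.divide(1 - Y, 1 - A2))
    dZ2 = dA2 * A2 * (1 - A2)                    # sigmoid backward
    grads = {"dW2": np.dot(dZ2, A1.T) / m,
             "db2": np.sum(dZ2, axis=1, keepdims=True) / m}
    dZ1 = np.dot(params["W2"].T, dZ2) * (Z1 > 0)  # relu backward
    grads["dW1"] = np.dot(dZ1, X.T) / m
    grads["db1"] = np.sum(dZ1, axis=1, keepdims=True) / m
    return grads

def max_relative_error(X, Y, params, eps=1e-6):
    """Largest relative gap between backprop and central differences."""
    grads = backward(X, Y, params)
    worst = 0.0
    for name, value in params.items():
        numeric = np.zeros_like(value)
        for idx in np.ndindex(value.shape):
            original = value[idx]
            value[idx] = original + eps
            cost_plus = cost_fn(X, Y, params)
            value[idx] = original - eps
            cost_minus = cost_fn(X, Y, params)
            value[idx] = original                 # restore the parameter
            numeric[idx] = (cost_plus - cost_minus) / (2 * eps)
        analytic = grads["d" + name]
        denom = np.linalg.norm(analytic) + np.linalg.norm(numeric) + 1e-12
        worst = max(worst, float(np.linalg.norm(analytic - numeric) / denom))
    return worst

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))
Y = np.array([[1, 0, 1, 1, 0]], dtype=float)
params = {"W1": rng.standard_normal((4, 3)) * 0.5, "b1": rng.standard_normal((4, 1)) * 0.5,
          "W2": rng.standard_normal((1, 4)) * 0.5, "b2": rng.standard_normal((1, 1)) * 0.5}
error = max_relative_error(X, Y, params)
print(error)
```

    A relative error far below 1e-5 suggests the analytic gradients are consistent with the cost. The check can misfire if some pre-activation lies almost exactly at the ReLU kink, so an isolated failure on random data is worth rerunning with different inputs.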
    • Update Parameters
        def update_parameters(parameters, grads, learning_rate):
            """
            Update parameters using gradient descent

            Arguments:
            parameters -- python dictionary containing your parameters
            grads -- python dictionary containing your gradients, output of L_model_backward
            learning_rate -- step size of the gradient descent update

            Returns:
            parameters -- python dictionary containing your updated parameters
                          parameters["W" + str(l)] = ...
                          parameters["b" + str(l)] = ...
            """

            L = len(parameters) // 2 # number of layers in the neural network

            # Update rule for each parameter: theta = theta - learning_rate * d(theta)
            for l in range(L):
                parameters["W" + str(l+1)] -= learning_rate * grads["dW" + str(l+1)]
                parameters["b" + str(l+1)] -= learning_rate * grads["db" + str(l+1)]
            return parameters

    5.Applying the deep neural network:


    • Steps for building a deep learning model:
      • Initialize parameters / Define hyperparameters
      • Loop for num_iterations:
        • Forward propagation
        • Compute cost function
        • Backward propagation
        • Update parameters (using parameters and grads from backprop)
      • Use trained parameters to predict labels
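
    The steps above can be sketched as one compact training loop. To stay self-contained, the snippet inlines the forward and backward passes rather than calling the helper functions from section 4; the function names train and predict, the toy dataset, the layer sizes, and the hyperparameter values are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def train(X, Y, layer_dims, learning_rate=0.1, num_iterations=500, seed=1):
    rng = np.random.default_rng(seed)
    L = len(layer_dims) - 1
    m = X.shape[1]
    # 1. Initialize parameters
    params = {}
    for l in range(1, L + 1):
        params["W" + str(l)] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    costs = []
    for _ in range(num_iterations):
        # 2. Forward propagation: [LINEAR->RELU]*(L-1) -> LINEAR->SIGMOID
        A, cache = X, []
        for l in range(1, L + 1):
            Z = np.dot(params["W" + str(l)], A) + params["b" + str(l)]
            cache.append((A, Z))             # input and pre-activation of layer l
            A = sigmoid(Z) if l == L else np.maximum(0, Z)
        AL = A
        # 3. Cross-entropy cost
        costs.append(float(-np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m))
        # 4. Backward propagation; dAL * sigmoid'(ZL) simplifies to AL - Y
        dZ = AL - Y
        for l in range(L, 0, -1):
            A_prev = cache[l - 1][0]
            dW = np.dot(dZ, A_prev.T) / m
            db = np.sum(dZ, axis=1, keepdims=True) / m
            if l > 1:
                # Use W[l] before it is updated, then apply the ReLU derivative
                dA_prev = np.dot(params["W" + str(l)].T, dZ)
                dZ = dA_prev * (cache[l - 2][1] > 0)
            # 5. Gradient descent update
            params["W" + str(l)] -= learning_rate * dW
            params["b" + str(l)] -= learning_rate * db
    return params, costs

def predict(X, params):
    A = X
    L = len(params) // 2
    for l in range(1, L + 1):
        Z = np.dot(params["W" + str(l)], A) + params["b" + str(l)]
        A = sigmoid(Z) if l == L else np.maximum(0, Z)
    return (A > 0.5).astype(int)             # threshold the probabilities at 0.5

# Toy data (assumption): label is 1 when x1 + x2 > 0.5
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 200))
Y = (X[0] + X[1] > 0.5).astype(float).reshape(1, -1)

params, costs = train(X, Y, layer_dims=[2, 4, 1])
predictions = predict(X, params)
print("first cost:", costs[0], "last cost:", costs[-1])
print("training accuracy:", float(np.mean(predictions == Y)))
```

    The first cost should be close to ln 2 (about 0.693), because the small initial weights make every prediction roughly 0.5; later costs should be lower. How quickly the cost falls depends on the learning rate and number of iterations, which are hyperparameters to tune.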
  • Original post: https://www.cnblogs.com/easy-wang/p/10000629.html