Convolutional Neural Network Implementation in Python

    The following implementation follows Andrew Ng's deep learning course assignments.

    1. Padding

    import numpy as np

    def zero_pad(X, pad):
        """
        Pad all images of the dataset X with zeros. The padding is applied to
        the height and width of each image.
        
        Argument:
        X -- python numpy array of shape (m, n_H, n_W, n_C) representing a batch of m images
        pad -- integer, amount of padding around each image on vertical and horizontal dimensions
        
        Returns:
        X_pad -- padded image of shape (m, n_H + 2*pad, n_W + 2*pad, n_C)
        """
        
    
        X_pad = np.pad(X, ((0,0),(pad,pad),(pad,pad),(0,0)), 'constant', constant_values=(0,0))
    
        
        return X_pad

      As zero_pad shows, we only pad the height and width of the original image matrix; m (the number of images) and n_C (the number of channels) are left untouched.
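
    A quick sanity check (the shapes here are arbitrary, chosen only for illustration):

    np.random.seed(1)
    x = np.random.randn(4, 3, 3, 2)   # 4 images of size 3x3 with 2 channels
    x_pad = zero_pad(x, 2)
    print(x.shape)       # (4, 3, 3, 2)
    print(x_pad.shape)   # (4, 7, 7, 2) -- only n_H and n_W grow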

    2. Convolution computation

    def conv_single_step(a_slice_prev, W, b):
        """
        Apply one filter defined by W and b to a single slice (a_slice_prev)
        of the previous layer's output activation. Returns a scalar Z.
        """
        # Element-wise product between the window and the filter
        s = a_slice_prev * W
        # Sum over all entries of the volume s
        Z = np.sum(s)
        # Add the bias; float() keeps Z a scalar
        Z = Z + float(b)

        return Z

    In the convolution computation, a_slice_prev is the current window of the image matrix and W holds the filter's parameters. We take their element-wise product, sum the result, and add the constant bias b.
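
    For example, applying one randomly initialized filter to one random window (shapes chosen arbitrarily):

    np.random.seed(1)
    a_slice_prev = np.random.randn(4, 4, 3)   # one 4x4 window across 3 channels
    W = np.random.randn(4, 4, 3)              # one filter of the same shape
    b = np.random.randn(1, 1, 1)              # its scalar bias
    print(conv_single_step(a_slice_prev, W, b))   # a single float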

    3. Convolution forward

    def conv_forward(A_prev, W, b, hparameters):
        """
        Implements the forward propagation for a convolution function
        
        Arguments:
        A_prev -- output activations of the previous layer, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
        W -- Weights, numpy array of shape (f, f, n_C_prev, n_C)
        b -- Biases, numpy array of shape (1, 1, 1, n_C)
        hparameters -- python dictionary containing "stride" and "pad"
        
        Returns:
        Z -- conv output, numpy array of shape (m, n_H, n_W, n_C)
        cache -- cache of values needed for the conv_backward() function
        """
        
        ### START CODE HERE ###
        # Retrieve dimensions from A_prev's shape (≈1 line)
        (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
        
        # Retrieve dimensions from W's shape (≈1 line)
        (f, f, n_C_prev, n_C) = W.shape
        
        # Retrieve information from "hparameters" (≈2 lines)
        stride = hparameters['stride']
        pad = hparameters['pad']
        
        # Compute the dimensions of the CONV output volume using the formula given above. Hint: use int() to floor. (≈2 lines)
        n_H = int((n_H_prev + 2 * pad - f) / stride + 1)
        n_W = int((n_W_prev + 2 * pad - f) / stride + 1)
    
        # Initialize the output volume Z with zeros. (≈1 line)
        Z = np.zeros((m, n_H, n_W, n_C))
        
        # Create A_prev_pad by padding A_prev
        A_prev_pad = zero_pad(A_prev, pad)
        
        for i in range(m):                               # loop over the batch of training examples
            a_prev_pad = A_prev_pad[i]                   # select ith training example's padded activation
            for h in range(n_H):                         # loop over vertical axis of the output volume
                for w in range(n_W):                     # loop over horizontal axis of the output volume
                    for c in range(n_C):                 # loop over channels (= #filters) of the output volume
                        
                        # Find the corners of the current "slice" (≈4 lines)
                        vert_start = h * stride
                        vert_end = h * stride + f
                        horiz_start = w * stride
                        horiz_end = w * stride + f
                        
                        # Use the corners to define the (3D) slice of a_prev_pad (≈1 line)
                        a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]
                        
                        # Convolve the (3D) slice with the cth filter W[..., c] and bias b[..., c] (≈1 line)
                        Z[i, h, w, c] = conv_single_step(a_slice_prev, W[:, :, :, c], b[:, :, :, c])
                                        
        ### END CODE HERE ###
        
        # Making sure your output shape is correct
        assert(Z.shape == (m, n_H, n_W, n_C))
        
        # Save information in "cache" for the backprop
        cache = (A_prev, W, b, hparameters)
        
        return Z, cache

    The arguments are the input images A_prev, the parameters W and b, and the hyperparameters pad and stride. We first unpack all the shape information through tuple assignment and use it to initialize the output volume: n_H = floor((n_H_prev + 2*pad - f) / stride) + 1, and likewise for n_W. We then iterate over every window of every image; adding the filter size f to each window's starting coordinates gives its vertical and horizontal extent, and conv_single_step then produces the corresponding output value. Note that the same filter parameters are shared by every window of the image.
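
    As a quick shape check (a minimal sketch with arbitrary random values and hyperparameters):

    np.random.seed(1)
    A_prev = np.random.randn(10, 4, 4, 3)     # 10 images of size 4x4 with 3 channels
    W = np.random.randn(2, 2, 3, 8)           # 8 filters of size 2x2
    b = np.random.randn(1, 1, 1, 8)
    hparameters = {"pad": 2, "stride": 2}

    Z, cache = conv_forward(A_prev, W, b, hparameters)
    print(Z.shape)   # (10, 4, 4, 8): n_H = (4 + 2*2 - 2)//2 + 1 = 4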

    4. Pooling layer

    def pool_forward(A_prev, hparameters, mode = "max"):
        """
        Implements the forward pass of the pooling layer
        
        Arguments:
        A_prev -- Input data, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
        hparameters -- python dictionary containing "f" and "stride"
        mode -- the pooling mode you would like to use, defined as a string ("max" or "average")
        
        Returns:
        A -- output of the pool layer, a numpy array of shape (m, n_H, n_W, n_C)
        cache -- cache used in the backward pass of the pooling layer, contains the input and hparameters 
        """
        
        # Retrieve dimensions from the input shape
        (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
        
        # Retrieve hyperparameters from "hparameters"
        f = hparameters["f"]
        stride = hparameters["stride"]
        
        # Define the dimensions of the output
        n_H = int(1 + (n_H_prev - f) / stride)
        n_W = int(1 + (n_W_prev - f) / stride)
        n_C = n_C_prev
        
        # Initialize output matrix A
        A = np.zeros((m, n_H, n_W, n_C))              
        
        ### START CODE HERE ###
        for i in range(m):                         # loop over the training examples
            for h in range(n_H):                     # loop on the vertical axis of the output volume
                for w in range(n_W):                 # loop on the horizontal axis of the output volume
                    for c in range (n_C):            # loop over the channels of the output volume
                        
                        # Find the corners of the current "slice" (≈4 lines)
                        vert_start = h * stride
                        vert_end = vert_start + f
                        horiz_start = w * stride
                        horiz_end = horiz_start + f
                        
                        # Use the corners to define the current slice on the ith training example of A_prev, channel c. (≈1 line)
                        a_prev_slice = A_prev[i, vert_start : vert_end, horiz_start : horiz_end, c]
                        
                        # Compute the pooling operation on the slice. Use an if statement to differentiate the modes. Use np.max/np.mean.
                        if mode == "max":
                            A[i, h, w, c] = np.max(a_prev_slice)
                        elif mode == "average":
                            A[i, h, w, c] = np.mean(a_prev_slice)
        
        ### END CODE HERE ###
        
        # Store the input and hparameters in "cache" for pool_backward()
        cache = (A_prev, hparameters)
        
        # Making sure your output shape is correct
        assert(A.shape == (m, n_H, n_W, n_C))
        
        return A, cache

    The pooling layer's computation is much the same as the convolution layer above; what we need to watch is the mode argument, which selects between the max and average variants.
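
    A quick check of both modes (again with arbitrary values; with f = stride = 2 the output halves in height and width):

    np.random.seed(1)
    A_prev = np.random.randn(2, 4, 4, 3)
    hparameters = {"stride": 2, "f": 2}

    A_max, _ = pool_forward(A_prev, hparameters, mode="max")
    A_avg, _ = pool_forward(A_prev, hparameters, mode="average")
    print(A_max.shape)   # (2, 2, 2, 3): n_H = (4 - 2)//2 + 1 = 2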

    5. Convolution layer backward

    def conv_backward(dZ, cache):
        """
        Implement the backward propagation for a convolution function
        
        Arguments:
        dZ -- gradient of the cost with respect to the output of the conv layer (Z), numpy array of shape (m, n_H, n_W, n_C)
        cache -- cache of values needed for the conv_backward(), output of conv_forward()
        
        Returns:
        dA_prev -- gradient of the cost with respect to the input of the conv layer (A_prev),
                   numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
        dW -- gradient of the cost with respect to the weights of the conv layer (W)
              numpy array of shape (f, f, n_C_prev, n_C)
        db -- gradient of the cost with respect to the biases of the conv layer (b)
              numpy array of shape (1, 1, 1, n_C)
        """
        
        ### START CODE HERE ###
        # Retrieve information from "cache"
        (A_prev, W, b, hparameters) = cache
        
        # Retrieve dimensions from A_prev's shape
        (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
        
        # Retrieve dimensions from W's shape
        (f, f, n_C_prev, n_C) = W.shape
        
        # Retrieve information from "hparameters"
        stride = hparameters['stride']
        pad = hparameters['pad']
        
        # Retrieve dimensions from dZ's shape
        (m, n_H, n_W, n_C) = dZ.shape
        
        # Initialize dA_prev, dW, db with the correct shapes
        dA_prev = np.zeros(A_prev.shape)                           
        dW = np.zeros(W.shape)
        db = np.zeros(b.shape)
    
        # Pad A_prev and dA_prev
        A_prev_pad = zero_pad(A_prev, pad)
        dA_prev_pad = zero_pad(dA_prev, pad)
        
        for i in range(m):                       # loop over the training examples
            
            # select ith training example from A_prev_pad and dA_prev_pad
            a_prev_pad = A_prev_pad[i]
            da_prev_pad = dA_prev_pad[i]
            
            for h in range(n_H):                   # loop over vertical axis of the output volume
                for w in range(n_W):               # loop over horizontal axis of the output volume
                    for c in range(n_C):           # loop over the channels of the output volume
                        
                        # Find the corners of the current "slice"
                        vert_start = h * stride
                        vert_end = h * stride + f
                        horiz_start = w * stride
                        horiz_end = w * stride + f
                        
                        # Use the corners to define the slice from a_prev_pad
                        a_slice = a_prev_pad[vert_start : vert_end, horiz_start : horiz_end, : ]
    
                        # Update gradients for the window and the filter's parameters using the code formulas given above
                        da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:, :, :, c] * dZ[i, h, w, c]
    
                        dW[:, :, :, c] += a_slice * dZ[i, h, w, c]
                        db[:, :, :, c] += dZ[i, h, w, c]
                        
            # Set the ith training example's dA_prev to the unpadded da_prev_pad (Hint: use X[pad:-pad, pad:-pad, :])
            dA_prev[i, :, :, :] = da_prev_pad[pad:-pad, pad:-pad, :]
        ### END CODE HERE ###
        
        # Making sure your output shape is correct
        assert(dA_prev.shape == (m, n_H_prev, n_W_prev, n_C_prev))
        
        return dA_prev, dW, db

    The computation of dW and db here is analogous to backpropagation in a fully connected network. To accumulate the gradients, we traverse every position of every image and add each window's contribution to dA_prev, dW, and db.
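
    To exercise the code we can reuse Z and cache from the conv_forward sketch above; passing Z itself in place of a real upstream gradient dZ is only meant to check the shapes:

    dA_prev, dW, db = conv_backward(Z, cache)
    print(dA_prev.shape, dW.shape, db.shape)
    # (10, 4, 4, 3) (2, 2, 3, 8) (1, 1, 1, 8)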

    6. Pooling layer backward

    Once we understand how pooling works, we can construct its backward pass from those properties. For max pooling, we build a mask that records which entry of each window actually produced the output.

    def create_mask_from_window(x):
        """
        Creates a mask from an input matrix x, to identify the max entry of x.
        
        Arguments:
        x -- Array of shape (f, f)
        
        Returns:
        mask -- Array of the same shape as window, contains a True at the position corresponding to the max entry of x.
        """
        
        ### START CODE HERE ### (≈1 line)
        mask = (x == np.max(x))
        ### END CODE HERE ###
        
        return mask
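
    A quick check with a small window (values chosen arbitrarily):

    x = np.array([[1., 2.],
                  [4., 3.]])
    print(create_mask_from_window(x))
    # [[False False]
    #  [ True False]]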

    For average pooling, the gradient is distributed evenly across every entry of the window.

    def distribute_value(dz, shape):
        """
        Distributes the input value in the matrix of dimension shape
        
        Arguments:
        dz -- input scalar
        shape -- the shape (n_H, n_W) of the output matrix for which we want to distribute the value of dz
        
        Returns:
        a -- Array of size (n_H, n_W) for which we distributed the value of dz
        """
        
        ### START CODE HERE ###
        # Retrieve dimensions from shape (≈1 line)
        (n_H, n_W) = shape
        
        # Number of entries over which dz will be distributed (≈1 line)
        n_entries = n_H * n_W
        
        # Create a matrix where every entry is the distributed value dz / n_entries (≈1 line)
        a = dz / n_entries * np.ones((n_H, n_W))
        ### END CODE HERE ###
        
        return a
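
    For example, spreading a gradient of 2 over a 2x2 window gives 0.5 per entry:

    print(distribute_value(2., (2, 2)))
    # [[0.5 0.5]
    #  [0.5 0.5]]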

    With these two helpers, we can traverse the images exactly as in the convolution backward pass, accumulating the gradient routed through each window to obtain this layer's dA_prev.

    def pool_backward(dA, cache, mode = "max"):
        """
        Implements the backward pass of the pooling layer
        
        Arguments:
        dA -- gradient of cost with respect to the output of the pooling layer, same shape as A
        cache -- cache output from the forward pass of the pooling layer, contains the layer's input and hparameters 
        mode -- the pooling mode you would like to use, defined as a string ("max" or "average")
        
        Returns:
        dA_prev -- gradient of cost with respect to the input of the pooling layer, same shape as A_prev
        """
        
        ### START CODE HERE ###
        
        # Retrieve information from cache (≈1 line)
        (A_prev, hparameters) = cache
        
        # Retrieve hyperparameters from "hparameters" (≈2 lines)
        stride = hparameters['stride']
        f = hparameters['f']
        
        # Retrieve dimensions from A_prev's shape and dA's shape (≈2 lines)
        m, n_H_prev, n_W_prev, n_C_prev = A_prev.shape
        m, n_H, n_W, n_C = dA.shape
        
        # Initialize dA_prev with zeros (≈1 line)
        dA_prev = np.zeros(A_prev.shape)
        
        for i in range(m):                       # loop over the training examples
            
            # select training example from A_prev (≈1 line)
            a_prev = A_prev[i]
            
            for h in range(n_H):                   # loop on the vertical axis
                for w in range(n_W):               # loop on the horizontal axis
                    for c in range(n_C):           # loop over the channels (depth)
                        
                        # Find the corners of the current "slice" (≈4 lines)
                        vert_start = h * stride
                        vert_end = vert_start + f
                        horiz_start = w * stride
                        horiz_end = horiz_start + f
                        
                        # Compute the backward propagation in both modes.
                        if mode == "max":
                            
                            # Use the corners and "c" to define the current slice from a_prev (≈1 line)
                            a_prev_slice = a_prev[vert_start : vert_end, horiz_start : horiz_end, c]
                            # Create the mask from a_prev_slice (≈1 line)
                            mask = create_mask_from_window(a_prev_slice)
                            # Set dA_prev to be dA_prev + (the mask multiplied by the correct entry of dA) (≈1 line)
                            dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += mask * dA[i, h, w, c]
                            
                        elif mode == "average":
                            
                            # Get the value a from dA (≈1 line)
                            da = dA[i, h, w, c]
                            # Define the shape of the filter as fxf (≈1 line)
                            shape = (f, f)
                            # Distribute it to get the correct slice of dA_prev. i.e. Add the distributed value of da. (≈1 line)
                            dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += distribute_value(da, shape)
                            
        ### END CODE ###
        
        # Making sure your output shape is correct
        assert(dA_prev.shape == A_prev.shape)
        
        return dA_prev
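
    A final shape check, chaining pool_forward and pool_backward on arbitrary random data:

    np.random.seed(1)
    A_prev = np.random.randn(5, 5, 3, 2)
    hparameters = {"stride": 1, "f": 2}
    A, cache = pool_forward(A_prev, hparameters)
    dA = np.random.randn(5, 4, 2, 2)          # same shape as A

    dA_prev = pool_backward(dA, cache, mode="max")
    print(dA_prev.shape)   # (5, 5, 3, 2), same as A_prev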