  • Improving Deep Neural Networks

    This is the Week 2 assignment "Optimization Methods" from the "Improving Deep Neural Networks" course in Andrew Ng's Deep Learning video series. If reading the code directly is difficult, see: https://blog.csdn.net/u013733326/article/details/79907419
    This write-up differs slightly from that reference and corrects some of its errors.
    The assignment implements plain gradient descent, gradient descent with momentum, and the Adam optimization algorithm (see my earlier posts for background), and compares their accuracy.
    Full version here: https://github.com/Hongze-Wang/Deep-Learning-Andrew-Ng/tree/master/homework

    Optimization Methods

    1 - Gradient Descent

    import numpy as np
    import matplotlib.pyplot as plt
    import scipy.io
    import math
    import sklearn
    import sklearn.datasets
    
    from opt_utils import load_params_and_grads, initialize_parameters, forward_propagation, backward_propagation
    from opt_utils import compute_cost, predict, predict_dec, plot_decision_boundary, load_dataset
    from testCases import *
    
    %matplotlib inline
    plt.rcParams['figure.figsize'] = (7.0, 4.0) # set default size of plots
    plt.rcParams['image.interpolation'] = 'nearest'
    plt.rcParams['image.cmap'] = 'gray'
    
    C:\Users\wangh\opt_utils.py:76: SyntaxWarning: assertion is always true, perhaps remove parentheses?
      assert(parameters['W' + str(l)].shape == layer_dims[l], layer_dims[l-1])
    C:\Users\wangh\opt_utils.py:77: SyntaxWarning: assertion is always true, perhaps remove parentheses?
      assert(parameters['W' + str(l)].shape == layer_dims[l], 1)
    
    # GRADED FUNCTION: update_parameters_with_gd
    
    def update_parameters_with_gd(parameters, grads, learning_rate):
        """
        Update parameters using one step of gradient descent
        
        Arguments:
        parameters -- python dictionary containing your parameters to be updated:
                        parameters['W' + str(l)] = Wl
                        parameters['b' + str(l)] = bl
        grads -- python dictionary containing your gradients to update each parameters:
                        grads['dW' + str(l)] = dWl
                        grads['db' + str(l)] = dbl
        learning_rate -- the learning rate, scalar.
        
        Returns:
        parameters -- python dictionary containing your updated parameters 
        """
    
        L = len(parameters) // 2 # number of layers in the neural networks
    
        # Update rule for each parameter
        for l in range(L):
            ### START CODE HERE ### (approx. 2 lines)
            parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - learning_rate * grads["dW" + str(l+1)]
            parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - learning_rate * grads["db" + str(l+1)]
            ### END CODE HERE ###
            
        return parameters
    
    parameters, grads, learning_rate = update_parameters_with_gd_test_case()
    
    parameters = update_parameters_with_gd(parameters, grads, learning_rate)
    print("W1 = " + str(parameters["W1"]))
    print("b1 = " + str(parameters["b1"]))
    print("W2 = " + str(parameters["W2"]))
    print("b2 = " + str(parameters["b2"]))
    
    W1 = [[ 1.63535156 -0.62320365 -0.53718766]
     [-1.07799357  0.85639907 -2.29470142]]
    b1 = [[ 1.74604067]
     [-0.75184921]]
    W2 = [[ 0.32171798 -0.25467393  1.46902454]
     [-2.05617317 -0.31554548 -0.3756023 ]
     [ 1.1404819  -1.09976462 -0.1612551 ]]
    b2 = [[-0.88020257]
     [ 0.02561572]
     [ 0.57539477]]
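
    For reference, the update that update_parameters_with_gd applies to each layer $l$, with learning rate $\alpha$:

    $$W^{[l]} := W^{[l]} - \alpha\, dW^{[l]}, \qquad b^{[l]} := b^{[l]} - \alpha\, db^{[l]}$$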
    
    # The following code is for reference only and is not part of this assignment
    '''
    A variant of this is Stochastic Gradient Descent (SGD), which is equivalent to mini-batch gradient descent where each mini-batch has just 1 example. The update rule that you have just implemented does not change. What changes is that you would be computing gradients on just one training example at a time, rather than on the whole training set. The code examples below illustrate the difference between stochastic gradient descent and (batch) gradient descent.
    '''
    # (Batch) Gradient Descent:
    X = data_input
    Y = labels
    parameters = initialize_parameters(layers_dims)
    for i in range(0, num_iterations):
        # Forward propagation
        a, caches = forward_propagation(X, parameters)
        # Compute cost.
        cost = compute_cost(a, Y)
        # Backward propagation.
        grads = backward_propagation(a, caches, parameters)
        # Update parameters.
        parameters = update_parameters(parameters, grads)
    # Stochastic Gradient Descent:
    X = data_input
    Y = labels
    parameters = initialize_parameters(layers_dims)
    for i in range(0, num_iterations):
        for j in range(0, m):
            # Forward propagation
            a, caches = forward_propagation(X[:,j], parameters)
            # Compute cost
            cost = compute_cost(a, Y[:,j])
            # Backward propagation
            grads = backward_propagation(a, caches, parameters)
            # Update parameters.
            parameters = update_parameters(parameters, grads)
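    # A third variant, mini-batch gradient descent, sits between these two extremes: each update uses a
    # batch of mini_batch_size examples rather than 1 or all m. The sketch below is for reference only,
    # uses the same illustrative placeholder names as above (data_input, labels, update_parameters),
    # and relies on the random_mini_batches helper built in the next section.
    X = data_input
    Y = labels
    parameters = initialize_parameters(layers_dims)
    for i in range(0, num_iterations):
        minibatches = random_mini_batches(X, Y, mini_batch_size)
        for minibatch_X, minibatch_Y in minibatches:
            # Forward propagation
            a, caches = forward_propagation(minibatch_X, parameters)
            # Compute cost
            cost = compute_cost(a, minibatch_Y)
            # Backward propagation
            grads = backward_propagation(a, caches, parameters)
            # Update parameters.
            parameters = update_parameters(parameters, grads)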
    

    2 - Mini-Batch Gradient descent

    # GRADED FUNCTION: random_mini_batches
    
    def random_mini_batches(X, Y, mini_batch_size = 64, seed = 0):
        """
        Creates a list of random minibatches from (X, Y)
        
        Arguments:
        X -- input data, of shape (input size, number of examples)
        Y -- true "label" vector (1 for blue dot / 0 for red dot), of shape (1, number of examples)
        mini_batch_size -- size of the mini-batches, integer
        
        Returns:
        mini_batches -- list of synchronous (mini_batch_X, mini_batch_Y)
        """
        
        np.random.seed(seed)            # To make your "random" minibatches the same as ours
        m = X.shape[1]                  # number of training examples
        mini_batches = []
            
        # Step 1: Shuffle (X, Y)
        permutation = list(np.random.permutation(m))
        shuffled_X = X[:, permutation]
        shuffled_Y = Y[:, permutation].reshape((1,m))
    
        # Step 2: Partition (shuffled_X, shuffled_Y). Minus the end case.
        num_complete_minibatches = math.floor(m/mini_batch_size) # number of mini-batches of size mini_batch_size in your partitioning
        for k in range(0, num_complete_minibatches):
            ### START CODE HERE ### (approx. 2 lines)
            mini_batch_X = shuffled_X[:, k*mini_batch_size: (k+1)*mini_batch_size]
            mini_batch_Y = shuffled_Y[:, k*mini_batch_size: (k+1)*mini_batch_size]
            ### END CODE HERE ###
            mini_batch = (mini_batch_X, mini_batch_Y)
            mini_batches.append(mini_batch)
        
        # Handling the end case (last mini-batch < mini_batch_size)
        if m % mini_batch_size != 0:
            ### START CODE HERE ### (approx. 2 lines)
            mini_batch_X = shuffled_X[:, mini_batch_size * num_complete_minibatches:]
            mini_batch_Y = shuffled_Y[:, mini_batch_size * num_complete_minibatches:]
            ### END CODE HERE ###
            mini_batch = (mini_batch_X, mini_batch_Y)
            mini_batches.append(mini_batch)
        
        return mini_batches
    
    X_assess, Y_assess, mini_batch_size = random_mini_batches_test_case()
    mini_batches = random_mini_batches(X_assess, Y_assess, mini_batch_size)
    
    print ("shape of the 1st mini_batch_X: " + str(mini_batches[0][0].shape))
    print ("shape of the 2nd mini_batch_X: " + str(mini_batches[1][0].shape))
    print ("shape of the 3rd mini_batch_X: " + str(mini_batches[2][0].shape))
    print ("shape of the 1st mini_batch_Y: " + str(mini_batches[0][1].shape))
    print ("shape of the 2nd mini_batch_Y: " + str(mini_batches[1][1].shape)) 
    print ("shape of the 3rd mini_batch_Y: " + str(mini_batches[2][1].shape))
    print ("mini batch sanity check: " + str(mini_batches[0][0][0][0:3]))
    
    shape of the 1st mini_batch_X: (12288, 64)
    shape of the 2nd mini_batch_X: (12288, 64)
    shape of the 3rd mini_batch_X: (12288, 20)
    shape of the 1st mini_batch_Y: (1, 64)
    shape of the 2nd mini_batch_Y: (1, 64)
    shape of the 3rd mini_batch_Y: (1, 20)
    mini batch sanity check: [ 0.90085595 -0.7612069   0.2344157 ]
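
    As an extra sanity check (not part of the graded assignment), you can also run the function on a tiny synthetic dataset and confirm that the partition sizes come out as expected, including the smaller final batch:

    X_tiny = np.arange(20).reshape(2, 10)         # 10 examples with 2 features each
    Y_tiny = (np.arange(10) % 2).reshape(1, 10)   # dummy 0/1 labels
    tiny_batches = random_mini_batches(X_tiny, Y_tiny, mini_batch_size=4, seed=1)
    print([mb[0].shape for mb in tiny_batches])   # expected: [(2, 4), (2, 4), (2, 2)]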
    

    3 - Momentum

    # GRADED FUNCTION: initialize_velocity
    
    def initialize_velocity(parameters):
        """
        Initializes the velocity as a python dictionary with:
                    - keys: "dW1", "db1", ..., "dWL", "dbL" 
                    - values: numpy arrays of zeros of the same shape as the corresponding gradients/parameters.
        Arguments:
        parameters -- python dictionary containing your parameters.
                        parameters['W' + str(l)] = Wl
                        parameters['b' + str(l)] = bl
        
        Returns:
        v -- python dictionary containing the current velocity.
                        v['dW' + str(l)] = velocity of dWl
                        v['db' + str(l)] = velocity of dbl
        """
        
        L = len(parameters) // 2 # number of layers in the neural networks
        v = {}
        
        # Initialize velocity
        for l in range(L):
            ### START CODE HERE ### (approx. 2 lines)
            v["dW" + str(l+1)] = np.zeros_like(parameters["W" + str(l+1)])
            v["db" + str(l+1)] = np.zeros_like(parameters["b" + str(l+1)])
            ### END CODE HERE ###
            
        return v
    
    parameters = initialize_velocity_test_case()
    
    v = initialize_velocity(parameters)
    print("v["dW1"] = " + str(v["dW1"]))
    print("v["db1"] = " + str(v["db1"]))
    print("v["dW2"] = " + str(v["dW2"]))
    print("v["db2"] = " + str(v["db2"]))
    
    v["dW1"] = [[0. 0. 0.]
     [0. 0. 0.]]
    v["db1"] = [[0.]
     [0.]]
    v["dW2"] = [[0. 0. 0.]
     [0. 0. 0.]
     [0. 0. 0.]]
    v["db2"] = [[0.]
     [0.]
     [0.]]
    
    # GRADED FUNCTION: update_parameters_with_momentum
    
    def update_parameters_with_momentum(parameters, grads, v, beta, learning_rate):
        """
        Update parameters using Momentum
        
        Arguments:
        parameters -- python dictionary containing your parameters:
                        parameters['W' + str(l)] = Wl
                        parameters['b' + str(l)] = bl
        grads -- python dictionary containing your gradients for each parameters:
                        grads['dW' + str(l)] = dWl
                        grads['db' + str(l)] = dbl
        v -- python dictionary containing the current velocity:
                        v['dW' + str(l)] = ...
                        v['db' + str(l)] = ...
        beta -- the momentum hyperparameter, scalar
        learning_rate -- the learning rate, scalar
        
        Returns:
        parameters -- python dictionary containing your updated parameters 
        v -- python dictionary containing your updated velocities
        """
    
        L = len(parameters) // 2 # number of layers in the neural networks
        
        # Momentum update for each parameter
        for l in range(L):
            
            ### START CODE HERE ### (approx. 4 lines)
            # compute velocities
            v["dW" + str(l+1)] = beta * v["dW" + str(l+1)] + (1-beta) * grads["dW" + str(l+1)]
            v["db" + str(l+1)] = beta * v["db" + str(l+1)] + (1-beta) * grads["db" + str(l+1)]
            # update parameters
            parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - learning_rate * v["dW" + str(l+1)]
            parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - learning_rate * v["db" + str(l+1)]
            ### END CODE HERE ###
            
        return parameters, v
    
    parameters, grads, v = update_parameters_with_momentum_test_case()
    
    parameters, v = update_parameters_with_momentum(parameters, grads, v, beta = 0.9, learning_rate = 0.01)
    print("W1 = " + str(parameters["W1"]))
    print("b1 = " + str(parameters["b1"]))
    print("W2 = " + str(parameters["W2"]))
    print("b2 = " + str(parameters["b2"]))
    print("v["dW1"] = " + str(v["dW1"]))
    print("v["db1"] = " + str(v["db1"]))
    print("v["dW2"] = " + str(v["dW2"]))
    print("v["db2"] = " + str(v["db2"]))
    
    W1 = [[ 1.62544598 -0.61290114 -0.52907334]
     [-1.07347112  0.86450677 -2.30085497]]
    b1 = [[ 1.74493465]
     [-0.76027113]]
    W2 = [[ 0.31930698 -0.24990073  1.4627996 ]
     [-2.05974396 -0.32173003 -0.38320915]
     [ 1.13444069 -1.0998786  -0.1713109 ]]
    b2 = [[-0.87809283]
     [ 0.04055394]
     [ 0.58207317]]
    v["dW1"] = [[-0.11006192  0.11447237  0.09015907]
     [ 0.05024943  0.09008559 -0.06837279]]
    v["db1"] = [[-0.01228902]
     [-0.09357694]]
    v["dW2"] = [[-0.02678881  0.05303555 -0.06916608]
     [-0.03967535 -0.06871727 -0.08452056]
     [-0.06712461 -0.00126646 -0.11173103]]
    v["db2"] = [[0.02344157]
     [0.16598022]
     [0.07420442]]
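
    For reference, the momentum update that update_parameters_with_momentum applies to each layer $l$, with momentum $\beta$ and learning rate $\alpha$:

    $$v_{dW^{[l]}} := \beta\, v_{dW^{[l]}} + (1-\beta)\, dW^{[l]}, \qquad W^{[l]} := W^{[l]} - \alpha\, v_{dW^{[l]}}$$

    and analogously for $b^{[l]}$ using $db^{[l]}$.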
    

    4 - Adam

    # GRADED FUNCTION: initialize_adam
    
    def initialize_adam(parameters) :
        """
        Initializes v and s as two python dictionaries with:
                    - keys: "dW1", "db1", ..., "dWL", "dbL" 
                    - values: numpy arrays of zeros of the same shape as the corresponding gradients/parameters.
        
        Arguments:
        parameters -- python dictionary containing your parameters.
                        parameters["W" + str(l)] = Wl
                        parameters["b" + str(l)] = bl
        
        Returns: 
        v -- python dictionary that will contain the exponentially weighted average of the gradient.
                        v["dW" + str(l)] = ...
                        v["db" + str(l)] = ...
        s -- python dictionary that will contain the exponentially weighted average of the squared gradient.
                        s["dW" + str(l)] = ...
                        s["db" + str(l)] = ...
    
        """
        
        L = len(parameters) // 2 # number of layers in the neural networks
        v = {}
        s = {}
        
        # Initialize v, s. Input: "parameters". Outputs: "v, s".
        for l in range(L):
        ### START CODE HERE ### (approx. 4 lines)
            v["dW" + str(l+1)] = np.zeros_like(parameters["W" + str(l+1)])
            v["db" + str(l+1)] = np.zeros_like(parameters["b" + str(l+1)])
            s["dW" + str(l+1)] = np.zeros_like(parameters["W" + str(l+1)])
            s["db" + str(l+1)] = np.zeros_like(parameters["b" + str(l+1)])
        ### END CODE HERE ###
        
        return v, s
    
    parameters = initialize_adam_test_case()
    
    v, s = initialize_adam(parameters)
    print("v["dW1"] = " + str(v["dW1"]))
    print("v["db1"] = " + str(v["db1"]))
    print("v["dW2"] = " + str(v["dW2"]))
    print("v["db2"] = " + str(v["db2"]))
    print("s["dW1"] = " + str(s["dW1"]))
    print("s["db1"] = " + str(s["db1"]))
    print("s["dW2"] = " + str(s["dW2"]))
    print("s["db2"] = " + str(s["db2"]))
    
    v["dW1"] = [[0. 0. 0.]
     [0. 0. 0.]]
    v["db1"] = [[0.]
     [0.]]
    v["dW2"] = [[0. 0. 0.]
     [0. 0. 0.]
     [0. 0. 0.]]
    v["db2"] = [[0.]
     [0.]
     [0.]]
    s["dW1"] = [[0. 0. 0.]
     [0. 0. 0.]]
    s["db1"] = [[0.]
     [0.]]
    s["dW2"] = [[0. 0. 0.]
     [0. 0. 0.]
     [0. 0. 0.]]
    s["db2"] = [[0.]
     [0.]
     [0.]]
    
    # GRADED FUNCTION: update_parameters_with_adam
    
    def update_parameters_with_adam(parameters, grads, v, s, t, learning_rate = 0.01,
                                    beta1 = 0.9, beta2 = 0.999,  epsilon = 1e-8):
        """
        Update parameters using Adam
        
        Arguments:
        parameters -- python dictionary containing your parameters:
                        parameters['W' + str(l)] = Wl
                        parameters['b' + str(l)] = bl
        grads -- python dictionary containing your gradients for each parameters:
                        grads['dW' + str(l)] = dWl
                        grads['db' + str(l)] = dbl
        v -- Adam variable, moving average of the first gradient, python dictionary
        s -- Adam variable, moving average of the squared gradient, python dictionary
        learning_rate -- the learning rate, scalar.
        beta1 -- Exponential decay hyperparameter for the first moment estimates 
        beta2 -- Exponential decay hyperparameter for the second moment estimates 
        epsilon -- hyperparameter preventing division by zero in Adam updates
    
        Returns:
        parameters -- python dictionary containing your updated parameters 
        v -- Adam variable, moving average of the first gradient, python dictionary
        s -- Adam variable, moving average of the squared gradient, python dictionary
        """
        
        L = len(parameters) // 2                 # number of layers in the neural networks
        v_corrected = {}                         # Initializing first moment estimate, python dictionary
        s_corrected = {}                         # Initializing second moment estimate, python dictionary
        
        # Perform Adam update on all parameters
        for l in range(L):
            # Moving average of the gradients. Inputs: "v, grads, beta1". Output: "v".
            ### START CODE HERE ### (approx. 2 lines)
            v["dW" + str(l+1)] = beta1 * v["dW" + str(l+1)] + (1-beta1) * grads["dW" + str(l+1)]
            v["db" + str(l+1)] = beta1 * v["db" + str(l+1)] + (1-beta1) * grads["db" + str(l+1)]
            ### END CODE HERE ###
    
            # Compute bias-corrected first moment estimate. Inputs: "v, beta1, t". Output: "v_corrected".
            ### START CODE HERE ### (approx. 2 lines)
            v_corrected["dW" + str(l+1)] = v["dW" + str(l+1)] / (1 - np.power(beta1, t))
            v_corrected["db" + str(l+1)] = v["db" + str(l+1)] / (1 - np.power(beta1, t))
            ### END CODE HERE ###
    
            # Moving average of the squared gradients. Inputs: "s, grads, beta2". Output: "s".
            ### START CODE HERE ### (approx. 2 lines)
            s["dW" + str(l+1)] = beta2 * s["dW" + str(l+1)] + (1-beta2) * np.square(grads["dW" + str(l+1)])
            s["db" + str(l+1)] = beta2 * s["db" + str(l+1)] + (1-beta2) * np.square(grads["db" + str(l+1)])
            ### END CODE HERE ###
    
            # Compute bias-corrected second raw moment estimate. Inputs: "s, beta2, t". Output: "s_corrected".
            ### START CODE HERE ### (approx. 2 lines)
            s_corrected["dW" + str(l+1)] = s["dW" + str(l+1)] / (1 - np.power(beta2, t))
            s_corrected["db" + str(l+1)] = s["db" + str(l+1)] / (1 - np.power(beta2, t))
            ### END CODE HERE ###
    
            # Update parameters. Inputs: "parameters, learning_rate, v_corrected, s_corrected, epsilon". Output: "parameters".
            ### START CODE HERE ### (approx. 2 lines)
            parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - learning_rate * v_corrected["dW" + str(l+1)] / np.sqrt(s_corrected["dW" + str(l+1)] + epsilon)
            parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - learning_rate * v_corrected["db" + str(l+1)] / np.sqrt(s_corrected["db" + str(l+1)] + epsilon)
            ### END CODE HERE ###
    
        return parameters, v, s
    
    parameters, grads, v, s = update_parameters_with_adam_test_case()
    parameters, v, s  = update_parameters_with_adam(parameters, grads, v, s, t = 2)
    
    print("W1 = " + str(parameters["W1"]))
    print("b1 = " + str(parameters["b1"]))
    print("W2 = " + str(parameters["W2"]))
    print("b2 = " + str(parameters["b2"]))
    print("v["dW1"] = " + str(v["dW1"]))
    print("v["db1"] = " + str(v["db1"]))
    print("v["dW2"] = " + str(v["dW2"]))
    print("v["db2"] = " + str(v["db2"]))
    print("s["dW1"] = " + str(s["dW1"]))
    print("s["db1"] = " + str(s["db1"]))
    print("s["dW2"] = " + str(s["dW2"]))
    print("s["db2"] = " + str(s["db2"]))
    
    W1 = [[ 1.63178673 -0.61919778 -0.53561312]
     [-1.08040999  0.85796626 -2.29409733]]
    b1 = [[ 1.75225313]
     [-0.75376553]]
    W2 = [[ 0.32648046 -0.25681174  1.46954931]
     [-2.05269934 -0.31497584 -0.37661299]
     [ 1.14121081 -1.09245036 -0.16498684]]
    b2 = [[-0.88529978]
     [ 0.03477238]
     [ 0.57537385]]
    v["dW1"] = [[-0.11006192  0.11447237  0.09015907]
     [ 0.05024943  0.09008559 -0.06837279]]
    v["db1"] = [[-0.01228902]
     [-0.09357694]]
    v["dW2"] = [[-0.02678881  0.05303555 -0.06916608]
     [-0.03967535 -0.06871727 -0.08452056]
     [-0.06712461 -0.00126646 -0.11173103]]
    v["db2"] = [[0.02344157]
     [0.16598022]
     [0.07420442]]
    s["dW1"] = [[0.00121136 0.00131039 0.00081287]
     [0.0002525  0.00081154 0.00046748]]
    s["db1"] = [[1.51020075e-05]
     [8.75664434e-04]]
    s["dW2"] = [[7.17640232e-05 2.81276921e-04 4.78394595e-04]
     [1.57413361e-04 4.72206320e-04 7.14372576e-04]
     [4.50571368e-04 1.60392066e-07 1.24838242e-03]]
    s["db2"] = [[5.49507194e-05]
     [2.75494327e-03]
     [5.50629536e-04]]
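
    For reference, the Adam update implemented above for each layer $l$ at time step $t$, with learning rate $\alpha$ (note that, matching the code, $\epsilon$ is added inside the square root):

    $$v := \beta_1 v + (1-\beta_1)\, dW^{[l]}, \qquad v^{corrected} = \frac{v}{1-\beta_1^{\,t}}$$
    $$s := \beta_2 s + (1-\beta_2)\, (dW^{[l]})^2, \qquad s^{corrected} = \frac{s}{1-\beta_2^{\,t}}$$
    $$W^{[l]} := W^{[l]} - \alpha\, \frac{v^{corrected}}{\sqrt{s^{corrected} + \epsilon}}$$

    and analogously for $b^{[l]}$ using $db^{[l]}$.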
    

    5 - Model with different optimization algorithms

    train_X, train_Y = load_dataset()
    

    [Figure: scatter plot of the training dataset returned by load_dataset()]

    def model(X, Y, layers_dims, optimizer, learning_rate = 0.0007, mini_batch_size = 64, beta = 0.9,
              beta1 = 0.9, beta2 = 0.999,  epsilon = 1e-8, num_epochs = 10000, print_cost = True):
        """
        3-layer neural network model which can be run in different optimizer modes.
        
        Arguments:
        X -- input data, of shape (2, number of examples)
        Y -- true "label" vector (1 for blue dot / 0 for red dot), of shape (1, number of examples)
        layers_dims -- python list, containing the size of each layer
        learning_rate -- the learning rate, scalar.
        mini_batch_size -- the size of a mini batch
        beta -- Momentum hyperparameter
        beta1 -- Exponential decay hyperparameter for the past gradients estimates 
        beta2 -- Exponential decay hyperparameter for the past squared gradients estimates 
        epsilon -- hyperparameter preventing division by zero in Adam updates
        num_epochs -- number of epochs
        print_cost -- True to print the cost every 1000 epochs
    
        Returns:
        parameters -- python dictionary containing your updated parameters 
        """
    
        L = len(layers_dims)             # number of layers in the neural networks
        costs = []                       # to keep track of the cost
        t = 0                            # initializing the counter required for Adam update
        seed = 10                        # For grading purposes, so that your "random" minibatches are the same as ours
        
        # Initialize parameters
        parameters = initialize_parameters(layers_dims)
    
        # Initialize the optimizer
        if optimizer == "gd":
            pass # no initialization required for gradient descent
        elif optimizer == "momentum":
            v = initialize_velocity(parameters)
        elif optimizer == "adam":
            v, s = initialize_adam(parameters)
        
        # Optimization loop
        for i in range(num_epochs):
            
            # Define the random minibatches. We increment the seed to reshuffle differently the dataset after each epoch
            seed = seed + 1
            minibatches = random_mini_batches(X, Y, mini_batch_size, seed)
    
            for minibatch in minibatches:
    
                # Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch
    
                # Forward propagation
                a3, caches = forward_propagation(minibatch_X, parameters)
    
                # Compute cost
                cost = compute_cost(a3, minibatch_Y)
    
                # Backward propagation
                grads = backward_propagation(minibatch_X, minibatch_Y, caches)
    
                # Update parameters
                if optimizer == "gd":
                    parameters = update_parameters_with_gd(parameters, grads, learning_rate)
                elif optimizer == "momentum":
                    parameters, v = update_parameters_with_momentum(parameters, grads, v, beta, learning_rate)
                elif optimizer == "adam":
                    t = t + 1 # Adam counter
                    parameters, v, s = update_parameters_with_adam(parameters, grads, v, s,
                                                                   t, learning_rate, beta1, beta2,  epsilon)
            
            # Print the cost every 1000 epochs
            if print_cost and i % 1000 == 0:
                print ("Cost after epoch %i: %f" %(i, cost))
            if print_cost and i % 100 == 0:
                costs.append(cost)
                    
        # plot the cost
        plt.plot(costs)
        plt.ylabel('cost')
        plt.xlabel('epochs (per 100)')
        plt.title("Learning rate = " + str(learning_rate))
        plt.show()
    
        return parameters
    

    5.1 - Mini-batch Gradient descent

    # train 3-layer model
    layers_dims = [train_X.shape[0], 5, 2, 1]
    parameters = model(train_X, train_Y, layers_dims, optimizer = "gd")
    
    # Predict
    predictions = predict(train_X, train_Y, parameters)
    
    # Plot decision boundary
    plt.title("Model with Gradient Descent optimization")
    axes = plt.gca()
    axes.set_xlim([-1.5,2.5])
    axes.set_ylim([-1,1.5])
    plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, np.squeeze(train_Y))
    
    Cost after epoch 0: 0.690736
    Cost after epoch 1000: 0.685273
    Cost after epoch 2000: 0.647072
    Cost after epoch 3000: 0.619525
    Cost after epoch 4000: 0.576584
    Cost after epoch 5000: 0.607243
    Cost after epoch 6000: 0.529403
    Cost after epoch 7000: 0.460768
    Cost after epoch 8000: 0.465586
    Cost after epoch 9000: 0.464518
    

    [Figure: cost curve and decision boundary for the model trained with mini-batch gradient descent]

    5.2 - Mini-batch gradient descent with momentum

    # train 3-layer model
    layers_dims = [train_X.shape[0], 5, 2, 1]
    parameters = model(train_X, train_Y, layers_dims, beta = 0.9, optimizer = "momentum")
    
    # Predict
    predictions = predict(train_X, train_Y, parameters)
    
    # Plot decision boundary
    plt.title("Model with Momentum optimization")
    axes = plt.gca()
    axes.set_xlim([-1.5,2.5])
    axes.set_ylim([-1,1.5])
    plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, np.squeeze(train_Y))
    
    Cost after epoch 0: 0.690741
    Cost after epoch 1000: 0.685341
    Cost after epoch 2000: 0.647145
    Cost after epoch 3000: 0.619594
    Cost after epoch 4000: 0.576665
    Cost after epoch 5000: 0.607324
    Cost after epoch 6000: 0.529476
    Cost after epoch 7000: 0.460936
    Cost after epoch 8000: 0.465780
    Cost after epoch 9000: 0.464740
    

    [Figure: cost curve and decision boundary for the model trained with momentum]

    5.3 - Mini-batch with Adam mode

    # train 3-layer model
    layers_dims = [train_X.shape[0], 5, 2, 1]
    parameters = model(train_X, train_Y, layers_dims, optimizer = "adam")
    
    # Predict
    predictions = predict(train_X, train_Y, parameters)
    
    # Plot decision boundary
    plt.title("Model with Adam optimization")
    axes = plt.gca()
    axes.set_xlim([-1.5,2.5])
    axes.set_ylim([-1,1.5])
    plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, np.squeeze(train_Y))
    
    Cost after epoch 0: 0.690552
    Cost after epoch 1000: 0.185501
    Cost after epoch 2000: 0.150830
    Cost after epoch 3000: 0.074454
    Cost after epoch 4000: 0.125959
    Cost after epoch 5000: 0.104344
    Cost after epoch 6000: 0.100676
    Cost after epoch 7000: 0.031652
    Cost after epoch 8000: 0.111973
    Cost after epoch 9000: 0.197940
    

    [Figure: cost curve and decision boundary for the model trained with Adam]

    Method               Accuracy
    Gradient descent     79.7% (oscillations)
    Momentum             79.7% (oscillations)
    Adam                 94% (smoother)

    Note that gradient descent and momentum reach the same accuracy here (with this small learning rate and simple dataset, momentum adds almost nothing), which highlights Adam's advantage.
