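  • Neural Networks

    • A neural network is a massively parallel, interconnected network made up of simple adaptive units.

    • Perceptron:

    • The backpropagation (BP) algorithm can be applied to multi-layer feedforward neural networks, and it can also be used to train recurrent neural networks.
      In practice, "BP network" usually means a multi-layer feedforward network trained with the BP algorithm.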

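    • Basic deep learning terms
      1. Convolutional neural network (CNN): a CNN stacks multiple convolutional layers and sampling layers to process the input signal, and finally maps the result to the output target in the connected layer.
      3. Sampling layer: subsamples based on the principle of local correlation, reducing the amount of data while keeping the useful information. Seen from another angle, the machine takes over the "feature engineering" that human experts used to do.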

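    • Activation functions for neural networks:
      1. Logistic: the sigmoid function is the typical activation function and is very useful when computing class probabilities. $$f(z)=\frac{1}{1+\exp(-z)},\quad 0<f(z)<1$$
      2. Tanh: $$f(z)=\tanh(z)=\frac{e^{z}-e^{-z}}{e^{z}+e^{-z}},\quad -1<f(z)<1$$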
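The two activation functions above can be checked numerically. The following is a minimal sketch in plain Python; the function names are illustrative and not taken from the notes.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh_from_definition(z):
    """tanh written out from its exponential definition; range is (-1, 1)."""
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))

for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh_from_definition(z))
```

The two functions are closely related: tanh(z) equals 2 * sigmoid(2z) - 1, so tanh is a rescaled, zero-centred sigmoid.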

    • Convolutional neural network (CNN):
      Convolution blends two functions along the time dimension. $$(f*g)(t)=\int_{-\infty}^{\infty}f(\tau)g(t-\tau)\,d\tau$$
      Convolution extends to the discrete domain, where it is written as $$(f*g)[n]=\sum_{m=-\infty}^{\infty}f(m)g(n-m)$$
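The discrete formula above can be written as a double loop. This sketch assumes both sequences are finite and zero outside their support, and compares the result with `numpy.convolve`; the sample sequences are made up for illustration.

```python
import numpy as np

def conv1d(f, g):
    """Full discrete convolution: out[n] = sum over m of f[m] * g[n - m]."""
    n_out = len(f) + len(g) - 1
    out = [0.0] * n_out
    for n in range(n_out):
        for m in range(len(f)):
            if 0 <= n - m < len(g):  # terms outside g's support are zero
                out[n] += f[m] * g[n - m]
    return out

f = [1.0, 2.0, 3.0]
g = [0.0, 1.0, 0.5]
result = conv1d(f, g)
print(result)
print(np.convolve(f, g))  # NumPy's default "full" mode computes the same sum
```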


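    • 1. Adjust the connection weights between neurons, and the threshold of each functional neuron, according to the training data. In other words, everything a neural network learns is stored in its connection weights and thresholds.
    • 2. Determine the parameters by iterative updates, adjusting the weights of the perceptron (neural network): $\omega_{i}\leftarrow\omega_{i}+\Delta\omega_{i}$, $\Delta\omega_{i}=\eta(y-\hat{y})x_{i}$, where $\eta$ is the learning rate.
    • 3. Feed the input example to the input-layer neurons and propagate the signal forward layer by layer until the output layer produces a result.
    • 4. Compute the error at the output layer, then propagate the error backward to the hidden-layer neurons.
    • 5. Finally, adjust the connection weights and thresholds according to the errors of the hidden-layer neurons, and repeat the whole procedure iteratively.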
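The perceptron update rule in step 2 can be sketched as follows. This is an illustration under stated assumptions, not the author's code: the activation is a step function, the threshold is treated as a weight on a fixed input of -1, and the AND truth table stands in for real training data.

```python
def step(z):
    """Threshold activation: fire (1) when the net input is non-negative."""
    return 1 if z >= 0 else 0

def predict(w, theta, x):
    return step(sum(wi * xi for wi, xi in zip(w, x)) - theta)

def train_perceptron(samples, eta=0.5, max_epochs=100):
    """Apply w_i <- w_i + eta * (y - y_hat) * x_i until an epoch has no mistakes."""
    w = [0.0] * len(samples[0][0])
    theta = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in samples:
            delta = eta * (y - predict(w, theta, x))
            if delta != 0:
                mistakes += 1
                w = [wi + delta * xi for wi, xi in zip(w, x)]
                theta -= delta  # the threshold behaves like a weight on input -1
        if mistakes == 0:
            break
    return w, theta

and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, theta = train_perceptron(and_data)
predictions = [predict(w, theta, x) for x, _ in and_data]
print(predictions)
```

A single perceptron only converges on linearly separable data such as AND; XOR needs at least one hidden layer, which is where the BP steps 3 to 5 come in.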


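    • BP algorithm:
      Training set $D = \{(x_{1},y_{1}),(x_{2},y_{2}),...,(x_{m},y_{m})\}$
      Output: an $l$-dimensional real-valued vector; output neuron $j$ has threshold $\theta_{j}$
      Hidden neuron $h$ has threshold $\gamma_{h}$, with $b_{h}=f_{1}(\alpha_{h}-\gamma_{h})$ and $\hat{y}_{j}=f_{2}(\beta_{j}-\theta_{j})$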

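    $$\upsilon \leftarrow \upsilon + \Delta\upsilon$$
    The BP algorithm adjusts the parameters with a gradient descent strategy.

    • Background: gradient descent
      Gradient descent is a commonly used first-order optimization method, and one of the simplest and most classic ways to solve unconstrained optimization problems.
      Let $f(x)$ be continuously differentiable. If we can construct a sequence $x^{0},x^{1},x^{2},...$ satisfying $$f(x^{t+1})<f(x^{t}),\quad t=0,1,2,3,...$$
      then repeating the process converges to a local minimum. By a first-order Taylor expansion, $$f(x+\Delta x)\simeq f(x)+\Delta x^{T}\nabla f(x)$$
      so to obtain $f(x+\Delta x)<f(x)$ we can choose $$\Delta x=-\gamma\nabla f(x)$$ where the step size $\gamma$ is a small constant.

    • Objective function: minimize $$E_{k}=\frac{1}{2}\sum_{j=1}^{l}(\hat{y}_{j}^{k}-y_{j}^{k})^{2}$$

    • Deriving the update $\Delta\upsilon_{ih}$:
      Differentiate the objective with the chain rule, using $\alpha_{h}=\sum_{i}\upsilon_{ih}x_{i}$ and $\beta_{j}=\sum_{h}\omega_{hj}b_{h}$:
      $$\frac{\partial E_{k}}{\partial \upsilon_{ih}}=\frac{\partial E_{k}}{\partial b_{h}}\cdot\frac{\partial b_{h}}{\partial \alpha_{h}}\cdot\frac{\partial \alpha_{h}}{\partial \upsilon_{ih}},\qquad \frac{\partial \alpha_{h}}{\partial \upsilon_{ih}}=x_{i}$$
      Define the hidden-layer gradient term
      $$e_{h}=-\frac{\partial E_{k}}{\partial b_{h}}\cdot\frac{\partial b_{h}}{\partial \alpha_{h}}=-\sum_{j=1}^{l}\frac{\partial E_{k}}{\partial \beta_{j}}\cdot\frac{\partial \beta_{j}}{\partial b_{h}}{f}'(\alpha_{h}-\gamma_{h})=\sum_{j=1}^{l}\omega_{hj}g_{j}{f}'(\alpha_{h}-\gamma_{h})=b_{h}(1-b_{h})\sum_{j=1}^{l}\omega_{hj}g_{j}$$
      where $g_{j}=-\frac{\partial E_{k}}{\partial \beta_{j}}$ is the output-layer gradient term, and the last step uses the sigmoid property ${f}'(z)=f(z)(1-f(z))$.
      Hence $\Delta\upsilon_{ih}=-\eta\frac{\partial E_{k}}{\partial \upsilon_{ih}}=\eta e_{h}x_{i}$.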
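The gradient descent step described above can be tried on a one-dimensional function. This is a minimal sketch: the quadratic f(x) = (x - 3)^2, the step size, and the starting point are all chosen for illustration.

```python
def gradient_descent(f, grad, x0, gamma=0.1, steps=100):
    """Repeat x <- x - gamma * grad(x), recording f(x) after every step."""
    x = x0
    history = [f(x)]
    for _ in range(steps):
        x = x - gamma * grad(x)
        history.append(f(x))
    return x, history

f = lambda x: (x - 3.0) ** 2      # minimum at x = 3
grad = lambda x: 2.0 * (x - 3.0)  # derivative of f

x_min, history = gradient_descent(f, grad, x0=0.0)
print(x_min, history[0], history[-1])
```

With this step size each iteration shrinks the distance to the minimum by a factor of 0.8, so the recorded values of f never increase. A step size that is too large would break that property.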


    • Global minimum vs. local minimum


    • BP algorithm: training a single-hidden-layer neural network on watermelon dataset 3.0


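              1. randomly initialize all connection weights and thresholds in (0, 1)
              2. repeat
              3.   for all (Xk,Yk) do
              4.      compute the output for the current sample from the current parameters
              5.      compute the gradient terms of the output-layer neurons
              6.      compute the gradient terms of the hidden-layer neurons
              7.      update the connection weights and thresholds
              8.   end for
              9. until the stopping condition is met
    Note the difference between the standard BP algorithm and the accumulated BP algorithm (accumulated error backpropagation): standard BP updates the parameters after every single sample, while accumulated BP reads the whole training set once and then makes one update that minimizes the accumulated error.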
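The difference between the two variants can be sketched with NumPy. Everything here is an assumption made for illustration: the vectorized layout, the dictionary of parameters, the layer sizes, the learning rate, and the OR truth table used as toy data. It is not the implementation developed later in these notes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hidden, n_out, seed=0):
    rng = np.random.default_rng(seed)
    return {"v": rng.uniform(0, 1, (n_in, n_hidden)),   # input -> hidden weights
            "gamma": rng.uniform(0, 1, n_hidden),       # hidden thresholds
            "w": rng.uniform(0, 1, (n_hidden, n_out)),  # hidden -> output weights
            "theta": rng.uniform(0, 1, n_out)}          # output thresholds

def deltas(p, x, y, eta):
    """Parameter increments for one sample, from the gradient terms g and e."""
    b = sigmoid(x @ p["v"] - p["gamma"])
    y_hat = sigmoid(b @ p["w"] - p["theta"])
    g = y_hat * (1 - y_hat) * (y - y_hat)  # output-layer gradient term
    e = b * (1 - b) * (p["w"] @ g)         # hidden-layer gradient term
    return {"v": eta * np.outer(x, e), "gamma": -eta * e,
            "w": eta * np.outer(b, g), "theta": -eta * g}

def mean_error(p, X, Y):
    B = sigmoid(X @ p["v"] - p["gamma"])
    Y_hat = sigmoid(B @ p["w"] - p["theta"])
    return float(np.mean(0.5 * np.sum((Y_hat - Y) ** 2, axis=1)))

def train_standard(p, X, Y, eta, epochs):
    for _ in range(epochs):
        for x, y in zip(X, Y):
            d = deltas(p, x, y, eta)
            for k in p:
                p[k] += d[k]  # update immediately after each sample
    return p

def train_accumulated(p, X, Y, eta, epochs):
    for _ in range(epochs):
        total = {k: np.zeros_like(val) for k, val in p.items()}
        for x, y in zip(X, Y):
            d = deltas(p, x, y, eta)
            for k in p:
                total[k] += d[k]
        for k in p:
            p[k] += total[k] / len(X)  # one averaged update per pass
    return p

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [1]], dtype=float)  # OR truth table

err_initial = mean_error(init_params(2, 3, 1), X, Y)
err_standard = mean_error(train_standard(init_params(2, 3, 1), X, Y, 0.5, 500), X, Y)
err_accumulated = mean_error(train_accumulated(init_params(2, 3, 1), X, Y, 0.5, 500), X, Y)
print(err_initial, err_standard, err_accumulated)
```

Standard BP makes many more updates per pass, so it often moves faster early on but can oscillate. Accumulated BP makes fewer, smoother updates.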
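    # input() function
    # Read watermelon dataset 3.0
    def input():
        """
        @param  : none or filepath
        @return : dataSet, a pandas DataFrame
        """
        try:
            import pandas as pd
        except ImportError:
            print("module import error")
            raise
        with open('/home/dengshuo/GithubCode/ML/CH05/watermelon3.csv') as data_file:
            # assumption: the file is plain CSV that pandas.read_csv can parse
            df = pd.read_csv(data_file)
        return df
    # learningRatio() function
    # Initialize the learning rate
    def learningRatio():
        """
        @return : learningRatio, a random double, e.g. from random.uniform()
        """
        try:
            import random
        except ImportError:
            print('module import error')
            raise
        # assumption: draw the learning rate uniformly from (0, 1)
        learningRatio = random.uniform(0, 1)
        return learningRatio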
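            id  color  root             sound    texture          navel            touch        density  sugar_content  good_melon
    0   1   green  curled           muffled  clear            sunken           hard-smooth  0.697    0.460
    1   2   dark   curled           dull     clear            sunken           hard-smooth  0.774    0.376
    2   3   dark   curled           muffled  clear            sunken           hard-smooth  0.634    0.264
    3   4   green  curled           dull     clear            sunken           hard-smooth  0.608    0.318
    4   5   light  curled           muffled  clear            sunken           hard-smooth  0.556    0.215
    5   6   green  slightly-curled  muffled  clear            slightly-sunken  soft-sticky  0.403    0.237
    6   7   dark   slightly-curled  muffled  slightly-blurry  slightly-sunken  soft-sticky  0.481    0.149
    7   8   dark   slightly-curled  muffled  clear            slightly-sunken  hard-smooth  0.437    0.211
    8   9   dark   slightly-curled  dull     slightly-blurry  slightly-sunken  hard-smooth  0.666    0.091
    9   10  green  stiff            crisp    clear            flat             soft-sticky  0.243    0.267
    10  11  light  stiff            crisp    blurry           flat             hard-smooth  0.245    0.057
    11  12  light  curled           muffled  blurry           flat             soft-sticky  0.343    0.099
    12  13  green  slightly-curled  muffled  slightly-blurry  sunken           hard-smooth  0.639    0.161
    13  14  light  slightly-curled  dull     slightly-blurry  sunken           hard-smooth  0.657    0.198
    14  15  dark   slightly-curled  muffled  clear            slightly-sunken  soft-sticky  0.360    0.370
    15  16  light  curled           muffled  blurry           flat             hard-smooth  0.593    0.042
    16  17  green  curled           dull     slightly-blurry  slightly-sunken  hard-smooth  0.719    0.103
    17  18  green  curled           muffled  clear            sunken           hard-smooth  0.697    0.460          NaN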
    # outputlayer() function
    # Compute the output value Yk of the output layer
    def outputlayer(df):
        """
        @param df: the pandas DataFrame
        @return Yk: the output
        """
    # Juggling this many parameters is a headache
    # define class()
    # Define the neural network structure: the skeleton of the whole algorithm
    '''
    the definition of BP network class
    '''
    class BP_network: 
        def __init__(self):
            '''
            initial variables
            '''
            # node number each layer
            self.i_n = 0           
            self.h_n = 0   
            self.o_n = 0
            # output value for each layer
            self.i_v = []       
            self.h_v = []
            self.o_v = []
            # parameters (w, t)
            self.ih_w = []    # weight for each link
            self.ho_w = []
            self.h_t  = []    # threshold for each neuron
            self.o_t  = []
            # available activation functions and their derivatives
            self.fun = {
                'Sigmoid': Sigmoid,          # logistic function
                'SigmoidDerivate': SigmoidDerivate,
                'Tanh': Tanh,              # hyperbolic tangent function
                'TanhDerivate': TanhDerivate,
            }
    # CreateNN() function
    # Fill in the network structure
        def CreateNN(self, ni, nh, no, actfun):
            '''
            build a BP network structure and initial parameters
            @param ni, nh, no: the neuron number of each layer
            @param actfun: string, the name of activation function
            '''
            # import module packages
            import numpy as np 
            # assignment of node number
            # set the number of nodes in each layer
            self.i_n = ni
            self.h_n = nh
            self.o_n = no
            # initial value of output for each layer
            self.i_v = np.zeros(self.i_n)
            self.h_v = np.zeros(self.h_n)
            self.o_v = np.zeros(self.o_n)
            # initial weights for each link (random initialization)
            self.ih_w = np.zeros([self.i_n, self.h_n])
            self.ho_w = np.zeros([self.h_n, self.o_n])
            # assign the weights in loops
            for i in range(self.i_n):  
                for h in range(self.h_n): 
                    self.ih_w[i][h] = rand(0, 1)  # random float in [0, 1) from the rand() helper
            for h in range(self.h_n):  
                for j in range(self.o_n): 
                    self.ho_w[h][j] = rand(0, 1)
            # initial threshold for each neuron
            self.h_t = np.zeros(self.h_n)
            self.o_t = np.zeros(self.o_n)
            for h in range(self.h_n): self.h_t[h] = rand(0, 1)
            for j in range(self.o_n): self.o_t[j] = rand(0, 1)
            # initial activation function
            # no library call is needed: self.fun maps each name to a function defined in this module
            self.af  = self.fun[actfun]
            self.afd = self.fun[actfun+'Derivate']
    # Definition of the random value helper
    '''
    the definition of random function
    '''
    def rand(a, b):
        '''
        random value generation for parameter initialization
        @param a, b: the lower and upper limits of the random value
        '''
        from random import random
        return (b - a) * random() + a
    # define the needed functions
    # some activation functions
    '''
    the definition of activation functions
    '''
    def Sigmoid(x):
        '''
        sigmoid function; SigmoidDerivate takes the function's output y, not x
        '''
        from math import exp
        return 1.0 / (1.0 + exp(-x))
    def SigmoidDerivate(y):
        return y * (1 - y)
    def Tanh(x):
        '''
        tanh function; TanhDerivate takes the function's output y, not x
        '''
        from math import tanh
        return tanh(x)
    def TanhDerivate(y):
        return 1 - y*y
    # predict process through the network
    # compute the output for one sample
        def Pred(self, x):
            '''
            @param x: the input array for input layer
            '''
            # activate input layer
            for i in range(self.i_n):
                self.i_v[i] = x[i]
            # activate hidden layer
            for h in range(self.h_n):
                total = 0.0
                for i in range(self.i_n):
                    total += self.i_v[i] * self.ih_w[i][h]
                self.h_v[h] = self.af(total - self.h_t[h])
            # activate output layer
            for j in range(self.o_n):
                total = 0.0
                for h in range(self.h_n):
                    total += self.h_v[h] * self.ho_w[h][j]
                self.o_v[j] = self.af(total - self.o_t[j])
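    How should the discrete attributes of the watermelon dataset be handled? For example, should color {green (青绿), dark (乌黑), light (浅白)} be mapped to {0, 1, 2}?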
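Two common answers to the question above are integer (ordinal) codes and one-hot vectors. The sketch below shows both in plain Python. The helper names are hypothetical, and the English category strings stand in for the values stored in the CSV file.

```python
def ordinal_encode(values, order):
    """Map each category to its position in `order` (implies an ordering)."""
    index = {category: i for i, category in enumerate(order)}
    return [index[v] for v in values]

def one_hot_encode(values, categories):
    """Map each category to a 0/1 vector with a single 1 (no ordering implied)."""
    return [[1 if v == c else 0 for c in categories] for v in values]

color_levels = ["green", "dark", "light"]      # stand-ins for the CSV values
color = ["green", "dark", "dark", "light"]

print(ordinal_encode(color, color_levels))
print(one_hot_encode(color, color_levels))
```

Integer codes are compact but tell the network that light (2) is "further" from green (0) than dark (1) is. That is reasonable for attributes with a natural order, such as texture from clear to blurry. For an attribute like color, one-hot vectors avoid imposing an order, at the cost of more input nodes.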
    # the implementation of the BP algorithm on a single sample
    # BackPropagate() function
    # backward pass: compute the gradient terms and update the parameters
        def BackPropagate(self, x, y, lr):
            '''
            @param x, y: array, input and output of the data sample
            @param lr: float, the learning rate of gradient descent iteration
            '''
            # import needed module packages
            import numpy as np 
            # get current network output
            self.Pred(x)
            # calculate the gradient based on output
            o_grid = np.zeros(self.o_n) 
            for j in range(self.o_n):
                # gradient term of the output-layer neurons, see the watermelon book, section 5.3, equation (5.10)
                o_grid[j] = (y[j] - self.o_v[j]) * self.afd(self.o_v[j])
                # for the sigmoid, self.afd() is yk(1-yk)
            # calculate the gradient of hidden layer
            # compute the hidden-layer gradient term Eh
            h_grid = np.zeros(self.h_n)
            for h in range(self.h_n):
                for j in range(self.o_n):
                    h_grid[h] += self.ho_w[h][j] * o_grid[j]
                h_grid[h] = h_grid[h] * self.afd(self.h_v[h]) 
                # for the sigmoid, self.afd() is Bh(1-Bh)
            # updating the parameter
            for h in range(self.h_n):  
                for j in range(self.o_n): 
                    # update rule
                    self.ho_w[h][j] += lr * o_grid[j] * self.h_v[h]
            for i in range(self.i_n):  
                for h in range(self.h_n): 
                    self.ih_w[i][h] += lr * h_grid[h] * self.i_v[i]     
            for j in range(self.o_n):
                self.o_t[j] -= lr * o_grid[j]    
            for h in range(self.h_n):
                self.h_t[h] -= lr * h_grid[h]
    # define the TrainStandard() function
    # standard BP: update after every sample and report the mean error over the pass
        def TrainStandard(self, data_in, data_out, lr=0.05):
            '''
            @param lr, learning rate, default 0.05
            @param data_in :the networks input data
            @param data_out:the output data of output layer
            @return: e, accumulated error
            @return: e_k, error array of each step
            '''
            e_k = []
            for k in range(len(data_in)):
                x = data_in[k]
                y = data_out[k]
                self.BackPropagate(x, y, lr)
                # error in train set for each step
                # compute the squared error of this sample
                y_delta2 = 0.0
                for j in range(self.o_n):
                    y_delta2 += (self.o_v[j] - y[j]) * (self.o_v[j] - y[j])  
                e_k.append(y_delta2 / 2)  # E_k = 1/2 * sum of squared differences
            # total error of training
            # average the per-sample errors into the accumulated error
            e = sum(e_k)/len(e_k)
            return e, e_k
    # return the predicted labels: 1 for a good melon, 0 for a bad one
        def PredLabel(self, X):
            '''
            predict process through the network
            @param X: the input sample set for input layer
            @return: y, array, output set (0,1 - class), thresholding the first output at 0.5
            '''
            import numpy as np
            y = []
            for m in range(len(X)):
                self.Pred(X[m])  # forward pass so that self.o_v holds this sample's output
                if self.o_v[0] > 0.5:  y.append(1)
                else : y.append(0)
    #             winner-takes-all alternative for several output nodes:
    #             max_y = self.o_v[0]
    #             label = 0
    #             for j in range(1,self.o_n):
    #                 if max_y < self.o_v[j]: label = j
    #             y.append(label)
            return np.array(y)  
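    4.2 Implementing the BP algorithm with TensorFlow

    # Standardize features by removing the mean and scaling to unit variance

    In general the data is preprocessed so that it satisfies the algorithm's input requirements: the data is adapted to the algorithm.

    sklearn.metrics is again used to measure the model's performance.

    No parameter-update code appears in this example. The core is still the optimization of the loss function, but the framework performs the updates internally during fitting, so they are not visible here.
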
    print("Total mean squared error: {}".format(score))
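The score printed above is a mean squared error. As a minimal sketch of what that metric computes, the function below reimplements it in plain Python. The sample values are invented for illustration and are not results from this notebook. `sklearn.metrics.mean_squared_error` computes the same quantity.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [18.0, 15.0, 16.0]   # illustrative targets
y_pred = [17.0, 15.5, 18.0]   # illustrative predictions
score = mean_squared_error(y_true, y_pred)
print("Total mean squared error: {:.4f}".format(score))
```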
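    from sklearn import datasets, metrics
    from sklearn import model_selection  # replaces cross_validation, which scikit-learn 0.20 removed
    from sklearn import preprocessing
    from tensorflow.contrib import learn  # TensorFlow 1.x only; tensorflow.contrib was removed in TensorFlow 2.0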
    import pandas as pd 
    import matplotlib.pyplot as plt 
    %matplotlib inline
    %config InlineBackend.figure_format='svg'
    from keras.models import Sequential
    from keras.layers import Dense

    # read the original dataset with pandas

    mpg cylinders displacement horsepower weight acceleration model_year origin name
    0 18.0 8 307.0 130 3504 12.0 70 1 chevrolet chevelle malibu
    1 15.0 8 350.0 165 3693 11.5 70 1 buick skylark 320
    2 18.0 8 318.0 150 3436 11.0 70 1 plymouth satellite
    3 16.0 8 304.0 150 3433 12.0 70 1 amc rebel sst
    4 17.0 8 302.0 140 3449 10.5 70 1 ford torino
    5 15.0 8 429.0 198 4341 10.0 70 1 ford galaxie 500
    6 14.0 8 454.0 220 4354 9.0 70 1 chevrolet impala
    7 14.0 8 440.0 215 4312 8.5 70 1 plymouth fury iii
    8 14.0 8 455.0 225 4425 10.0 70 1 pontiac catalina
    9 15.0 8 390.0 190 3850 8.5 70 1 amc ambassador dpl
    10 15.0 8 383.0 170 3563 10.0 70 1 dodge challenger se
    11 14.0 8 340.0 160 3609 8.0 70 1 plymouth 'cuda 340
    12 15.0 8 400.0 150 3761 9.5 70 1 chevrolet monte carlo
    13 14.0 8 455.0 225 3086 10.0 70 1 buick estate wagon (sw)
    14 24.0 4 113.0 95 2372 15.0 70 3 toyota corona mark ii
    15 22.0 6 198.0 95 2833 15.5 70 1 plymouth duster
    16 18.0 6 199.0 97 2774 15.5 70 1 amc hornet
    17 21.0 6 200.0 85 2587 16.0 70 1 ford maverick
    18 27.0 4 97.0 88 2130 14.5 70 3 datsun pl510
    19 26.0 4 97.0 46 1835 20.5 70 2 volkswagen 1131 deluxe sedan
    20 25.0 4 110.0 87 2672 17.5 70 2 peugeot 504
    21 24.0 4 107.0 90 2430 14.5 70 2 audi 100 ls
    22 25.0 4 104.0 95 2375 17.5 70 2 saab 99e
    23 26.0 4 121.0 113 2234 12.5 70 2 bmw 2002
    24 21.0 6 199.0 90 2648 15.0 70 1 amc gremlin
    25 10.0 8 360.0 215 4615 14.0 70 1 ford f250
    26 10.0 8 307.0 200 4376 15.0 70 1 chevy c20
    27 11.0 8 318.0 210 4382 13.5 70 1 dodge d200
    28 9.0 8 304.0 193 4732 18.5 70 1 hi 1200d
    29 27.0 4 97.0 88 2130 14.5 71 3 datsun pl510
    ... ... ... ... ... ... ... ... ... ...
    368 27.0 4 112.0 88 2640 18.6 82 1 chevrolet cavalier wagon
    369 34.0 4 112.0 88 2395 18.0 82 1 chevrolet cavalier 2-door
    370 31.0 4 112.0 85 2575 16.2 82 1 pontiac j2000 se hatchback
    371 29.0 4 135.0 84 2525 16.0 82 1 dodge aries se
    372 27.0 4 151.0 90 2735 18.0 82 1 pontiac phoenix
    373 24.0 4 140.0 92 2865 16.4 82 1 ford fairmont futura
    374 23.0 4 151.0 0 3035 20.5 82 1 amc concord dl
    375 36.0 4 105.0 74 1980 15.3 82 2 volkswagen rabbit l
    376 37.0 4 91.0 68 2025 18.2 82 3 mazda glc custom l
    377 31.0 4 91.0 68 1970 17.6 82 3 mazda glc custom
    378 38.0 4 105.0 63 2125 14.7 82 1 plymouth horizon miser
    379 36.0 4 98.0 70 2125 17.3 82 1 mercury lynx l
    380 36.0 4 120.0 88 2160 14.5 82 3 nissan stanza xe
    381 36.0 4 107.0 75 2205 14.5 82 3 honda accord
    382 34.0 4 108.0 70 2245 16.9 82 3 toyota corolla
    383 38.0 4 91.0 67 1965 15.0 82 3 honda civic
    384 32.0 4 91.0 67 1965 15.7 82 3 honda civic (auto)
    385 38.0 4 91.0 67 1995 16.2 82 3 datsun 310 gx
    386 25.0 6 181.0 110 2945 16.4 82 1 buick century limited
    387 38.0 6 262.0 85 3015 17.0 82 1 oldsmobile cutlass ciera (diesel)
    388 26.0 4 156.0 92 2585 14.5 82 1 chrysler lebaron medallion
    389 22.0 6 232.0 112 2835 14.7 82 1 ford granada l
    390 32.0 4 144.0 96 2665 13.9 82 3 toyota celica gt
    391 36.0 4 135.0 84 2370 13.0 82 1 dodge charger 2.2
    392 27.0 4 151.0 90 2950 17.3 82 1 chevrolet camaro
    393 27.0 4 140.0 86 2790 15.6 82 1 ford mustang gl
    394 44.0 4 97.0 52 2130 24.6 82 2 vw pickup
    395 32.0 4 135.0 84 2295 11.6 82 1 dodge rampage
    396 28.0 4 120.0 79 2625 18.6 82 1 ford ranger
    397 31.0 4 119.0 82 2720 19.4 82 1 chevy s-10

    398 rows × 9 columns

    # convert the displacement column to float
    # we got the data columns from the dataset
    # first and last (mpg and car names) are ignored for X
    for i in range(1, 8):
        ax1 = plt.subplot(4, 2, i)  # 4 rows x 2 columns, panel i
        ax1.scatter(df[df.columns[i]], y)  # scatter plot of column i against the target y
    # split the dataset
    # scale the data to help the optimization converge
    # set the transform parameters
    # build a fully connected DNN with 2 hidden layers of 10 and 5 units respectively
    # compile the model, with the mean squared error as loss function
    # fit the model for 1000 epochs
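The first three steps listed above (split, scale, set the transform parameters) can be sketched with NumPy alone. The helper names and the 25% test fraction are assumptions made for illustration. The eight rows are taken from the table above, using cylinders, displacement and weight as features and mpg as the target.

```python
import numpy as np

def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle the row indices and hold out a fraction of them for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    return X[train], X[test], y[train], y[test]

def fit_scaler(X):
    """Learn per-column mean and standard deviation from the training data."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return mean, std

def transform(X, mean, std):
    return (X - mean) / std

X = np.array([[8, 307.0, 3504], [8, 350.0, 3693], [8, 318.0, 3436], [8, 304.0, 3433],
              [4, 113.0, 2372], [6, 198.0, 2833], [4, 97.0, 2130], [4, 121.0, 2234]])
y = np.array([18.0, 15.0, 18.0, 16.0, 24.0, 22.0, 27.0, 26.0])

X_train, X_test, y_train, y_test = train_test_split(X, y)
mean, std = fit_scaler(X_train)              # parameters come from training rows only
X_train_s = transform(X_train, mean, std)
X_test_s = transform(X_test, mean, std)      # test rows reuse the training parameters
print(X_train_s.mean(axis=0), X_train_s.std(axis=0))
```

Fitting the scaler on the training rows only keeps information from the test rows out of training. `sklearn.preprocessing.StandardScaler` follows the same fit-then-transform pattern.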

    Train on 199 samples, validate on 99 samples
    Epoch 1/1000
    - 2s - loss: 617.0525 - val_loss: 609.8485
    Epoch 2/1000
    - 0s - loss: 616.6131 - val_loss: 609.3912
    Epoch 3/1000
    - 0s - loss: 616.1424 - val_loss: 608.8852
    Epoch 4/1000

     - 0s - loss: 6.8414 - val_loss: 8.4878
    Epoch 96
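    Do not look at unfamiliar things through a narrow lens, and do not rush to dismiss fields you have never worked in. Learn a little every day, and work hard at living an ordinary life well.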
