  • [Homework 4] Hsuan-Tien Lin's Machine Learning Techniques + personal reflections on learning from open online courses

    This homework has a fairly heavy coding workload; overall it requires implementing three models: neural network, kNN, and k-means.

    Q11~Q14 are the Neural Network questions. My implementation is single-threaded and takes quite a long time to run, so I record the correct answers to these questions here:

    Q11: 6

    Q12: 0.001

    Q13: 0.01

    Q14: 0.02 ≤ Eout ≤ 0.04

    The answers to Q11 and Q14 are fairly clear-cut; for Q12 and Q13 two of the candidate answers are quite close (I consulted the discussion forum and eventually tuned my code to reproduce them).

    The implementation plan for the neural network code is as follows:

    1) Initialize the weight matrices W (def init_W(nnet_struct, w_range))

    2) Compute every neuron's output in each iteration, i.e. the forward pass of the backprop algorithm (def forward_process(x, y, W))

    3) Compute, in each iteration, the derivative of the output error with respect to every neuron's input score, i.e. the backward pass of backprop (def backward_process(x, y, neuron_output, W))

    4) Update each layer's weight matrix W with gradient descent (def update_W_withGD(x, neuron_output, gradient, W, ita))

    The hardest part is step 3): to program it in matrix form you need to be very familiar with the structure of every layer of the network, and also very comfortable with the matrix operations of the language you are using; I am still weak in this area, and it only comes with practice.
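    For my own reference, the recursion that step 3) implements (squared error, tanh activations; this is my notation, with \delta^{(\ell)} denoting the derivative of the error with respect to layer \ell's input scores s^{(\ell)}, and x^{(\ell)} the layer's output) is roughly:

    \delta^{(L)} = -2\,\bigl(y - x^{(L)}\bigr)\,\tanh'\bigl(s^{(L)}\bigr), \qquad
    \delta^{(\ell)} = \bigl(W^{(\ell+1)}_{1:}\,\delta^{(\ell+1)}\bigr)\odot\tanh'\bigl(s^{(\ell)}\bigr), \qquad
    \tanh'(s) = 1 - \tanh^2(s)

    and the update in step 4) is W^{(\ell)} \leftarrow W^{(\ell)} - \eta\, x^{(\ell-1)}\bigl(\delta^{(\ell)}\bigr)^{\mathsf{T}}, where x^{(\ell-1)} is the previous layer's output augmented with the bias term 1.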

    >> This was my first time writing an NNet algorithm, so I started debugging from a single hidden layer (2 hidden units), testing the modules one by one in the order 1) 2) 3) 4). This incremental approach is slower, but each module ends up more solid, which makes the later integration debugging much easier.

    >> For a really complicated network, how would you debug this kind of gradient computation? It is practically impossible to work out the gradient at every point by hand, so I looked up the gradient checking technique online: http://ufldl.stanford.edu/wiki/index.php/Gradient_checking_and_advanced_optimization
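    Below is a minimal numerical gradient-check sketch of that idea (my own helper, not part of the homework code; loss_fn is an assumed callable that returns the scalar error of a single example for a given weight matrix, and analytic_grad is the backprop gradient to verify):

    import numpy as np

    def gradient_check(loss_fn, W, analytic_grad, eps=1e-4):
        # Perturb each weight entry and compare the central-difference slope
        # (loss(W+eps) - loss(W-eps)) / (2*eps) with the analytic backprop gradient.
        numeric_grad = np.zeros_like(W)
        for idx in np.ndindex(W.shape):
            old = W[idx]
            W[idx] = old + eps
            loss_plus = loss_fn(W)
            W[idx] = old - eps
            loss_minus = loss_fn(W)
            W[idx] = old                      # restore the original weight
            numeric_grad[idx] = (loss_plus - loss_minus) / (2.0 * eps)
        # a large maximum difference indicates a bug in the backward pass
        return np.abs(numeric_grad - analytic_grad).max()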

    >> Hyperparameter tuning really matters for NNets. Take Q14: even with the same total number of hidden units, different per-layer sizes give different results (the first time, being careless, I set the NNet structure to 3-8-1 and found the result was not as good as 8-3-1). I should collect more material on tuning later.

    The code is below (I did not delete the debugging code, both to keep a record of the debugging process and to avoid making similar mistakes later), so it is admittedly a bit messy; please bear with it:

    #encoding=utf8
    import sys
    import numpy as np
    import math
    from random import *
    
    ##
    # read data from local file
    # return with numpy array
    def read_input_data(path):
        x = []
        y = []
        for line in open(path).readlines():
            if line.strip()=='': continue
            items = line.strip().split(' ')
            tmp_x = []
            for i in range(0,len(items)-1): tmp_x.append(float(items[i]))
            x.append(tmp_x)
            y.append(float(items[-1]))
        return np.array(x),np.array(y)
    
    ## 
    # initialize weight matrix
    # input neural network structure & initializing uniform value range (both low and high)
    # each layer's bias needs to be added
    # return with initialized W
    def init_W(nnet_struct, w_range):
        W = []
        for i in range(1,len(nnet_struct)):
            tmp_w = np.random.uniform(w_range['low'], w_range['high'], (nnet_struct[i-1]+1,nnet_struct[i]) )
            W.append(tmp_w)
        return W
    
    ## 
    # randomly pick sample from raw data for Stochastic Gradient Descent
    # T indicates the number of SGD iterations
    # return with data for each SGD iteration
    def pick_SGD_data(x, y, T):
        sgd_x = np.zeros((T,x.shape[1]))
        sgd_y = np.zeros(T)
        for i in range(T):
            index = randint(0, x.shape[0]-1)
            sgd_x[i] = x[index]
            sgd_y[i] = y[index]
        return sgd_x, sgd_y
    
    ## 
    # forward process
    # calculate each neuron's output
    def forward_process(x, y, W):
        ret = []
        #print W[0].shape
        #print W[1].shape
        pre_x = np.hstack((1,x))
        for i in range(len(W)):
            pre_x = np.tanh(np.dot(pre_x, W[i]))
            ret.append(pre_x)
            pre_x = np.hstack((1,pre_x))
        return ret
    
    ##
    # backward process
    # calculate the gradient of the error with respect to each neuron's input score
    def backward_process(x, y, neuron_output, W):
        ret = []
        L = len(neuron_output)
        # print neuron_output[0].shape, neuron_output[1].shape
        # Output layer
        score = np.dot( np.hstack((1, neuron_output[L-2])), W[L-1])
        # print score
        # print score.shape
        gradient = np.array( [-2 * (y-neuron_output[L-1][0]) * tanh_gradient(score)] )
        # print gradient
        # print gradient.shape
        ret.insert(0, gradient)
        # Hidden layer 
        for i in range(L-2,-1,-1):
            if i==0:
                score = np.dot(np.hstack((1, x)),W[i])
                # print score.shape
                # print gradient.shape
                # print W[1][1:].transpose().shape
                # print score
                gradient = np.dot(gradient, W[1][1:].transpose()) * tanh_gradient(score)
                # print gradient
                # print gradient.shape
                ret.insert(0, gradient)
            else:
                score = np.dot(np.hstack((1,neuron_output[i-1])),W[i])
                # print score.shape
                # print gradient.shape
                # print W[i+1][1:].transpose().shape
                # print "......"
                gradient = np.dot(gradient , W[i+1][1:].transpose()) * tanh_gradient(score)
                # print gradient.shape
                # print "======"
                ret.insert(0, gradient)
        return ret
    
    # given a numpy array
    # broadcast the tanh derivative to each element
    def tanh_gradient(s):
        ret = np.zeros(s.shape)
        for i in range(s.shape[0]):
            # tanh'(s) = 1 - tanh(s)^2 = 4 / (e^(2s) + e^(-2s) + 2)
            ret[i] = 4.0 / (math.exp(2*s[i])+math.exp(-2*s[i])+2)
        return ret
    
    
    ##
    # update W with Gradient Descent
    def update_W_withGD(x, neuron_output, gradient, W, ita):
        ret = []
        L = len(W)
        # print "L:"+str(L)
        # print neuron_output[0].shape, neuron_output[1].shape
        # print gradient[0].shape, gradient[1].shape
        # print W[0].shape, W[1].shape
        # print np.hstack((1,x)).transpose().shape
        # print gradient[0].shape
        ret.append( W[0] - ita * np.array([np.hstack((1,x))]).transpose() * gradient[0] )
        for i in range(1, L, 1):
            ret.append( W[i] - ita * np.array([np.hstack((1,neuron_output[i-1]))]).transpose() * gradient[i] )
        # print len(ret)
        return ret
    
    ## 
    # calculate Eout
    def calculate_E(W, path):
        x,y = read_input_data(path)
        error_count = 0
        for i in range(x.shape[0]):
            if predict(x[i],y[i],W):
                error_count += 1
        return 1.0*error_count/x.shape[0]
    
    def predict(x, y, W):
        y_predict = x
        for i in range(0, len(W), 1):
            y_predict = np.tanh( np.dot( np.hstack((1,y_predict)), W[i] ) )
        y_predict = 1 if y_predict>0 else -1
        return y_predict!=y
    
    ##
    # Q11
    def Q11(x,y):
        R = 20 # repeat time
        Ms = { 6, 16 } # hidden units
        M_lowests = {}
        for M in Ms: M_lowests[M] = 0
        for r in range(R):
            T = 50000
            ita = 0.1
            min_M = -1
            E_min = float("inf")
            for M in Ms:
                sgd_x, sgd_y = pick_SGD_data(x, y, T)
                nnet_struct = [ x.shape[1], M, 1 ]
                # print nnet_struct
                w_range = {}
                w_range['low'] = -0.1
                w_range['high'] = 0.1
                W = init_W(nnet_struct, w_range)
                # for i in range(len(W)):
                #    print W[i]
                # print sgd_x,sgd_y
                for t in range(T):
                    neuron_output = forward_process(sgd_x[t], sgd_y[t], W)
                    # print sgd_x[t],sgd_y[t]
                    # print W
                    # print neuron_output
                    error_neuronInputScore_gradient = backward_process(sgd_x[t], sgd_y[t], neuron_output, W)
                    # print error_neuronInputScore_gradient
                    W = update_W_withGD(sgd_x[t], neuron_output, error_neuronInputScore_gradient, W, ita)
                E = calculate_E(W,"test.dat")
                # print str(r)+":::"+str(M)+":"+str(E)
                M_lowests[M] += E
        for k,v in M_lowests.items():
            print str(k)+":"+str(v)
    
    ##
    # Q12
    def Q12(x,y):
        ita = 0.1
        M = 3
        nnet_struct = [ x.shape[1], M, 1 ]
        Rs = { 0.001, 0.1 }
        R_lowests = {}
        for R in Rs: R_lowests[R] = 0
        N = 40
        T = 30000
        for i in range(N):
            for R in Rs:
                sgd_x, sgd_y = pick_SGD_data(x, y, T)
                w_range = {}
                w_range['low'] = -1*R
                w_range['high'] = R
                W = init_W(nnet_struct, w_range)
                for t in range(T):
                    neuron_output = forward_process(sgd_x[t], sgd_y[t], W)
                    error_neuronInputScore_gradient = backward_process(sgd_x[t], sgd_y[t], neuron_output, W)
                    W = update_W_withGD(sgd_x[t], neuron_output, error_neuronInputScore_gradient, W, ita)
                E = calculate_E(W, "test.dat")
                print str(R)+":"+str(E)
                R_lowests[R] += E
        for k,v in R_lowests.items():
            print str(k)+":"+str(v)
    
    ## 
    # Q13
    def Q13(x,y):
        M = 3
        nnet_struct = [ x.shape[1], M, 1 ]
        itas = {0.001,0.01,0.1}
        ita_lowests = {}
        for ita in itas: ita_lowests[ita] = 0
        N = 20
        T = 20000
        for i in range(N):
            for ita in itas:
                sgd_x, sgd_y = pick_SGD_data(x, y, T)
                w_range = {}
                w_range['low'] = -0.1
                w_range['high'] = 0.1
                W = init_W(nnet_struct, w_range)
                for t in range(T):
                    neuron_output = forward_process(sgd_x[t], sgd_y[t], W)
                    error_neuronInputScore_gradient = backward_process(sgd_x[t], sgd_y[t], neuron_output, W)
                    W = update_W_withGD(sgd_x[t], neuron_output, error_neuronInputScore_gradient, W, ita)
                E = calculate_E(W, "test.dat")
                print str(ita)+":"+str(E)
                ita_lowests[ita] += E
        for k,v in ita_lowests.items():
            print str(k)+":"+str(v)
    
    ##
    # Q14
    def Q14(x,y):
        T = 50000
        ita = 0.01
        E_total = 0
        R = 10
        for i in range(R):
            nnet_struct = [ x.shape[1], 8, 3, 1 ]
            w_range = {}
            w_range['low'] = -0.1
            w_range['high'] = 0.1
            W = init_W(nnet_struct, w_range)
            sgd_x, sgd_y = pick_SGD_data(x, y, T)
            for t in range(T):
                neuron_output = forward_process(sgd_x[t], sgd_y[t], W)
                error_neuronInputScore_gradient = backward_process(sgd_x[t], sgd_y[t], neuron_output, W)
                W = update_W_withGD(sgd_x[t], neuron_output, error_neuronInputScore_gradient, W, ita)    
            E = calculate_E(W, "test.dat")
            print E
            E_total += E
        print E_total*1.0/R
    
    
    def main():
        x,y = read_input_data("train.dat")
        # print x.shape, y.shape
        # Q11(x, y)
        # Q12(x, y)
        # Q13(x, y)
        Q14(x, y)
    
    
    
    
    
    if __name__ == '__main__':
        main()

    Q15~Q18 are about the KNN algorithm; each question runs almost instantly, so I will not record the answers here:

    The core of KNN is the KNN function itself:

    1) Given the number of neighbors k, return which class the point belongs to; try to keep the code configurable

    2) numpy has an argsort function that sorts an array's indices by the corresponding values and returns the sorted indices; exploiting this keeps the code very concise (see the small example after this list)

    3) In other languages, it would be worth implementing a module similar to numpy.argsort; it makes the overall code much clearer
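    A quick standalone illustration of the argsort trick (a toy example, not part of the homework code):

    import numpy as np

    distance = np.array([0.9, 0.1, 0.5, 0.3])
    order = np.argsort(distance)   # indices sorted by distance: array([1, 3, 2, 0])
    k = 2
    nearest = order[:k]            # indices of the k nearest samples: 1 and 3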

    The full KNN code is as follows:

    #encoding=utf8
    import sys
    import numpy as np
    import math
    from random import *
    
    ##
    # read data from local file
    # return with numpy array
    def read_input_data(path):
        x = []
        y = []
        for line in open(path).readlines():
            if line.strip()=='': continue
            items = line.strip().split(' ')
            tmp_x = []
            for i in range(0,len(items)-1): tmp_x.append(float(items[i]))
            x.append(tmp_x)
            y.append(float(items[-1]))
        return np.array(x),np.array(y)
    
    
    ## 
    # KNN ( for binary classification )
    # input all labeled data & test sample
    # return with label
    def KNN(k, x, y, test_x):
        distance = np.sum((x-test_x)*(x-test_x), axis=1)
        order = np.argsort(distance)
        ret = 0
        for i in range(k):
            ret += y[order[i]]
        return 1 if ret>0 else -1
    
    
    ##
    # Q15 calculate Ein
    def calculate_Ein(x, y):
        error_count = 0
        k = 5
        for i in range(x.shape[0]):
            # tmp_x = np.vstack( ( x[0:i],x[(i+1):(x.shape[0]-1)] ) )
            # tmp_y = np.hstack( ( y[0:i],y[(i+1):(x.shape[0]-1)] ) )
            ret = KNN( k, x, y, x[i])
            if y[i]!=ret:
                error_count += 1
        return 1.0*error_count/x.shape[0]
    
    ##
    # Q16 calculate Eout
    def calculate_Eout(x, y, path):
        test_x, test_y = read_input_data(path)
        error_count = 0
        k = 1
        for i in range(test_x.shape[0]):
            ret = KNN (k, x, y, test_x[i])
            if test_y[i]!=ret:
                error_count += 1
        return 1.0*error_count/test_x.shape[0]
    
    def main():
        x,y = read_input_data("knn_train.dat")
        print calculate_Ein(x,y)
        print calculate_Eout(x,y, "knn_test.dat")
    
    if __name__ == '__main__':
        main()

    Q19~Q20 are about the K-means algorithm; the code also produces results quickly, so I will not record the answers:

    The implementation plan for K-means is very straightforward:

    1) Initialize the cluster centers by random selection (the problem picks random points from the raw data; if another selection method is used, keep it as a separate module so the other modules are unaffected)

    2) In each round, update the cluster assignment of every data point (def update_category(x, K, centers))

    3) With the point assignments fixed, update each cluster's center coordinates (def update_centers(x, y, K))

    Implementation-wise, the modules benefit a lot from numpy's matrix operations. (One should master a personal toolkit of matrix-operation code that can be picked up and reused for further development at any time.)
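    For example, the point-by-point loop in update_category could be collapsed into a single broadcast expression, roughly like the sketch below (an alternative I did not use in the homework code that follows):

    import numpy as np

    def update_category_vectorized(x, centers):
        # x: (N, d) data points, centers: (K, d) cluster centers
        # broadcasting gives diff of shape (N, K, d); summing over d yields squared distances (N, K)
        diff = x[:, np.newaxis, :] - centers[np.newaxis, :, :]
        sq_dist = np.sum(diff * diff, axis=2)
        return np.argmin(sq_dist, axis=1)   # nearest-center index for every point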

    The code is as follows:

    #encoding=utf8
    import sys
    import numpy as np
    import math
    from random import *
    
    ##
    # read data from local file
    # return with numpy array
    def read_input_data(path):
        x = []
        for line in open(path).readlines():
            if line.strip()=='': continue
            items = line.strip().split(' ')
            tmp_x = []
            for i in range(0,len(items)): tmp_x.append(float(items[i]))
            x.append(tmp_x)
        return np.array(x)
    
    
    ## 
    # input all data and category K
    # return K category centers
    def Kmeans(x, K):
        T = 50 
        E_total = 0
        for t in range(T):
            centers = init_centers(x, K)
            y = np.zeros(x.shape[0])
            R = 50
            for r in range(R):
                y = update_category(x, K, centers)
                centers = update_centers(x, y, K)
            E = calculate_Ein(x, y, centers)
            print E
            E_total += E
        return E_total*1.0/T
    
    def init_centers(x, K):
        ret = []
        order = range(x.shape[0])
        np.random.shuffle(order)
        for i in range(K):
            ret.append(x[order[i]])
        return np.array(ret)
    
    def update_category(x, K, centers):
        y = []
        for i in range(x.shape[0]):
            category = -1
            distance = float("inf")
            for k in range(K):
                d = np.sum((x[i] - centers[k])*(x[i] - centers[k]),axis=0)
                if d < distance:
                    distance = d
                    category = k
            y.append(category)
        return np.array(y)
    
    def update_centers(x, y, K):
        centers = []
        for k in range(K):
            # print "np.sum(x[np.where(y==k)],axis=0)"
            # print np.sum(x[np.where(y==k)],axis=0).shape
            center = np.sum(x[np.where(y==k)],axis=0)*1.0/np.array(np.where(y==k)).shape[1]
            centers.append(center)
        return np.array(centers)
    
    def calculate_Ein(x, y, centers):
        # print centers[0].shape
        error_total = 0
        for i in range(x.shape[0]):
            error_total += np.sum((x[i]-centers[y[i]])*(x[i]-centers[y[i]]),axis=0)
        return 1.0*error_total/x.shape[0]
    
    
    def main():
        x = read_input_data("kmeans_train.dat")
        # print x.shape
        print Kmeans(x,2)
    
    
    if __name__ == '__main__':
        main()

    ==========================================================================

    With this homework finished, I have finally completed all 32 lectures and 8 coding assignments of Machine Learning Foundations + Machine Learning Techniques.

    Having finished the course, my main takeaways are three:

    1) Through the coding assignments I implemented a number of mainstream machine learning algorithms (Perceptron, AdaBoost-stump, Linear Regression, Logistic Regression, Decision Tree, Neural Network, KNN, K-means). I used to rely on algorithm packages, and my understanding of each algorithm was never as deep or detailed as it is after implementing them once myself.

    2) Previously my understanding of each algorithm was just knowing how to use it (and honestly not even that well). After the course I have some grasp of each model's motivation: Why is the model designed this way? Why is the regularizer designed this way? What are the model's pros and cons? Plus some of the more intuitive mathematical derivations behind each model.

    3) I used to view each machine learning algorithm in isolation (this one solves this problem, that one solves that problem), without pulling them together into a coherent system. Throughout the lectures, this NTU course carries a very strong sense of structure; here are a few examples:

      a. How Linear Network relates to Factorization (Lecture 15)

      b. How Decision Tree relates to AdaBoost (Lectures 8 and 9)

      c. How Linear Regression relates to Neural Network (Lecture 12)

    Before taking this course I would never have connected the two models in each of the pairs above; the course really delivers deep motivation and a very strong overarching storyline.

    Finally, a few personal thoughts on taking open online courses:

    1) Only listening once: a superficial skim; you learn next to nothing

    2) Listening plus doing the homework: learning with a practitioner's attitude; you learn far more than by listening alone

    3) Listening, doing the homework, and blogging about the lectures: learning with a practitioner-plus-researcher attitude. "The best way to learn is to teach": writing the blog forces you to clarify many points that were fuzzy at the time, otherwise you simply cannot write it down

    4) Repeating 3) in cycles: everyone knows that reviewing the old brings new insight; it just depends on whether you have the time

    Sigh, that's all for this post.....

  • Original post: https://www.cnblogs.com/xbf9xbf/p/4737525.html