  • TensorFlow basics exercise: linear models

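    • TensorFlow is a general-purpose platform for numerical computation that makes it easy to train linear models. Below, TensorFlow is used to work through the exercises from the Deep Learning course taught by Andrew Ng, with the complete source code provided.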

    Linear regression
    Multiple linear regression
    Logistic regression

    Linear regression

    
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    """
    Created on Wed Sep  6 19:46:04 2017
    
    @author: Administrator
    """
    # @author: ranjiewen
    # @date: 2017-9-6
    # @description: compare scikit-learn and tensorflow, using linear regression data from deep learning course by Andrew Ng.
    # @ref: http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=DeepLearning&doc=exercises/ex2/ex2.html
    
    import tensorflow as tf
    import numpy as np
    from sklearn import linear_model
    
    # Read x and y
    #x_data = np.loadtxt("ex2x.dat")
    #y_data = np.loadtxt("ex2y.dat")
    
    x_data = np.random.rand(100).astype(np.float32)
    y_data = x_data * 0.1 + 0.3+np.random.rand(100)
    
    # We use scikit-learn first to get a sense of the coefficients
    reg = linear_model.LinearRegression()
    reg.fit(x_data.reshape(-1, 1), y_data)
    
    # coef_ is a length-1 array here, so take its single element before formatting with %f
    print ("Coefficient of scikit-learn linear regression: k=%f, b=%f" % (reg.coef_[0], reg.intercept_))
    
    
    # Then we apply tensorflow to achieve the similar results
    # The structure of tensorflow code can be divided into two parts:
    
    # First part: set up computation graph
    W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
    b = tf.Variable(tf.zeros([1]))
    y = W * x_data + b
    
    loss = tf.reduce_mean(tf.square(y - y_data)) / 2
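    # With tensorflow, the gradient descent step size alpha must be set carefully:
    # too large a step fails to converge, too small a step converges painfully slowly.
    # The number of iterations also needs careful experimentation.
    optimizer = tf.train.GradientDescentOptimizer(0.07)  # Try 0.1 and you may see it fail to converge
    train = optimizer.minimize(loss)
    
    init = tf.global_variables_initializer()  # replaces the deprecated tf.initialize_all_variables()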
    
    # Second part: launch the graph
    sess = tf.Session()
    sess.run(init)
    
    for step in range(1500):
        sess.run(train)
        if step % 100 == 0:
            print (step, sess.run(W), sess.run(b))
    print ("Coefficient of tensorflow linear regression: k=%f, b=%f" % (sess.run(W)[0], sess.run(b)[0]))
    
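    • Thoughts: with TensorFlow, the gradient descent step size alpha must be set carefully. Too large a step and training fails to converge; too small a step and convergence is painfully slow. The number of iterations also needs careful experimentation.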
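The effect of the step size can be reproduced without TensorFlow. The following is a minimal sketch in plain Python, not the code from the exercise: the data are made up (x spread over [2, 8]), and `closed_form` and `fit_line` are hypothetical helper names. It runs the same batch gradient descent on the same loss, mean squared error divided by 2, with three step sizes and compares each result with the ordinary least squares answer.

```python
import math

def closed_form(xs, ys):
    """Ordinary least squares for y = k*x + b, used as the reference answer."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    k = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return k, my - k * mx

def fit_line(xs, ys, alpha, steps):
    """Batch gradient descent on mean((k*x + b - y)^2) / 2.

    Returns (k, b, loss), where loss is the value measured at the start
    of the last iteration that was executed.
    """
    n = len(xs)
    k, b = 0.0, 0.0
    loss = float("inf")
    for _ in range(steps):
        errs = [k * x + b - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errs) / (2 * n)
        if loss > 1e12:  # the iterates are blowing up, so stop early
            break
        k -= alpha * sum(e * x for e, x in zip(errs, xs)) / n
        b -= alpha * sum(errs) / n
    return k, b, loss

# Made-up data: 50 evenly spaced x values in [2, 8], y = 0.06*x + 0.75 plus a small wiggle.
xs = [2.0 + 6.0 * i / 49 for i in range(50)]
ys = [0.06 * x + 0.75 + 0.05 * math.sin(7 * i) for i, x in enumerate(xs)]

k_ref, b_ref = closed_form(xs, ys)
print("closed form: k=%.4f b=%.4f" % (k_ref, b_ref))
for alpha in (0.001, 0.05, 0.1):
    k, b, loss = fit_line(xs, ys, alpha, steps=1500)
    print("alpha=%-5g k=%.4g b=%.4g loss=%.4g" % (alpha, k, b, loss))
```

On this data the largest step makes the loss grow until the early-stop check fires, the middle one lands close to the least squares solution, and the smallest one is still far from the correct intercept after 1500 iterations. The stable range depends on the scale of x, which is why a value that works for one data set can diverge on another.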

    Multiple linear regression

    
    # -*- coding: utf-8 -*-
    """
    Created on Wed Sep  6 19:53:24 2017
    
    @author: Administrator
    """
    
    import numpy as np
    import tensorflow as tf
    from sklearn import linear_model
    from sklearn import preprocessing
    
    # Read x and y
    #x_data = np.loadtxt("ex3x.dat").astype(np.float32)
    #y_data = np.loadtxt("ex3y.dat").astype(np.float32)
    
    # Synthetic stand-in data: two features on different scales, stacked as columns (shape 100 x 2).
    # A plain ndarray is used instead of np.matrix, which recent scikit-learn versions reject.
    x_data = np.column_stack([np.random.rand(100), np.random.rand(100) + 10]).astype(np.float32)
    y_data = 5.3 + np.random.rand(100)
    
    # We evaluate the x and y by sklearn to get a sense of the coefficients.
    reg = linear_model.LinearRegression()
    reg.fit(x_data, y_data)
    print ("Coefficients of sklearn: K=%s, b=%f" % (reg.coef_, reg.intercept_))
    
    
    # Now we use tensorflow to get similar results.
    
    # Before feeding x_data into tensorflow, standardize it so that
    # gradient descent performs well; without standardization,
    # convergence is intolerably slow.
    # Reason: if a feature has a variance that is orders of magnitude larger than the others,
    # it can dominate the objective function
    # and keep the estimator from learning from the other features as expected.
    # In this example one variable is floor area and the other is the number of rooms,
    # so their magnitudes differ widely; without normalization, area dominates the
    # objective function and its gradient, and convergence becomes extremely slow.
    scaler = preprocessing.StandardScaler().fit(x_data)
    print (scaler.mean_, scaler.scale_)
    # Cast to float32 so the dtype matches the float32 variable W in tf.matmul
    x_data_standard = scaler.transform(x_data).astype(np.float32)
    
    
    W = tf.Variable(tf.zeros([2, 1]))
    b = tf.Variable(tf.zeros([1, 1]))
    y = tf.matmul(x_data_standard, W) + b
    
    loss = tf.reduce_mean(tf.square(y - y_data.reshape(-1, 1)))/2
    optimizer = tf.train.GradientDescentOptimizer(0.3)
    train = optimizer.minimize(loss)
    
    init = tf.global_variables_initializer()  # replaces the deprecated tf.initialize_all_variables()
    
    
    sess = tf.Session()
    sess.run(init)
    for step in range(100):
        sess.run(train)
        if step % 10 == 0:
            print (step, sess.run(W).flatten(), sess.run(b).flatten())
    
    print ("Coefficients of tensorflow (input should be standardized): K=%s, b=%s" % (sess.run(W).flatten(), sess.run(b).flatten()))
    print ("Coefficients of tensorflow (raw input): K=%s, b=%s" % (sess.run(W).flatten() / scaler.scale_, sess.run(b).flatten() - np.dot(scaler.mean_ / scaler.scale_, sess.run(W))))
    
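    • Idea: for gradient descent, whether the variables are standardized matters a great deal. In this example one variable is floor area and the other is the number of rooms, and their magnitudes differ widely. Without normalization, area dominates the objective function and its gradient, and convergence becomes extremely slow.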
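The point about standardization, and the conversion of the fitted coefficients back to the raw feature scale, can be checked with a small plain-Python sketch. Everything here is an assumption made for illustration: the housing-style data are made up, and `standardize` and `gradient_descent` are hypothetical helpers, not part of the exercise code. The back-conversion uses the same formulas as the final print statement of the script above: divide each weight by its feature's scale, and subtract the sum of mean * weight / scale from the intercept.

```python
import math

def standardize(rows):
    """Column-wise (x - mean) / std with the population std, as StandardScaler does."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    scales = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(d)]
    scaled = [[(r[j] - means[j]) / scales[j] for j in range(d)] for r in rows]
    return scaled, means, scales

def gradient_descent(rows, ys, alpha, steps):
    """Minimize mean((w.x + b - y)^2) / 2; returns (w, b, loss at the last evaluated step)."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    loss = float("inf")
    for _ in range(steps):
        errs = [sum(wj * xj for wj, xj in zip(w, r)) + b - y for r, y in zip(rows, ys)]
        loss = sum(e * e for e in errs) / (2 * n)
        if loss > 1e30:  # diverging, stop before the numbers overflow
            break
        w = [wj - alpha * sum(e * r[j] for e, r in zip(errs, rows)) / n
             for j, wj in enumerate(w)]
        b -= alpha * sum(errs) / n
    return w, b, loss

# Made-up housing-style data: area in the thousands, room count in single digits.
rows = [[800.0 + 150.0 * i, 1.0 + i % 5] for i in range(25)]
ys = [50.0 + 0.1 * area + 8.0 * rooms + 2.0 * math.sin(i)
      for i, (area, rooms) in enumerate(rows)]

scaled, means, scales = standardize(rows)
w_std, b_std, loss_std = gradient_descent(scaled, ys, alpha=0.3, steps=100)
_, _, loss_raw_fast = gradient_descent(rows, ys, alpha=0.3, steps=100)   # same step size, raw input
_, _, loss_raw_slow = gradient_descent(rows, ys, alpha=1e-7, steps=100)  # step small enough to be stable

# Map the standardized coefficients back to the raw feature scale.
w_raw = [wj / sj for wj, sj in zip(w_std, scales)]
b_raw = b_std - sum(mj * wj / sj for mj, wj, sj in zip(means, w_std, scales))

print("standardized, alpha=0.3 : loss=%.4g" % loss_std)
print("raw,          alpha=0.3 : loss=%.4g" % loss_raw_fast)
print("raw,          alpha=1e-7: loss=%.4g" % loss_raw_slow)
print("raw-scale coefficients  : w=%s b=%.4f" % (w_raw, b_raw))
```

With raw input the step size that works for standardized data blows up, and a step small enough to be stable for the area column barely moves the room coefficient and the intercept within 100 iterations. After standardization all directions have a comparable scale, so a single step size serves all of them.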

    Logistic regression

    # -*- coding: utf-8 -*-
    """
    Created on Wed Sep  6 20:13:15 2017
    Data download: http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=DeepLearning&doc=exercises/ex4/ex4.html
    
    @author: Administrator
    """
    
    import tensorflow as tf
    import numpy as np
    from numpy import mat
    from sklearn.linear_model import LogisticRegression
    from sklearn import preprocessing
    
    # Read x and y
    x_data = np.loadtxt("ex4Data/ex4x.dat").astype(np.float32)
    y_data = np.loadtxt("ex4Data/ex4y.dat").astype(np.float32)
    
    #x_data = [np.random.rand(100).astype(np.float32),np.random.rand(100).astype(np.float32)+10]
    #x_data=mat(x_data).T
    #y_data = 5.3+np.random.rand(100)
    
    
    scaler = preprocessing.StandardScaler().fit(x_data)
    x_data_standard = scaler.transform(x_data)
    
    # We evaluate the x and y by sklearn to get a sense of the coefficients.
    reg = LogisticRegression(C=999999999, solver="newton-cg")  # Set C as a large positive number to minimize the regularization effect
    reg.fit(x_data, y_data)
    # intercept_ is an array for LogisticRegression, so format it with %s
    print ("Coefficients of sklearn: K=%s, b=%s" % (reg.coef_, reg.intercept_))
    
    # Now we use tensorflow to get similar results.
    W = tf.Variable(tf.zeros([2, 1]))
    b = tf.Variable(tf.zeros([1, 1]))
    # Sigmoid of the linear score; the bias belongs inside the negated term
    y = 1 / (1 + tf.exp(-(tf.matmul(x_data_standard, W) + b)))
    loss = tf.reduce_mean(- y_data.reshape(-1, 1) *  tf.log(y) - (1 - y_data.reshape(-1, 1)) * tf.log(1 - y))
    
    optimizer = tf.train.GradientDescentOptimizer(1.3)
    train = optimizer.minimize(loss)
    
    init = tf.global_variables_initializer()  # replaces the deprecated tf.initialize_all_variables()
    
    sess = tf.Session()
    sess.run(init)
    for step in range(100):
        sess.run(train)
        if step % 10 == 0:
            print (step, sess.run(W).flatten(), sess.run(b).flatten())
    
    print ("Coefficients of tensorflow (input should be standardized): K=%s, b=%s" % (sess.run(W).flatten(), sess.run(b).flatten()))
    print ("Coefficients of tensorflow (raw input): K=%s, b=%s" % (sess.run(W).flatten() / scaler.scale_, sess.run(b).flatten() - np.dot(scaler.mean_ / scaler.scale_, sess.run(W))))
    
    
    # Problem solved and we are happy. But...
    # I'd like to implement logistic regression from a multi-class viewpoint instead of a binary one.
    # In machine learning, this is called softmax regression.
    # In economics and statistics, it is called the multinomial logit (MNL) model, proposed by Daniel McFadden, who shared the 2000 Nobel Memorial Prize in Economic Sciences.
    
    print ("------------------------------------------------")
    print ("We solve this binary classification problem again from the viewpoint of multinomial classification")
    print ("------------------------------------------------")
    
    # As a tradition, sklearn first
    reg = LogisticRegression(C=9999999999, solver="newton-cg", multi_class="multinomial")
    reg.fit(x_data, y_data)
    print ("Coefficients of sklearn: K=%s, b=%s" % (reg.coef_, reg.intercept_))
    print ("Slightly different at first glance. What about multiplying them by 2?")
    
    # Then try tensorflow
    W = tf.Variable(tf.zeros([2, 2]))  # first 2 is feature number, second 2 is class number
    b = tf.Variable(tf.zeros([1, 2]))
    V = tf.matmul(x_data_standard, W) + b
    # tensorflow provides a utility function for the probability that observer n chooses alternative i;
    # you can replace it with `y = tf.exp(V) / tf.reduce_sum(tf.exp(V), keep_dims=True, reduction_indices=[1])`
    y = tf.nn.softmax(V)
    
    # Encode the y label in one-hot manner
    lb = preprocessing.LabelBinarizer()
    lb.fit(y_data)
    y_data_trans = lb.transform(y_data)
    y_data_trans = np.concatenate((1 - y_data_trans, y_data_trans), axis=1)  # Only necessary for binary class 
    
    loss = tf.reduce_mean(-tf.reduce_sum(y_data_trans * tf.log(y), reduction_indices=[1]))
    optimizer = tf.train.GradientDescentOptimizer(1.3)
    train = optimizer.minimize(loss)
    
    init = tf.global_variables_initializer()  # replaces the deprecated tf.initialize_all_variables()
    
    sess = tf.Session()
    sess.run(init)
    for step in range(100):
        sess.run(train)
        if step % 10 == 0:
            print (step, sess.run(W).flatten(), sess.run(b).flatten())
    
    print ("Coefficients of tensorflow (input should be standardized): K=%s, b=%s" % (sess.run(W).flatten(), sess.run(b).flatten()))
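    # W has shape [features, classes], so scale_ is reshaped to a column to divide each feature's row
    print ("Coefficients of tensorflow (raw input): K=%s, b=%s" % ((sess.run(W) / scaler.scale_.reshape(-1, 1)).flatten(),  sess.run(b).flatten() - np.dot(scaler.mean_ / scaler.scale_, sess.run(W))))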
    
    
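    • Thoughts:
    • For logistic regression, the loss function is somewhat more involved than for linear regression. First, the sigmoid function maps the output of the linear model to a probability between 0 and 1. Next, write down the probability (likelihood) of each observed sample; the probability of all samples occurring together is then the product of the per-sample probabilities. To make differentiation easier, take the logarithm of this joint probability: the log is monotonic, so the maximizer is unchanged, and the product becomes a sum (the derivative of a sum is much simpler than that of a product). Maximum log-likelihood estimation takes the joint probability of all samples as the objective to maximize; machine learning customarily calls the objective a loss, so the loss is defined as the negative of the log-likelihood, which turns the task into a minimization problem.
    • Logistic regression usually refers to binary classification, but the same idea extends easily to multi-class problems, where machine learning generally calls it the softmax regression model. The author of this article has a background in statistics and econometrics and therefore usually calls it the MNL model.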
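The chain from likelihood to loss described above can be traced numerically. This is a small illustrative sketch in plain Python; the scores and labels are made up, and `sigmoid` is a local helper rather than a TensorFlow call. It computes the likelihood as a product, the log-likelihood as a sum, and the mean cross-entropy in the form used for `loss` in the binary script.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Made-up linear scores (w.x + b) for five samples and their observed 0/1 labels.
scores = [2.0, -1.0, 0.5, -3.0, 1.5]
labels = [1, 0, 1, 0, 0]

# Probability the model assigns to the label that was actually observed.
probs = [sigmoid(z) if y == 1 else 1.0 - sigmoid(z) for z, y in zip(scores, labels)]

# Likelihood of the whole data set: the product of the per-sample probabilities.
likelihood = 1.0
for p in probs:
    likelihood *= p

# Taking the log turns the product into a sum.
log_likelihood = sum(math.log(p) for p in probs)

# Loss: negative log-likelihood averaged over samples, written as cross-entropy.
loss = sum(-y * math.log(sigmoid(z)) - (1 - y) * math.log(1.0 - sigmoid(z))
           for z, y in zip(scores, labels)) / len(labels)

print("likelihood     =", likelihood)
print("log-likelihood =", log_likelihood, "(log of product:", math.log(likelihood), ")")
print("loss           =", loss, "(-log-likelihood / n:", -log_likelihood / len(labels), ")")
```

Averaging instead of summing only rescales the objective by a constant, so it does not change which parameters minimize it.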

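One way to read the "multiply by 2" hint printed by the script is that a two-class softmax depends only on the difference between the two class scores. The sketch below assumes a symmetric parameterization, where class 1 gets +z/2 and class 0 gets -z/2; that choice, the parameter values, and the helper names are assumptions made for illustration. Under it, the softmax probability of class 1 equals the sigmoid of z, so the per-class coefficients are half the binary logit coefficients.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(vs):
    m = max(vs)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in vs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical binary-logit parameters for a single feature.
k, b = 1.2, -0.4

pairs = []
for x in (-2.0, 0.0, 0.7, 3.0):
    z = k * x + b
    p_binary = sigmoid(z)
    # Symmetric two-class softmax: class 1 scores +z/2, class 0 scores -z/2,
    # which corresponds to class-1 parameters (k/2, b/2).
    p_softmax = softmax([-z / 2, z / 2])[1]
    pairs.append((p_binary, p_softmax))
    print("x=%5.2f  sigmoid=%.6f  softmax=%.6f" % (x, p_binary, p_softmax))
```

Any pair of class scores with the same difference gives the same probabilities, so the softmax parameters are not unique without some such constraint.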
  • Original article: https://www.cnblogs.com/ranjiewen/p/7489005.html