  • Softmax regression on MNIST data

      In the multiclass case, the training algorithm uses the one-vs-rest (OvR)
        scheme if the 'multi_class' option is set to 'ovr', and uses the cross-
        entropy loss if the 'multi_class' option is set to 'multinomial'.
        (Currently the 'multinomial' option is supported only by the 'lbfgs',
        'sag' and 'newton-cg' solvers.)
        This class implements regularized logistic regression using the
        'liblinear' library, 'newton-cg', 'sag' and 'lbfgs' solvers. It can handle
        both dense and sparse input. Use C-ordered arrays or CSR matrices
        containing 64-bit floats for optimal performance; any other input format
        will be converted (and copied).
        The 'newton-cg', 'sag', and 'lbfgs' solvers support only L2 regularization
        with primal formulation. The 'liblinear' solver supports both L1 and L2
        regularization, with a dual formulation only for the L2 penalty.
        Read more in the :ref:`User Guide <logistic_regression>`.
        Parameters
        ----------
        penalty : str, 'l1' or 'l2', default: 'l2'
            Used to specify the norm used in the penalization. The 'newton-cg',
            'sag' and 'lbfgs' solvers support only l2 penalties.
            .. versionadded:: 0.19
               l1 penalty with SAGA solver (allowing 'multinomial' + L1)
        dual : bool, default: False
            Dual or primal formulation. Dual formulation is only implemented for
            l2 penalty with liblinear solver. Prefer dual=False when
            n_samples > n_features.
        tol : float, default: 1e-4
            Tolerance for stopping criteria.
        C : float, default: 1.0
            Inverse of regularization strength; must be a positive float.
            Like in support vector machines, smaller values specify stronger
            regularization.
        fit_intercept : bool, default: True
            Specifies if a constant (a.k.a. bias or intercept) should be
            added to the decision function.
      
        solver : {'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'},
            default: 'liblinear'
            Algorithm to use in the optimization problem.
            - For small datasets, 'liblinear' is a good choice, whereas 'sag' and
                'saga' are faster for large ones.
            - For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs'
                handle multinomial loss; 'liblinear' is limited to one-versus-rest
                schemes.
            - 'newton-cg', 'lbfgs' and 'sag' only handle L2 penalty, whereas
                'liblinear' and 'saga' handle L1 penalty.
            Note that 'sag' and 'saga' fast convergence is only guaranteed on
            features with approximately the same scale. You can
            preprocess the data with a scaler from sklearn.preprocessing.
            .. versionadded:: 0.17
               Stochastic Average Gradient descent solver.
            .. versionadded:: 0.19
               SAGA solver.
       
        multi_class : str, {'ovr', 'multinomial'}, default: 'ovr'
            Multiclass option can be either 'ovr' or 'multinomial'. If the option
            chosen is 'ovr', then a binary problem is fit for each label. Else
            the loss minimised is the multinomial loss fit across
            the entire probability distribution. Does not work for liblinear
            solver.
            .. versionadded:: 0.18
               Stochastic Average Gradient descent solver for 'multinomial' case.
       
        Attributes
        ----------
        coef_ : array, shape (1, n_features) or (n_classes, n_features)
            Coefficient of the features in the decision function.
            `coef_` is of shape (1, n_features) when the given problem
            is binary.
        intercept_ : array, shape (1,) or (n_classes,)
            Intercept (a.k.a. bias) added to the decision function.
            If `fit_intercept` is set to False, the intercept is set to zero.
            `intercept_` is of shape(1,) when the problem is binary.
        n_iter_ : array, shape (n_classes,) or (1, )
            Actual number of iterations for all classes. If binary or multinomial,
            it returns only 1 element. For liblinear solver, only the maximum
            number of iteration across all classes is given.
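
    The docstring above distinguishes the 'ovr' and 'multinomial' options. The following is a minimal sketch of how the two schemes turn per-class decision scores into probabilities. It assumes that one-vs-rest applies an independent sigmoid to each class score and then renormalizes, while the multinomial option applies a single softmax; the score values are made up, and in practice each scheme would also learn different weights, so the scores themselves would differ.

```python
import numpy as np


def softmax(scores):
    """Multinomial probabilities: one softmax across all class scores."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # avoid overflow in exp
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)


def ovr_probabilities(scores):
    """One-vs-rest probabilities: an independent sigmoid per class, renormalized."""
    sigmoid = 1.0 / (1.0 + np.exp(-scores))
    return sigmoid / sigmoid.sum(axis=1, keepdims=True)


# Hypothetical decision scores for 2 samples and 3 classes (made-up numbers).
scores = np.array([[2.0, 1.0, -1.0],
                   [0.0, 0.0, 0.0]])

p_multinomial = softmax(scores)
p_ovr = ovr_probabilities(scores)

print(p_multinomial)
print(p_ovr)
```

    Both schemes pick the same most likely class for these scores, but the multinomial probabilities are more peaked because the classes compete inside a single normalization.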
    
    

    Softmax regression on MNIST

    # -*- coding: utf-8 -*-
    """
    Created on Thu Sep  7 10:47:18 2017
    
    @author: Administrator
    """
    
    import gzip
    import struct
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn import preprocessing
    from sklearn.metrics import accuracy_score
    import tensorflow as tf
    
    # MNIST data is stored in a binary format; the following two utility functions
    # convert it into numpy ndarray objects
    def read_image(file_name):
        with gzip.open(file_name, 'rb') as f:
            buf = f.read()
            index = 0
            magic, images, rows, columns = struct.unpack_from('>IIII' , buf , index)
            index += struct.calcsize('>IIII')
    
            image_size = '>' + str(images*rows*columns) + 'B'
            ims = struct.unpack_from(image_size, buf, index)
            
            im_array = np.array(ims).reshape(images, rows, columns)
            return im_array
    
    def read_label(file_name):
        with gzip.open(file_name, 'rb') as f:
            buf = f.read()
            index = 0
            magic, labels = struct.unpack_from('>II', buf, index)
            index += struct.calcsize('>II')
            
            label_size = '>' + str(labels) + 'B'
            labels = struct.unpack_from(label_size, buf, index)
    
            label_array = np.array(labels)
            return label_array
    
    print ("Start processing MNIST handwritten digits data...")
    train_x_data = read_image("MNIST_data/train-images-idx3-ubyte.gz")
    train_x_data = train_x_data.reshape(train_x_data.shape[0], -1).astype(np.float32)
    train_y_data = read_label("MNIST_data/train-labels-idx1-ubyte.gz")
    test_x_data = read_image("MNIST_data/t10k-images-idx3-ubyte.gz")
    test_x_data = test_x_data.reshape(test_x_data.shape[0], -1).astype(np.float32)
    test_y_data = read_label("MNIST_data/t10k-labels-idx1-ubyte.gz")
    
    train_x_minmax = train_x_data / 255.0
    test_x_minmax = test_x_data / 255.0
    
    # Of course you can also use the utility function to read in MNIST provided by tensorflow
    # from tensorflow.examples.tutorials.mnist import input_data
    # mnist = input_data.read_data_sets("MNIST_data/", one_hot=False)
    # train_x_minmax = mnist.train.images
    # train_y_data = mnist.train.labels
    # test_x_minmax = mnist.test.images
    # test_y_data = mnist.test.labels
    
    # We evaluate the softmax regression model by sklearn first
    eval_sklearn = False
    if eval_sklearn:
        print ("Start evaluating softmax regression model by sklearn...")
        reg = LogisticRegression(solver="lbfgs", multi_class="multinomial")
        reg.fit(train_x_minmax, train_y_data)
        np.savetxt('coef_softmax_sklearn.txt', reg.coef_, fmt='%.6f')  # Save coefficients to a text file
        test_y_predict = reg.predict(test_x_minmax)
        print ("Accuracy of test set: %f" % accuracy_score(test_y_data, test_y_predict))
    
    eval_tensorflow = True
    batch_gradient = False
    if eval_tensorflow:
        print ("Start evaluating softmax regression model by tensorflow...")
        # reformat y into one-hot encoding style
        lb = preprocessing.LabelBinarizer()
        lb.fit(train_y_data)
        train_y_data_trans = lb.transform(train_y_data)
        test_y_data_trans = lb.transform(test_y_data)
    
        x = tf.placeholder(tf.float32, [None, 784])
        W = tf.Variable(tf.zeros([784, 10]))
        b = tf.Variable(tf.zeros([10]))
        V = tf.matmul(x, W) + b
        y = tf.nn.softmax(V)
    
        y_ = tf.placeholder(tf.float32, [None, 10])
    
        loss = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
        optimizer = tf.train.GradientDescentOptimizer(0.5)
        train = optimizer.minimize(loss)
    
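        # tf.initialize_all_variables is deprecated; tf.global_variables_initializer
        # is the replacement (the run log at the end of this post shows the warning).
        init = tf.initialize_all_variables()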
    
        sess = tf.Session()
        sess.run(init)
    
        if batch_gradient:
            for step in range(300):
                sess.run(train, feed_dict={x: train_x_minmax, y_: train_y_data_trans})
                if step % 10 == 0:
                    print ("Batch Gradient Descent processing step %d" % step)
            print ("Finally we got the estimated results, take such a long time...")
        else:
            for step in range(1000):
                sample_index = np.random.choice(train_x_minmax.shape[0], 100)
                batch_xs = train_x_minmax[sample_index, :]
                batch_ys = train_y_data_trans[sample_index, :]
                sess.run(train, feed_dict={x: batch_xs, y_: batch_ys})
                if step % 100 == 0:
                    print ("Stochastic Gradient Descent processing step %d" % step)
        np.savetxt('coef_softmax_tf.txt', np.transpose(sess.run(W)), fmt='%.6f')  # Save coefficients to a text file
        correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        print ("Accuracy of test set: %f" % sess.run(accuracy, feed_dict={x: test_x_minmax, y_: test_y_data_trans}))
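
    The read_image and read_label helpers above unpack every pixel into a Python tuple before building the array. A possible alternative, sketched here with hypothetical function names (read_idx_images, read_idx_labels), lets np.frombuffer interpret the bytes directly. The round-trip check uses a tiny synthetic file written to a temporary directory, not the real MNIST files.

```python
import gzip
import os
import struct
import tempfile

import numpy as np


def read_idx_images(file_name):
    """Read a gzipped IDX3 image file into a (n_images, rows, columns) uint8 array."""
    with gzip.open(file_name, 'rb') as f:
        buf = f.read()
    magic, n_images, rows, columns = struct.unpack_from('>IIII', buf, 0)
    offset = struct.calcsize('>IIII')
    # Interpret the remaining bytes directly as unsigned 8-bit pixels.
    pixels = np.frombuffer(buf, dtype=np.uint8,
                           count=n_images * rows * columns, offset=offset)
    return pixels.reshape(n_images, rows, columns)


def read_idx_labels(file_name):
    """Read a gzipped IDX1 label file into a (n_labels,) uint8 array."""
    with gzip.open(file_name, 'rb') as f:
        buf = f.read()
    magic, n_labels = struct.unpack_from('>II', buf, 0)
    offset = struct.calcsize('>II')
    return np.frombuffer(buf, dtype=np.uint8, count=n_labels, offset=offset)


# Round-trip check on synthetic data: 2 images of 2x3 pixels and 2 labels.
demo_pixels = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)
demo_label_values = np.array([7, 3], dtype=np.uint8)
demo_dir = tempfile.mkdtemp()
image_path = os.path.join(demo_dir, 'demo-images-idx3-ubyte.gz')
label_path = os.path.join(demo_dir, 'demo-labels-idx1-ubyte.gz')

with gzip.open(image_path, 'wb') as f:
    # 2051 is the magic number of IDX3 image files.
    f.write(struct.pack('>IIII', 2051, 2, 2, 3) + demo_pixels.tobytes())
with gzip.open(label_path, 'wb') as f:
    # 2049 is the magic number of IDX1 label files.
    f.write(struct.pack('>II', 2049, 2) + demo_label_values.tobytes())

demo_images = read_idx_images(image_path)
demo_labels = read_idx_labels(label_path)
print(demo_images.shape, demo_labels)
```

    The arrays returned by np.frombuffer are read-only views of the decompressed bytes; the .astype(np.float32) call used in the script above would produce a writable copy.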
    
    • Notes:
    • A Variable is a modifiable tensor that lives in TensorFlow's graph of interacting operations. It can be used and even modified by the computation. For machine learning applications, one generally has the model parameters be Variables.
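    • That said, judging by test-set accuracy, both approaches land at about 92%, with sklearn slightly ahead. Note that 92% may look good, but it is actually quite low; according to the official tutorial, it is something to be embarrassed about.
    • The sklearn fit takes a fairly long time, because every parameter update computes the loss and then the gradient over the full training set before improving the estimate.
    • TensorFlow with batch gradient descent is also slow, for the same reason.
    • TensorFlow with stochastic gradient descent is fast and the final estimates are still good. Each iteration computes the loss and gradient from only part of the data set, which is faster but makes each update a noisier estimate of the full gradient; running more iterations compensates for that noise, so the overall error is not much worse than with batch gradient descent.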
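
    The trade-off described in these notes can be sketched in plain NumPy. The code below mirrors the update rule of the TensorFlow graph above (zero-initialized W and b, mean cross-entropy loss, learning rate 0.5), but it runs on a small synthetic data set of three Gaussian blobs rather than MNIST, so the numbers it prints say nothing about MNIST accuracy. The train and accuracy helpers are hypothetical names introduced for this illustration.

```python
import numpy as np


def softmax(scores):
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)


def train(x, y_onehot, steps, batch_size=None, learning_rate=0.5, seed=0):
    """Gradient descent on mean cross-entropy; batch_size=None uses the full set."""
    rng = np.random.RandomState(seed)
    n_samples, n_features = x.shape
    W = np.zeros((n_features, y_onehot.shape[1]))
    b = np.zeros(y_onehot.shape[1])
    for _ in range(steps):
        if batch_size is None:
            xb, yb = x, y_onehot
        else:
            # Sample a mini-batch with replacement, as np.random.choice does above.
            idx = rng.choice(n_samples, batch_size)
            xb, yb = x[idx], y_onehot[idx]
        # Gradient of the mean cross-entropy with respect to the logits.
        grad_logits = (softmax(xb.dot(W) + b) - yb) / xb.shape[0]
        W -= learning_rate * xb.T.dot(grad_logits)
        b -= learning_rate * grad_logits.sum(axis=0)
    return W, b


def accuracy(x, labels, W, b):
    return float(np.mean(np.argmax(x.dot(W) + b, axis=1) == labels))


# Synthetic stand-in for MNIST: 3 well-separated Gaussian blobs in 2-D (made-up data).
data_rng = np.random.RandomState(42)
centers = np.array([[0.0, 1.0], [1.0, -1.0], [-1.0, -1.0]])
labels = data_rng.randint(0, 3, size=600)
x = centers[labels] + 0.3 * data_rng.randn(600, 2)
y_onehot = np.eye(3)[labels]

W_full, b_full = train(x, y_onehot, steps=200)
W_mini, b_mini = train(x, y_onehot, steps=200, batch_size=50)

acc_full = accuracy(x, labels, W_full, b_full)
acc_mini = accuracy(x, labels, W_mini, b_mini)
print("full batch: %.3f, mini-batch: %.3f" % (acc_full, acc_mini))
```

    Each mini-batch step touches 50 rows instead of 600, so the same number of steps costs roughly one twelfth of the arithmetic, while the two runs should end up with similar accuracy on this easy problem.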

    Official demo

    • Downloads the data automatically
    
    # Copyright 2015 The TensorFlow Authors. All Rights Reserved.
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    # ==============================================================================
    
    """A very simple MNIST classifier.
    See extensive documentation at
    https://www.tensorflow.org/get_started/mnist/beginners
    """
    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function
    
    import argparse
    import sys
    
    from tensorflow.examples.tutorials.mnist import input_data
    
    import tensorflow as tf
    
    FLAGS = None
    
    
    def main(_):
      # Import data
      mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)
    
      # Create the model
      x = tf.placeholder(tf.float32, [None, 784])
      W = tf.Variable(tf.zeros([784, 10]))
      b = tf.Variable(tf.zeros([10]))
      y = tf.matmul(x, W) + b
    
      # Define loss and optimizer
      y_ = tf.placeholder(tf.float32, [None, 10])
    
      # The raw formulation of cross-entropy,
      #
      #   tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(tf.nn.softmax(y)),
      #                                 reduction_indices=[1]))
      #
      # can be numerically unstable.
      #
      # So here we use tf.nn.softmax_cross_entropy_with_logits on the raw
      # outputs of 'y', and then average across the batch.
      cross_entropy = tf.reduce_mean(
          tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
      train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
    
      sess = tf.InteractiveSession()
      tf.global_variables_initializer().run()
      # Train
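      # In each step of this loop, we grab a random batch of 100 data points from the
      # training set and run train_step, feeding the batch in place of the placeholders.
      # Training on a small random subset of the data is called stochastic training;
      # here, more precisely, stochastic gradient descent. Ideally we would use all of
      # the data for every training step, since that gives better results, but that is
      # computationally expensive. So each step uses a different subset instead, which
      # is cheaper and still captures most of the overall characteristics of the data set.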
      for _ in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    
      # Test trained model
      correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
      accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
      print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                          y_: mnist.test.labels}))
    
    if __name__ == '__main__':
      parser = argparse.ArgumentParser()
      parser.add_argument('--data_dir', type=str, default='/tmp/tensorflow/mnist/input_data',
                          help='Directory for storing input data')
      FLAGS, unparsed = parser.parse_known_args()
      tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
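
    The comment in the demo says the raw cross-entropy formulation can be numerically unstable. The following NumPy sketch shows one way this can happen and one common remedy, subtracting the row maximum before exponentiating (the log-sum-exp trick). It is only an illustration with deliberately extreme, made-up logits; it is not a description of how tf.nn.softmax_cross_entropy_with_logits is implemented internally.

```python
import numpy as np


def naive_cross_entropy(logits, onehot):
    """Direct formula: -sum(y * log(softmax(logits))); can overflow or hit log(0)."""
    exp_logits = np.exp(logits)
    probs = exp_logits / exp_logits.sum(axis=1, keepdims=True)
    return np.mean(-np.sum(onehot * np.log(probs), axis=1))


def stable_cross_entropy(logits, onehot):
    """Same quantity via log-softmax with the row maximum subtracted first."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return np.mean(-np.sum(onehot * log_probs, axis=1))


small_logits = np.array([[2.0, 1.0, 0.1]])
large_logits = np.array([[1000.0, 0.0, -1000.0]])  # deliberately extreme values
onehot = np.array([[1.0, 0.0, 0.0]])

# Silence the floating-point warnings that the naive version triggers.
with np.errstate(over='ignore', divide='ignore', invalid='ignore'):
    naive_small = naive_cross_entropy(small_logits, onehot)
    naive_large = naive_cross_entropy(large_logits, onehot)

stable_small = stable_cross_entropy(small_logits, onehot)
stable_large = stable_cross_entropy(large_logits, onehot)

print("small logits:", naive_small, stable_small)
print("large logits:", naive_large, stable_large)
```

    For moderate logits the two functions agree. For the extreme logits, exp(1000) overflows in the naive version and the loss is no longer finite, while the shifted version stays finite.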
    
    
    • result
    Start processing MNIST handwritten digits data...
    Start evaluating softmax regression model by tensorflow...
    WARNING:tensorflow:From D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\util\tf_should_use.py:175: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
    Instructions for updating:
    Use `tf.global_variables_initializer` instead.
    2017-09-08 16:47:36.504803: W C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
    2017-09-08 16:47:36.504803: W C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
    Stochastic Gradient Descent processing step 0
    Stochastic Gradient Descent processing step 100
    Stochastic Gradient Descent processing step 200
    Stochastic Gradient Descent processing step 300
    Stochastic Gradient Descent processing step 400
    Stochastic Gradient Descent processing step 500
    Stochastic Gradient Descent processing step 600
    Stochastic Gradient Descent processing step 700
    Stochastic Gradient Descent processing step 800
    Stochastic Gradient Descent processing step 900
    Accuracy of test set: 0.915600
    
    
  • Original article: https://www.cnblogs.com/ranjiewen/p/7494922.html