  • Classifying MNIST Handwritten Digits with Logistic Regression

    Introduction

    In this section we use Theano to build the most basic classifier: logistic regression.
    The complete code can be downloaded for free from my CSDN downloads: http://download.csdn.net/detail/ws_20100/9222263
    Let us start with the model.


    The Model

    Logistic regression is a probabilistic, linear classifier. Its parameters are a weight matrix W and a bias vector b. The classifier works by projecting the input vector onto a set of hyperplanes, one per class; the distance from the input to a hyperplane reflects the probability that the input belongs to the corresponding class.
    Mathematically, the probability that an input vector x is a member of class i (a value of the stochastic variable Y) can be written as:

    P(Y=i \mid x, W, b) = \mathrm{softmax}_i(W x + b) = \frac{e^{W_i x + b_i}}{\sum_j e^{W_j x + b_j}}
    The model's prediction y_pred is the class whose probability is maximal, defined as:
    y_{pred} = \operatorname{argmax}_i P(Y=i \mid x, W, b)
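    As a quick numerical illustration of these two formulas (the weights and
    input below are made up for this example, laid out as (n_in, n_out) to
    match the Theano code that follows), here is a minimal NumPy sketch:

    import numpy

    def softmax(z):
        # shift by the max for numerical stability, then normalize
        e = numpy.exp(z - z.max())
        return e / e.sum()

    # toy example: n_in = 2 features, n_out = 3 classes (made-up values)
    W = numpy.array([[0.5, -0.2, 0.1],
                     [0.3,  0.8, -0.4]])   # shape (n_in, n_out)
    b = numpy.array([0.0, 0.1, -0.1])      # shape (n_out,)
    x = numpy.array([1.0, 2.0])            # a single input vector

    p_y_given_x = softmax(x.dot(W) + b)    # P(Y=i|x,W,b) for each class i
    y_pred = p_y_given_x.argmax()          # class with maximal probability
    print(p_y_given_x, y_pred)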
    The Theano code that builds this model is as follows:

    # initialize with 0 the weights W as a matrix of shape (n_in, n_out)
    self.W = theano.shared(
        value=numpy.zeros(
            (n_in, n_out),
            dtype=theano.config.floatX
        ),
        name='W',
        borrow=True
    )
    # initialize the biases b as a vector of n_out 0s
    self.b = theano.shared(
        value=numpy.zeros(
            (n_out,),
            dtype=theano.config.floatX
        ),
        name='b',
        borrow=True
    )
    
    # symbolic expression for computing the matrix of class-membership
    # probabilities
    # Where:
    # W is a matrix where column-k represent the separation hyperplane for
    # class-k
    # x is a matrix where row-j  represents input training sample-j
    # b is a vector where element-k represent the free parameter of
    # hyperplane-k
    self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)
    
    # symbolic description of how to compute prediction as class whose
    # probability is maximal
    self.y_pred = T.argmax(self.p_y_given_x, axis=1)

    Since the parameters of the model must maintain a persistent state throughout training, we allocate shared variables for W and b. This both declares them as symbolic Theano variables and initializes their contents. We then use the dot product and the softmax operation to compute the vector P(Y|x,W,b). The result p_y_given_x is a symbolic variable of vector type. To obtain the actual prediction of the model, we use the T.argmax operator, which returns the index at which p_y_given_x is largest (i.e., the class with the highest probability).
    As defined so far, the model does not do anything useful yet, since all of its parameters are still at their initial values. The following sections show how to learn the optimal parameters.


    Defining a Loss Function

    Training the model to find the optimal parameters means minimizing a loss function. For multi-class classification, it is natural to use the negative log-likelihood as the loss. This is equivalent to maximizing the likelihood of the dataset D under the model parameterized by θ. Let us begin by defining the likelihood L and the loss ℓ:

    \mathcal{L}(\theta=\{W,b\}, \mathcal{D}) = \sum_{i=0}^{|\mathcal{D}|} \log P(Y=y^{(i)} \mid x^{(i)}, W, b)
    \ell(\theta=\{W,b\}, \mathcal{D}) = -\mathcal{L}(\theta=\{W,b\}, \mathcal{D})
    In optimization theory, the simplest method for minimizing an arbitrary non-linear function is gradient descent [1].
    Here we use minibatch stochastic gradient descent (MSGD). The Theano code defining the loss looks like this:

    # y.shape[0] is (symbolically) the number of rows in y, i.e.,
    # number of examples (call it n) in the minibatch
    # T.arange(y.shape[0]) is a symbolic vector which will contain
    # [0,1,2,... n-1] T.log(self.p_y_given_x) is a matrix of
    # Log-Probabilities (call it LP) with one row per example and
    # one column per class LP[T.arange(y.shape[0]),y] is a vector
    # v containing [LP[0,y[0]], LP[1,y[1]], LP[2,y[2]], ...,
    # LP[n-1,y[n-1]]] and T.mean(LP[T.arange(y.shape[0]),y]) is
    # the mean (across minibatch examples) of the elements in v,
    # i.e., the mean log-likelihood across the minibatch.
    return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])

    Even though the loss is formally defined as the sum of individual error terms over the dataset, the code uses the mean (T.mean) instead, so that the learning rate is less dependent on the minibatch size.
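    The fancy-indexing expression T.log(self.p_y_given_x)[T.arange(y.shape[0]), y]
    is easiest to see numerically. A small NumPy sketch with made-up
    probabilities:

    import numpy

    # toy log-probability matrix LP: 3 examples x 4 classes (made-up values)
    LP = numpy.log(numpy.array([[0.7, 0.1, 0.1, 0.1],
                                [0.2, 0.5, 0.2, 0.1],
                                [0.1, 0.1, 0.1, 0.7]]))
    y = numpy.array([0, 1, 3])  # the correct class of each example

    # advanced indexing picks out LP[0, y[0]], LP[1, y[1]], LP[2, y[2]]
    v = LP[numpy.arange(y.shape[0]), y]
    print(-v.mean())  # mean negative log-likelihood over this "minibatch"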


    Creating a LogisticRegression Class

    We now define a LogisticRegression class that encapsulates all the features of logistic regression. The code below covers quite a lot of ground and is thoroughly commented.

    class LogisticRegression(object):
        """Multi-class Logistic Regression Class
    
        The logistic regression is fully described by a weight matrix :math:`W`
        and bias vector :math:`b`. Classification is done by projecting data
        points onto a set of hyperplanes, the distance to which is used to
        determine a class membership probability.
        """
    
        def __init__(self, input, n_in, n_out):
            """ Initialize the parameters of the logistic regression
    
            :type input: theano.tensor.TensorType
            :param input: symbolic variable that describes the input of the
                          architecture (one minibatch)
    
            :type n_in: int
            :param n_in: number of input units, the dimension of the space in
                         which the datapoints lie
    
            :type n_out: int
            :param n_out: number of output units, the dimension of the space in
                          which the labels lie
    
            """
            # start-snippet-1
            # initialize with 0 the weights W as a matrix of shape (n_in, n_out)
            self.W = theano.shared(
                value=numpy.zeros(
                    (n_in, n_out),
                    dtype=theano.config.floatX
                ),
                name='W',
                borrow=True
            )
            # initialize the biases b as a vector of n_out 0s
            self.b = theano.shared(
                value=numpy.zeros(
                    (n_out,),
                    dtype=theano.config.floatX
                ),
                name='b',
                borrow=True
            )
    
            # symbolic expression for computing the matrix of class-membership
            # probabilities
            # Where:
            # W is a matrix where column-k represent the separation hyperplane for
            # class-k
            # x is a matrix where row-j  represents input training sample-j
            # b is a vector where element-k represent the free parameter of
            # hyperplane-k
            self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)
    
            # symbolic description of how to compute prediction as class whose
            # probability is maximal
            self.y_pred = T.argmax(self.p_y_given_x, axis=1)
            # end-snippet-1
    
            # parameters of the model
            self.params = [self.W, self.b]
    
            # keep track of model input
            self.input = input
    
        def negative_log_likelihood(self, y):
            """Return the mean of the negative log-likelihood of the prediction
            of this model under a given target distribution.
    
            .. math::
    
            \frac{1}{|\mathcal{D}|} \mathcal{L} (\theta=\{W,b\}, \mathcal{D}) =
            \frac{1}{|\mathcal{D}|} \sum_{i=0}^{|\mathcal{D}|}
                \log(P(Y=y^{(i)}|x^{(i)}, W,b)) \\
            \ell (\theta=\{W,b\}, \mathcal{D})
    
            :type y: theano.tensor.TensorType
            :param y: corresponds to a vector that gives for each example the
                      correct label
    
            Note: we use the mean instead of the sum so that
                  the learning rate is less dependent on the batch size
            """
            # start-snippet-2
            # y.shape[0] is (symbolically) the number of rows in y, i.e.,
            # number of examples (call it n) in the minibatch
            # T.arange(y.shape[0]) is a symbolic vector which will contain
            # [0,1,2,... n-1] T.log(self.p_y_given_x) is a matrix of
            # Log-Probabilities (call it LP) with one row per example and
            # one column per class LP[T.arange(y.shape[0]),y] is a vector
            # v containing [LP[0,y[0]], LP[1,y[1]], LP[2,y[2]], ...,
            # LP[n-1,y[n-1]]] and T.mean(LP[T.arange(y.shape[0]),y]) is
            # the mean (across minibatch examples) of the elements in v,
            # i.e., the mean log-likelihood across the minibatch.
            return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])
            # end-snippet-2
    
        def errors(self, y):
            """Return a float representing the number of errors in the minibatch
            over the total number of examples of the minibatch ; zero one
            loss over the size of the minibatch
    
            :type y: theano.tensor.TensorType
            :param y: corresponds to a vector that gives for each example the
                      correct label
            """
    
            # check if y has same dimension of y_pred
            if y.ndim != self.y_pred.ndim:
                raise TypeError(
                    'y should have the same shape as self.y_pred',
                    ('y', y.type, 'y_pred', self.y_pred.type)
                )
            # check if y is of the correct datatype
            if y.dtype.startswith('int'):
                # the T.neq operator returns a vector of 0s and 1s, where 1
                # represents a mistake in prediction
                return T.mean(T.neq(self.y_pred, y))
            else:
                raise NotImplementedError()

    So how do we instantiate the LogisticRegression class? See the following code:

    # generate symbolic variables for input (x and y represent a
    # minibatch)
    x = T.matrix('x')  # data, presented as rasterized images
    y = T.ivector('y')  # labels, presented as 1D vector of [int] labels
    
    # construct the logistic regression class
    # Each MNIST image has size 28*28
    classifier = LogisticRegression(input=x, n_in=28 * 28, n_out=10)

    The code above first defines symbolic variables for the input x and the corresponding labels y. Note that x and y are defined outside the LogisticRegression object. Since the class takes the input as a parameter of its __init__ function, this is useful when you want to chain instances together to form a deep network: the output of one layer can serve as the input of the next. (We do not build a multi-layer network here, but the code is reusable in one.)
    Finally, we define a (symbolic) cost variable to minimize, using the instance method classifier.negative_log_likelihood:

    # the cost we minimize during training is the negative log likelihood of
    # the model in symbolic format
    cost = classifier.negative_log_likelihood(y)

    Note that cost has an implicit symbolic input x, because the symbolic variables of classifier were defined in terms of x at initialization.
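    As a quick way to make that dependency visible (this compilation is not
    part of the tutorial script, just a sanity check), cost can be compiled
    into a function whose explicit inputs are x and y:

    # `cost` depends symbolically on x (through `classifier`) and on y,
    # so both must be supplied when compiling it into a callable
    compute_cost = theano.function(inputs=[x, y], outputs=cost)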


    Learning the Model

    To implement MSGD in most programming languages (C/C++, Matlab, Python), one would manually derive the gradient expressions of the loss with respect to the parameters, ∂ℓ/∂W and ∂ℓ/∂b. For complex models this can be fairly involved, since the expressions for ∂ℓ/∂θ get complicated, especially when numerical stability has to be taken into account.
    With Theano, this work is greatly simplified: it performs automatic differentiation and applies mathematical transformations that improve numerical stability.
    Obtaining the gradients ∂ℓ/∂W and ∂ℓ/∂b in Theano takes just the following code:

    g_W = T.grad(cost=cost, wrt=classifier.W)
    g_b = T.grad(cost=cost, wrt=classifier.b)

    g_W and g_b are symbolic variables that can be used in further computation. The function train_model, which performs one step of gradient descent, is defined as follows:

    # specify how to update the parameters of the model as a list of
    # (variable, update expression) pairs.
    updates = [(classifier.W, classifier.W - learning_rate * g_W),
               (classifier.b, classifier.b - learning_rate * g_b)]
    
    # compiling a Theano function `train_model` that returns the cost, but in
    # the same time updates the parameter of the model based on the rules
    # defined in `updates`
    train_model = theano.function(
        inputs=[index],
        outputs=cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )
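    Note that this snippet refers to index, learning_rate, and batch_size,
    which the full script defines earlier; a sketch of those definitions
    (default values as in the tutorial script, treat them as assumptions
    here):

    index = T.lscalar()    # symbolic index to a minibatch
    learning_rate = 0.13   # SGD step size (tutorial default)
    batch_size = 600       # examples per minibatch (tutorial default)
    # train_set_x / train_set_y are Theano shared variables holding the
    # training set, as returned by the tutorial's load_data() helper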

    updates is a list of pairs. In each pair, the first element is the symbolic variable to be updated and the second element is the symbolic expression that computes its new value. Similarly, givens is a dictionary whose keys are symbolic variables and whose values specify their replacements. The function train_model is defined such that:

    • its input is the minibatch index index; together with the batch size (which is not an input, since it is fixed), this determines x and the corresponding labels y;
    • its return value is the cost/loss computed over the x and y corresponding to index;
    • on every call, it first replaces x and y with the slices of the training set specified by index; it then evaluates the cost of that minibatch and applies the operations defined by the updates list.

    Each time train_model(index) is called, it computes and returns the cost of that minibatch while also performing a step of MSGD. The entire learning algorithm thus loops over all examples in the dataset, considering one minibatch at a time, and calls train_model repeatedly, as in the sketch below.
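    A minimal sketch of that outer loop, assuming n_epochs and
    n_train_batches (the number of minibatches in the training set) are
    defined as in the full script:

    for epoch in range(n_epochs):
        for minibatch_index in range(n_train_batches):
            # one MSGD step on minibatch `minibatch_index`
            minibatch_avg_cost = train_model(minibatch_index)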


    Testing the Model

    When testing the model, we are interested in the number of misclassified examples. The LogisticRegression class therefore has an extra instance method that computes the fraction of misclassified examples in each minibatch. The code is as follows:

    def errors(self, y):
        """Return a float representing the number of errors in the minibatch
        over the total number of examples of the minibatch ; zero one
        loss over the size of the minibatch
    
        :type y: theano.tensor.TensorType
        :param y: corresponds to a vector that gives for each example the
                  correct label
        """
    
        # check if y has same dimension of y_pred
        if y.ndim != self.y_pred.ndim:
            raise TypeError(
                'y should have the same shape as self.y_pred',
                ('y', y.type, 'y_pred', self.y_pred.type)
            )
        # check if y is of the correct datatype
        if y.dtype.startswith('int'):
            # the T.neq operator returns a vector of 0s and 1s, where 1
            # represents a mistake in prediction
            return T.mean(T.neq(self.y_pred, y))
        else:
            raise NotImplementedError()

    We then create a test_model function and a validate_model function, which we can call to retrieve this misclassification figure. As you will see, validate_model is the key to early stopping (see the sketch after the code below). Both functions take a minibatch index and compute, over the minibatch indexed by it, the fraction of misclassified examples. The only difference between them is that test_model draws its minibatches from the test set, while validate_model draws them from the validation set.

    # compiling a Theano function that computes the mistakes that are made by
    # the model on a minibatch
    test_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: test_set_x[index * batch_size: (index + 1) * batch_size],
            y: test_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )
    
    validate_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: valid_set_x[index * batch_size: (index + 1) * batch_size],
            y: valid_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )
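    A simplified sketch of how validate_model drives early stopping (the
    full script layers "patience" logic on top of this; n_valid_batches and
    n_test_batches are assumed from the surrounding script):

    best_validation_loss = numpy.inf
    for epoch in range(n_epochs):
        for minibatch_index in range(n_train_batches):
            train_model(minibatch_index)

        # average zero-one loss over the entire validation set
        this_validation_loss = numpy.mean(
            [validate_model(i) for i in range(n_valid_batches)])

        if this_validation_loss < best_validation_loss:
            # validation improved: remember it and score the test set
            best_validation_loss = this_validation_loss
            test_score = numpy.mean(
                [test_model(i) for i in range(n_test_batches)])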

    Putting It All Together

    The final code is as follows. [Download: http://download.csdn.net/detail/ws_20100/9222263]
    You can train an SGD logistic regression classifier on MNIST digits by typing:

    python code/logistic_sgd.py

    The output should look something like this:

    ...
    epoch 72, minibatch 83/83, validation error 7.510417 %
         epoch 72, minibatch 83/83, test error of best model 7.510417 %
    epoch 73, minibatch 83/83, validation error 7.500000 %
         epoch 73, minibatch 83/83, test error of best model 7.489583 %
    Optimization complete with best validation score of 7.500000 %, with test performance 7.489583 %
    The code run for 74 epochs, with 1.936983 epochs/sec

    On an Intel Core 2 Duo E8400 @ 3.00 GHz, the code runs at roughly 1.936 epochs/sec, and after 75 epochs it reaches a test error of 7.489%. On a GPU, it runs at roughly 10.0 epochs/sec.


    Using the Trained Model for Prediction

    Once training has reached the lowest validation error, we can reload the saved model and use it to predict the labels of new data. The predict function performs exactly these operations:

    def predict():
        """
        An example of how to load a trained model and use it
        to predict labels.
        """
    
        # load the saved model
        classifier = cPickle.load(open('best_model.pkl'))
    
        # compile a predictor function
        predict_model = theano.function(
            inputs=[classifier.input],
            outputs=classifier.y_pred)
    
        # We can test it on some examples from the test set
        dataset='mnist.pkl.gz'
        datasets = load_data(dataset)
        test_set_x, test_set_y = datasets[2]
        test_set_x = test_set_x.get_value()
    
        predicted_values = predict_model(test_set_x[:10])
        print ("Predicted values for the first 10 examples in test set:")
        print predicted_values
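    Note that predict() assumes a best_model.pkl file was written during
    training. In the tutorial's script the classifier is pickled whenever
    the validation error improves; a sketch of that step (it belongs inside
    the early-stopping branch shown earlier):

    import cPickle  # Python 2, as in the tutorial; use `pickle` on Python 3

    # save the best model so predict() can reload it later
    with open('best_model.pkl', 'wb') as f:
        cPickle.dump(classifier, f)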

    References

    Theano deep learning tutorial: http://deeplearning.net/tutorial/logreg.html


    [Footnote]
    [1] For smaller datasets and simpler models, more sophisticated descent algorithms can be more effective. The logistic_cg.py code demonstrates how to solve the logistic regression task with SciPy's conjugate gradient solver. logistic_cg.py can be downloaded for free from my CSDN downloads: http://download.csdn.net/detail/ws_20100/9223959
