  • [Deep Learning] Analyzing the code of a logistic regression network in Python/Theano

    2014-07-21 10:28:34

    First, here is the main Python (2.7) code. It can be found on the Deep Learning site.

     1     # allocate symbolic variables for the data
     2     index = T.lscalar()  # index to a [mini]batch
     3     x = T.matrix('x')  # the data is presented as rasterized images
     4     y = T.ivector('y')  # the labels are presented as 1D vector of
     5                            # [int] labels
     6 
     7     # construct the logistic regression class
     8     # Each MNIST image has size 28*28
     9     classifier = LogisticRegression(input=x, n_in=28 * 28, n_out=10)
    10 
    11     # the cost we minimize during training is the negative log likelihood of
    12     # the model in symbolic format
    13     cost = classifier.negative_log_likelihood(y)
    14 
    15     # compiling a Theano function that computes the mistakes that are made by
    16     # the model on a minibatch
    17     test_model = theano.function(inputs=[index],
    18             outputs=classifier.errors(y),
    19             givens={
    20                 x: test_set_x[index * batch_size: (index + 1) * batch_size],
    21                 y: test_set_y[index * batch_size: (index + 1) * batch_size]})
    22 
    23     validate_model = theano.function(inputs=[index],
    24             outputs=classifier.errors(y),
    25             givens={
    26                 x: valid_set_x[index * batch_size:(index + 1) * batch_size],
    27                 y: valid_set_y[index * batch_size:(index + 1) * batch_size]})
    28 
    29     # compute the gradient of cost with respect to theta = (W,b)
    30     g_W = T.grad(cost=cost, wrt=classifier.W)
    31     g_b = T.grad(cost=cost, wrt=classifier.b)
    32 
    33     # specify how to update the parameters of the model as a list of
    34     # (variable, update expression) pairs.
    35     updates = [(classifier.W, classifier.W - learning_rate * g_W),
    36                (classifier.b, classifier.b - learning_rate * g_b)]
    37 
    38     # compiling a Theano function `train_model` that returns the cost, but in
    39     # the same time updates the parameter of the model based on the rules
    40     # defined in `updates`
    41     train_model = theano.function(inputs=[index],
    42             outputs=cost,
    43             updates=updates,
    44             givens={
    45                 x: train_set_x[index * batch_size:(index + 1) * batch_size],
    46                 y: train_set_y[index * batch_size:(index + 1) * batch_size]})

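    The code is not very long, but its logic needs to be untangled. The analysis below goes through it line by line.

    In the code, T is an alias for theano.tensor.

    Lines 1-13: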

        # allocate symbolic variables for the data
        index = T.lscalar()  # index to a [mini]batch
        x = T.matrix('x')  # the data is presented as rasterized images
        y = T.ivector('y')  # the labels are presented as 1D vector of
                               # [int] labels
    
        # construct the logistic regression class
        # Each MNIST image has size 28*28
        classifier = LogisticRegression(input=x, n_in=28 * 28, n_out=10)
    
        # the cost we minimize during training is the negative log likelihood of
        # the model in symbolic format
        cost = classifier.negative_log_likelihood(y)

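    This declares three symbolic variables, index, x and y (similar to symbolic variables in MATLAB). They stand for the minibatch index, the input image matrix, and the vector of expected outputs (labels), respectively.

    classifier is an LR (logistic regression) object. Its constructor is called with the symbolic variable x as input, so every expression inside the classifier is defined in terms of x. theano.function can then tie x to the classifier: whatever data is substituted for x determines what the classifier computes.

    cost refers to the classifier's negative log-likelihood, which takes the symbolic variable y as input. It is tied to y in the same way that classifier is tied to x, so the details are not repeated here.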
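    To make the roles of x, y, classifier and cost more concrete, here is a minimal NumPy sketch (Python 3) of the quantity that cost stands for. It is not the Theano class itself: it assumes the LogisticRegression class computes a softmax over x·W + b, starts W and b at zero, and averages the negative log-likelihood over the minibatch. The helper names softmax and negative_log_likelihood and the sample data are made up for illustration.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def negative_log_likelihood(W, b, x, y):
    # Mean over the minibatch of -log p(y_i | x_i).
    p = softmax(x @ W + b)
    return -np.mean(np.log(p[np.arange(len(y)), y]))

n_in, n_out = 28 * 28, 10          # one 28*28 image per row, 10 digit classes
W = np.zeros((n_in, n_out))        # assumed zero initialisation
b = np.zeros(n_out)

rng = np.random.default_rng(0)
x = rng.random((5, n_in))          # 5 made-up "images"
y = np.array([3, 1, 4, 1, 5])      # 5 made-up labels

cost = negative_log_likelihood(W, b, x, y)
```

    With all-zero parameters every class gets probability 1/10, so cost equals log(10) regardless of the data. In Theano the same expression is only built symbolically at this point. No numbers flow through it until a compiled function is called.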

    Lines 14-28:

        # compiling a Theano function that computes the mistakes that are made by
        # the model on a minibatch
        test_model = theano.function(inputs=[index],
                outputs=classifier.errors(y),
                givens={
                    x: test_set_x[index * batch_size: (index + 1) * batch_size],
                    y: test_set_y[index * batch_size: (index + 1) * batch_size]})
    
        validate_model = theano.function(inputs=[index],
                outputs=classifier.errors(y),
                givens={
                    x: valid_set_x[index * batch_size:(index + 1) * batch_size],
                    y: valid_set_y[index * batch_size:(index + 1) * batch_size]})

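    These two models are where things tend to get confusing. Understanding theano.function requires a little background:

    Suppose we declare two symbolic variables a and b:  a, b = T.iscalar(), T.iscalar() . Both are integer (i) scalars. We then declare another variable c:  c = a + b , and use type(c) to inspect its type: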

    >>> type(c)
    <class 'theano.tensor.var.TensorVariable'>
    >>> type(a)
    <class 'theano.tensor.var.TensorVariable'>

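    c has the same type as a and b: all three are Tensor variables. With the preparation done, we use theano.function to build the relationship:  add = theano.function(inputs = [a, b], outputs = c) . This statement constructs a function add that takes a and b as inputs and returns c. In Python we simply use it like this: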

    >>> add = theano.function(inputs = [a, b], outputs = c)
    >>> test = add(100, 100)
    >>> test
    array(200)

    With this background, the meaning of the two models becomes clear:

        test_model = theano.function(inputs=[index],
                outputs=classifier.errors(y),
                givens={
                    x: test_set_x[index * batch_size: (index + 1) * batch_size],
                    y: test_set_y[index * batch_size: (index + 1) * batch_size]})

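    The input is index, and the output is the return value of the classifier object's errors method, which takes y as its argument. The classifier itself takes x as its input.

    The givens keyword replaces the variable before each colon with the expression after it. In this example, x and y are replaced by the index-th minibatch of the test data (each minibatch holds batch_size examples).

    In plain words, test_model is a function that takes index, uses the image data x and expected outputs y of the index-th test minibatch, and returns the error on that minibatch.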

        validate_model = theano.function(inputs=[index],
                outputs=classifier.errors(y),
                givens={
                    x: valid_set_x[index * batch_size:(index + 1) * batch_size],
                    y: valid_set_y[index * batch_size:(index + 1) * batch_size]})

    The same applies here, except that the validation data is used.
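    The slice expression used in the givens of both models can be tried out on its own. The following is a plain-Python sketch, with an ordinary list standing in for the shared dataset; the helper name minibatch and the sample values are made up for illustration.

```python
def minibatch(data, index, batch_size):
    # Same slice as in the givens: rows index*batch_size .. (index+1)*batch_size - 1.
    return data[index * batch_size:(index + 1) * batch_size]

test_set_x = list(range(10))   # stand-in for 10 test examples
batch_size = 4

first = minibatch(test_set_x, 0, batch_size)   # examples 0..3
second = minibatch(test_set_x, 1, batch_size)  # examples 4..7
last = minibatch(test_set_x, 2, batch_size)    # examples 8..9 (a short final batch)
```

    Because only index is an explicit input, calling test_model(i) or validate_model(i) selects the i-th slice internally, and the caller never passes the data itself.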

    Lines 29-32:

        # compute the gradient of cost with respect to theta = (W,b)
        g_W = T.grad(cost=cost, wrt=classifier.W)
        g_b = T.grad(cost=cost, wrt=classifier.b)

    These lines compute the gradients used by the learning algorithm. T.grad(y, x) computes the gradient of y with respect to x.
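    T.grad derives these gradients symbolically. As a rough illustration of what g_W and g_b contain, the NumPy sketch below (Python 3) writes out the gradients by hand for a softmax classifier with a mean negative log-likelihood cost, which is an assumption about the LogisticRegression class, and checks one entry against a finite difference. The function names, shapes and data are made up for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cost_fn(W, b, x, y):
    # Mean negative log-likelihood over the minibatch.
    p = softmax(x @ W + b)
    return -np.mean(np.log(p[np.arange(len(y)), y]))

def gradients(W, b, x, y):
    # Hand-derived gradients: d cost / d logits = (p - one_hot(y)) / n.
    p = softmax(x @ W + b)
    p[np.arange(len(y)), y] -= 1.0
    g_W = x.T @ p / len(y)
    g_b = p.mean(axis=0)
    return g_W, g_b

rng = np.random.default_rng(0)
x = rng.random((6, 4))                 # 6 examples, 4 features
y = np.array([0, 2, 1, 2, 0, 1])       # 3 classes
W = rng.normal(size=(4, 3))
b = rng.normal(size=3)

g_W, g_b = gradients(W, b, x, y)

# Central finite difference on a single weight as a sanity check.
eps = 1e-6
W_plus, W_minus = W.copy(), W.copy()
W_plus[1, 2] += eps
W_minus[1, 2] -= eps
numeric = (cost_fn(W_plus, b, x, y) - cost_fn(W_minus, b, x, y)) / (2 * eps)
```

    g_W has the same shape as W and g_b the same shape as b, which is what allows them to be subtracted from the parameters in the update step.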

    Lines 33-37:

        # specify how to update the parameters of the model as a list of
        # (variable, update expression) pairs.
        updates = [(classifier.W, classifier.W - learning_rate * g_W),
                   (classifier.b, classifier.b - learning_rate * g_b)]

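    updates is a list of length 2 whose elements are tuples. Each time a function compiled by theano.function with these updates is called, the second element of each tuple is used to update the first.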
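    The (variable, update expression) idea can be mimicked in plain Python. This is only a sketch of the mechanism, not Theano's implementation: the dictionary params, the helper apply_updates and the scalar values are hypothetical, and the sketch assumes that every update expression is evaluated with the old parameter values before any of them is overwritten.

```python
def apply_updates(params, updates):
    # Evaluate every new value from the old parameters first, then assign,
    # so that no update sees a half-updated state.
    new_values = {name: expr(params) for name, expr in updates}
    params.update(new_values)
    return params

learning_rate = 0.1
params = {"W": 1.0, "b": 2.0}     # scalar stand-ins for classifier.W and classifier.b
grads = {"W": 0.5, "b": -1.0}     # scalar stand-ins for g_W and g_b

# One (variable, update expression) pair per parameter, as in the listing.
updates = [("W", lambda p: p["W"] - learning_rate * grads["W"]),
           ("b", lambda p: p["b"] - learning_rate * grads["b"])]

apply_updates(params, updates)    # one gradient-descent step
```

    After one call W has moved from 1.0 to about 0.95 and b from 2.0 to about 2.1, i.e. each parameter took one step against its gradient.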

    Lines 38-46:

        # compiling a Theano function `train_model` that returns the cost, but in
        # the same time updates the parameter of the model based on the rules
        # defined in `updates`
        train_model = theano.function(inputs=[index],
                outputs=cost,
                updates=updates,
                givens={
                    x: train_set_x[index * batch_size:(index + 1) * batch_size],
                    y: train_set_y[index * batch_size:(index + 1) * batch_size]})

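    The parts already covered are not repeated here. The thing to note is the additional updates argument, which specifies how certain parameters (W and b) are modified every time train_model is called. The output has also changed: it is now cost (the negative log-likelihood) rather than the errors value (the classification error) returned by test_model and validate_model.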
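    The difference between the two outputs can be seen on a tiny made-up example. The NumPy sketch below (Python 3) assumes that cost is the mean negative log-likelihood and that errors is the fraction of misclassified examples in the minibatch. The probability table p and labels y are invented for illustration.

```python
import numpy as np

# Made-up class probabilities for 3 examples and 3 classes (each row sums to 1).
p = np.array([[0.7, 0.2, 0.1],
              [0.4, 0.5, 0.1],
              [0.3, 0.3, 0.4]])
y = np.array([0, 0, 2])                      # true labels

# cost: mean negative log-probability assigned to the true class.
cost = -np.mean(np.log(p[np.arange(len(y)), y]))

# errors: fraction of examples whose most probable class is wrong.
y_pred = p.argmax(axis=1)
errors = np.mean(y_pred != y)
```

    Here only the second example is misclassified, so errors is 1/3, while cost also reflects how confident each prediction was. cost varies smoothly with W and b and can therefore be differentiated for training. errors only changes when a prediction flips, which makes it suitable for evaluation but not for gradient descent.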

  • Original post: https://www.cnblogs.com/lancelod/p/3857965.html