  • deeplearning -- Learning a Simple Classifier

    Zero-One Loss


    Our goal is to make the number of misclassifications (the zero-one loss) as small as possible:

    \ell_{0,1} = \sum_{i=0}^{|\mathcal{D}|} I_{f(x^{(i)}) \neq y^{(i)}}, \qquad f(x) = \operatorname{argmax}_k P(Y = k \mid x, \theta)

    f(x) returns the most probable class for the input under the current theta. In other words, we predict f(x) from x; if this value equals the true label y, the prediction is correct, otherwise it counts as an error.

    # zero_one_loss is a Theano variable representing a symbolic
    # expression of the zero one loss ; to get the actual value this
    # symbolic expression has to be compiled into a Theano function (see
    # the Theano tutorial for more details)
    zero_one_loss = T.sum(T.neq(T.argmax(p_y_given_x, axis=1), y))
    # T.neq is the indicator function I: T.neq(a, b) is 1 where a and b differ ("not equal").
    # axis=1 makes argmax pick the most probable class for each example (row) of the minibatch.
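
    To make the argmax/neq logic concrete, here is a small NumPy check (the probability matrix and labels below are made-up values, not from the tutorial):

    import numpy

    # made-up class probabilities for 3 examples over 4 classes
    p_y_given_x = numpy.array([[0.1, 0.7, 0.1, 0.1],
                               [0.3, 0.2, 0.4, 0.1],
                               [0.6, 0.2, 0.1, 0.1]])
    y = numpy.array([1, 0, 3])                      # made-up true labels

    predictions = p_y_given_x.argmax(axis=1)        # f(x): most probable class per row
    zero_one_loss = (predictions != y).sum()        # here: 2 misclassified examples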

     Negative Log-Likelihood Loss


     Since the zero-one loss is not differentiable, optimizing it directly in a large model is prohibitively expensive. Instead, we maximize the log-likelihood of the classifier (the likelihood measures how probable the observed labels are under the model):

    \mathcal{L}(\theta, \mathcal{D}) = \sum_{i=0}^{|\mathcal{D}|} \log P(Y = y^{(i)} \mid x^{(i)}, \theta)

    which is the same as minimizing the negative log-likelihood loss:

    \mathrm{NLL}(\theta, \mathcal{D}) = -\mathcal{L}(\theta, \mathcal{D})

     

     The negative log-likelihood (NLL) function:

    
    
    # NLL is a symbolic variable ; to get the actual value of NLL, this symbolic
    # expression has to be compiled into a Theano function (see the Theano
    # tutorial for more details)
    NLL = -T.sum(T.log(p_y_given_x)[T.arange(y.shape[0]), y])
    # note on syntax: T.arange(y.shape[0]) is a vector of integers [0,1,2,...,len(y)-1].
    # Indexing a matrix M by the two vectors [0,1,...,K], [a,b,...,k] returns the
    # elements M[0,a], M[1,b], ..., M[K,k] as a vector.  Here, we use this
    # syntax to retrieve the log-probability of the correct labels, y.
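
    The indexing trick described in the comments can be checked in plain NumPy; the small matrix and labels below are hypothetical:

    import numpy

    # made-up per-class probabilities for two examples
    log_p = numpy.log(numpy.array([[0.1, 0.7, 0.2],
                                   [0.5, 0.3, 0.2]]))
    y = numpy.array([1, 0])                         # correct labels

    # log_p[[0, 1], y] picks log_p[0, y[0]] and log_p[1, y[1]],
    # i.e. the log-probability of the correct label of each example
    picked = log_p[numpy.arange(y.shape[0]), y]
    NLL = -picked.sum()                             # == -(log 0.7 + log 0.5)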

     Stochastic Gradient Descent (SGD)


    # GRADIENT DESCENT
    
    while True:
        loss = f(params)
        d_loss_wrt_params = ... # compute gradient
        params -= learning_rate * d_loss_wrt_params
        if <stopping condition is met>:
            return params

    The above is ordinary (full-batch) gradient descent; the basic loop is: loss -> gradient -> parameter update (a toy numeric example follows below).
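
    As a toy illustration of that loop (not from the original tutorial), plain gradient descent on f(w) = (w - 3)^2 converges to the minimizer w = 3:

    # toy example: minimize f(w) = (w - 3)**2 with plain gradient descent
    learning_rate = 0.1
    w = 0.0
    for _ in range(100):
        d_loss_wrt_w = 2 * (w - 3)      # gradient of the loss at the current w
        w -= learning_rate * d_loss_wrt_w
    # w is now very close to 3, the minimizer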

    Stochastic gradient descent follows the same principle, but estimates the gradient from only a few examples at a time instead of the whole training set. In its purest form it uses a single example per update:

    # STOCHASTIC GRADIENT DESCENT
    for (x_i,y_i) in training_set:
                                # imagine an infinite generator
                                # that may repeat examples (if there is only a finite training set)
        loss = f(params, x_i, y_i)
        d_loss_wrt_params = ... # compute gradient
        params -= learning_rate * d_loss_wrt_params
        if <stopping condition is met>:
            return params
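
    The "infinite generator" mentioned in the comment could be built, for example, with itertools.cycle; the function name and arguments here are hypothetical:

    import itertools

    def infinite_examples(train_set_x, train_set_y):
        # cycle forever over a finite training set, yielding one (x_i, y_i) pair at a time
        return itertools.cycle(zip(train_set_x, train_set_y))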

     Minibatch SGD works exactly like SGD, except that each update uses more than one training example (a sketch of how such minibatches might be produced follows the pseudocode below):

    for (x_batch,y_batch) in train_batches:
                                # imagine an infinite generator
                                # that may repeat examples
        loss = f(params, x_batch, y_batch)
        d_loss_wrt_params = ... # compute gradient using theano
        params -= learning_rate * d_loss_wrt_params
        if <stopping condition is met>:
            return params
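
    As referenced above, one simple way to produce such minibatches from NumPy arrays is to slice the dataset into fixed-size chunks; this helper is a sketch, not code from the tutorial:

    def iterate_minibatches(train_set_x, train_set_y, batch_size):
        # yield successive (x_batch, y_batch) slices of a NumPy dataset
        n_examples = train_set_x.shape[0]
        for start in range(0, n_examples - batch_size + 1, batch_size):
            yield (train_set_x[start:start + batch_size],
                   train_set_y[start:start + batch_size])

    # e.g. train_batches = iterate_minibatches(train_set_x, train_set_y, batch_size=20)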

    The GD/SGD/minibatch loops above are pseudocode; a complete Theano version looks like this:

    # Minibatch Stochastic Gradient Descent
    
    # assume loss is a symbolic description of the loss function given
    # the symbolic variables params (shared variable), x_batch, y_batch;
    
    # compute gradient of loss with respect to params
    d_loss_wrt_params = T.grad(loss, params)
    
    # compile the MSGD step into a theano function
    updates = [(params, params - learning_rate * d_loss_wrt_params)]
    MSGD = theano.function([x_batch,y_batch], loss, updates=updates)
    
    for (x_batch, y_batch) in train_batches:
        # here x_batch and y_batch are elements of train_batches and
        # therefore numpy arrays; function MSGD also updates the params
        print('Current loss is ', MSGD(x_batch, y_batch))
        if stopping_condition_is_met:
            return params
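
    For completeness, here is one way the symbolic pieces assumed above (loss, params, x_batch, y_batch) might be constructed. This sketch uses a bias-free logistic regression so that params stays a single shared variable, as the update rule above expects; the layer sizes are hypothetical:

    import numpy
    import theano
    import theano.tensor as T

    n_in, n_out = 784, 10                       # hypothetical input/output sizes
    x_batch = T.matrix('x')                     # minibatch of input vectors
    y_batch = T.ivector('y')                    # corresponding integer labels

    # a single shared weight matrix plays the role of `params`
    params = theano.shared(numpy.zeros((n_in, n_out), dtype=theano.config.floatX),
                           name='W')

    p_y_given_x = T.nnet.softmax(T.dot(x_batch, params))
    loss = -T.mean(T.log(p_y_given_x)[T.arange(y_batch.shape[0]), y_batch])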

    Regularization


    We want the model to perform well on data it has not seen. To guard against overfitting (for example, parameters growing abnormally large), we regularize the model. This section covers L1/L2 regularization and early stopping.

    For our problem the regularized objective can be written as:

    E(\theta, \mathcal{D}) = \mathrm{NLL}(\theta, \mathcal{D}) + \lambda \|\theta\|_p^p

    where

    \|\theta\|_p = \Big( \sum_{j=0}^{|\theta|} |\theta_j|^p \Big)^{1/p}

    Note that when p = 1 this norm is the sum of absolute values, and when p = 2 it is the square root of the sum of squares (the Euclidean norm).

    # symbolic Theano variable that represents the L1 regularization term
    L1  = T.sum(abs(param))
    
    # symbolic Theano variable that represents the squared L2 term
    L2_sqr = T.sum(param ** 2)
    
    # the loss
    loss = NLL + lambda_1 * L1 + lambda_2 * L2_sqr
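
    A quick numeric sanity check of the two penalties (the parameter vector and lambda values below are made up):

    import numpy

    theta = numpy.array([0.5, -2.0, 1.5])           # hypothetical parameter vector

    L1 = numpy.sum(numpy.abs(theta))                # ||theta||_1 = 4.0
    L2 = numpy.sqrt(numpy.sum(theta ** 2))          # ||theta||_2 = sqrt(6.5)

    # the penalty added to the loss uses ||theta||_1 and ||theta||_2 ** 2
    lambda_1, lambda_2 = 0.001, 0.0001
    penalty = lambda_1 * L1 + lambda_2 * numpy.sum(theta ** 2)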

    Early Stopping


    # early-stopping parameters
    patience = 5000  # look as this many examples regardless
    patience_increase = 2     # wait this much longer when a new best is
                                  # found
    improvement_threshold = 0.995  # a relative improvement of this much is
                                   # considered significant
    validation_frequency = min(n_train_batches, patience/2)
                                  # go through this many
                                  # minibatches before checking the network
                                  # on the validation set; in this case we
                                  # check every epoch
    
    best_params = None
    best_validation_loss = numpy.inf
    test_score = 0.
    start_time = time.clock()
    
    done_looping = False
    epoch = 0
    while (epoch < n_epochs) and (not done_looping):
        # Report "1" for first epoch, "n_epochs" for last epoch
        epoch = epoch + 1
        for minibatch_index in xrange(n_train_batches):
    
            d_loss_wrt_params = ... # compute gradient
            params -= learning_rate * d_loss_wrt_params # gradient descent
    
            # iteration number. We want it to start at 0.
            iter = (epoch - 1) * n_train_batches + minibatch_index
            # note that if we do `iter % validation_frequency` it will be
            # true for iter = 0 which we do not want. We want it true for
            # iter = validation_frequency - 1.
            if (iter + 1) % validation_frequency == 0:
    
                this_validation_loss = ... # compute zero-one loss on validation set
    
                if this_validation_loss < best_validation_loss:
    
                    # improve patience if loss improvement is good enough
                    if this_validation_loss < best_validation_loss * improvement_threshold:
    
                        patience = max(patience, iter * patience_increase)
                    best_params = copy.deepcopy(params)
                    best_validation_loss = this_validation_loss
    
            if patience <= iter:
                done_looping = True
                break
    
    # POSTCONDITION:
    # best_params refers to the best out-of-sample parameters observed during the optimization
  • Original post: https://www.cnblogs.com/Iknowyou/p/3581437.html