  • jrae source code analysis (Part 2)

    This post looks in detail at the two classes introduced in the previous post, RAECost and SoftmaxCost.

    SoftmaxCost

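    As we already know, SoftmaxCost takes the features and labels as given (with the hyperparameters fixed). It measures the cost of a given set of weights (hidden×catSize) and returns the gradient of that cost with respect to the current weights. Here is the code.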

    @Override
    public double valueAt(double[] x)
    {
        // Skip recomputation if x has already been evaluated
        if( !requiresEvaluation(x) )
            return value;
        int numDataItems = Features.columns;

        // Indices 0..CatSize-2: the CatSize-1 classes that have their own weights
        int[] requiredRows = ArraysHelper.makeArray(0, CatSize-2);
        ClassifierTheta Theta = new ClassifierTheta(x,FeatureLength,CatSize);
        DoubleMatrix Prediction = getPredictions (Theta, Features);

        // Mean loss over the data items, plus an L2 penalty on W
        double MeanTerm = 1.0 / (double) numDataItems;
        double Cost = getLoss (Prediction, Labels).sum() * MeanTerm;
        double RegularisationTerm = 0.5 * Lambda * DoubleMatrixFunctions.SquaredNorm(Theta.W);

        // Output error (prediction - label), scaled by 1/numDataItems
        DoubleMatrix Diff = Prediction.sub(Labels).muli(MeanTerm);
        // Gradient w.r.t. the weights: Features * Diff^T
        DoubleMatrix Delta = Features.mmul(Diff.transpose());

        DoubleMatrix gradW = Delta.getColumns(requiredRows);
        DoubleMatrix gradb = ((Diff.rowSums()).getRows(requiredRows));

        //Regularizing. Bias does not have one.
        gradW = gradW.addi(Theta.W.mul(Lambda));

        Gradient = new ClassifierTheta(gradW,gradb);
        value = Cost + RegularisationTerm;
        gradient = Gradient.Theta;
        return value;
    }

    public DoubleMatrix getPredictions (ClassifierTheta Theta, DoubleMatrix Features)
    {
        int numDataItems = Features.columns;
        // Scores of the first CatSize-1 classes: W^T * Features + b
        DoubleMatrix Input = ((Theta.W.transpose()).mmul(Features)).addColumnVector(Theta.b);
        // Append a row of zeros: the last class's score is fixed at 0
        Input = DoubleMatrix.concatVertically(Input, DoubleMatrix.zeros(1,numDataItems));
        // Softmax-normalize each column into class probabilities
        return Activation.valueAt(Input);
    }

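    This is a typical two-layer neural network with no hidden layer. It first predicts the labels from the features and normalizes the predictions with softmax. It then backpropagates the error to compute the weight gradients.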


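    In this network, the label is a column vector with the target label set to 1 and all other entries set to 0. The activation function is softmax, and the output is the probability of each label.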
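    The prediction step in getPredictions can be sketched with plain arrays in place of jblas. In this sketch the first catSize−1 classes get the score W^T x + b. The last class's score is fixed at 0, mirroring the concatVertically(..., zeros) row. Softmax then turns the scores into probabilities. This works because softmax does not change when the same constant is added to every score, so one class's score can be pinned without losing expressiveness. The class and method names are hypothetical, not jrae's API.

```java
import java.util.Arrays;

// Hypothetical sketch of SoftmaxCost's prediction step (plain arrays, not jblas).
public class SoftmaxPredict {

    /** w: featureLength x (catSize-1), b: catSize-1 entries, x: one feature vector. */
    static double[] predict(double[][] w, double[] b, double[] x) {
        int k = b.length;                 // number of classes with their own weights
        double[] scores = new double[k + 1];
        for (int j = 0; j < k; j++) {
            double s = b[j];
            for (int i = 0; i < x.length; i++) {
                s += w[i][j] * x[i];      // (W^T x)_j
            }
            scores[j] = s;
        }
        scores[k] = 0.0;                  // last class: score fixed at 0
        return softmax(scores);
    }

    /** Numerically stable softmax: subtract the max score before exponentiating. */
    static double[] softmax(double[] scores) {
        double max = Arrays.stream(scores).max().orElse(0.0);
        double[] p = new double[scores.length];
        double sum = 0.0;
        for (int j = 0; j < scores.length; j++) {
            p[j] = Math.exp(scores[j] - max);
            sum += p[j];
        }
        for (int j = 0; j < p.length; j++) {
            p[j] /= sum;
        }
        return p;
    }

    public static void main(String[] args) {
        double[][] w = {{1.0, 0.0}, {0.0, 1.0}};  // 2 features, 3 classes
        double[] b = {0.0, 0.0};
        // With an all-zero input every score is 0, so each class gets probability 1/3
        System.out.println(Arrays.toString(predict(w, b, new double[]{0.0, 0.0})));
    }
}
```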

    The cost is computed by getLoss. If the predicted output for the target label is p, then the cost of each sample (its error function) is:

    $$\text{cost} = E(p) = -\log(p)$$
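    Putting the per-sample cost together with the averaging and regularization in valueAt gives the full objective. It is the mean of −log(p_target) over the samples plus 0.5·λ·‖W‖², with no penalty on the bias. The sketch below assumes getLoss yields −log p for the target label, as described above. The helper itself is hypothetical, not jrae's getLoss.

```java
// Hypothetical sketch of the SoftmaxCost objective: mean cross-entropy + 0.5 * lambda * ||W||^2.
public class SoftmaxLoss {

    /** probs[n][j]: predicted probability of class j for sample n; target[n]: true class index. */
    static double cost(double[][] probs, int[] target, double[][] w, double lambda) {
        double loss = 0.0;
        for (int n = 0; n < probs.length; n++) {
            loss += -Math.log(probs[n][target[n]]);  // cross-entropy with a one-hot label
        }
        loss /= probs.length;                         // MeanTerm = 1 / numDataItems
        double sq = 0.0;
        for (double[] row : w) {
            for (double v : row) {
                sq += v * v;                          // squared Frobenius norm of W
            }
        }
        return loss + 0.5 * lambda * sq;              // the bias is not regularized
    }

    public static void main(String[] args) {
        double[][] probs = {{0.5, 0.25, 0.25}, {0.1, 0.8, 0.1}};
        int[] target = {0, 1};
        double[][] w = {{1.0, 0.0}, {0.0, 2.0}};
        System.out.println(cost(probs, target, w, 0.0));  // (-log 0.5 - log 0.8) / 2
        System.out.println(cost(probs, target, w, 0.1));  // the above + 0.5 * 0.1 * 5
    }
}
```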

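    Applying the neural-network backpropagation algorithm described earlier, we get the following for j the target label (label_j is 1 for the target label and 0 otherwise):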

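    $$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial p_j}\,\frac{\partial p_j}{\partial net_j}\,x_i = -\frac{1}{p_j}\,p_j(1-p_j)\,x_i = -(1-p_j)\,x_i = -(label_j - p_j)\,feature_i$$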
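    The derivation above follows only the target output. The matrix code, however, applies the same formula to every class at once. The standard softmax derivative, added here for completeness rather than taken from the original post, shows why that is valid. Let t be the target class and k any class, with δ_tk = label_k for a one-hot label:

$$\frac{\partial p_t}{\partial net_k} = p_t\,(\delta_{tk} - p_k)$$

$$\frac{\partial E}{\partial net_k} = -\frac{1}{p_t}\,p_t\,(\delta_{tk} - p_k) = p_k - label_k$$

$$\frac{\partial E}{\partial w_{ik}} = (p_k - label_k)\,feature_i, \qquad \frac{\partial E}{\partial b_k} = p_k - label_k$$

    Averaging over the N samples and adding the regularization term gives the matrix form below. It is restricted to the first catSize−1 classes via getColumns/getRows(requiredRows):

$$\nabla_W = \frac{1}{N}\,\mathrm{Features}\,(\mathrm{Prediction} - \mathrm{Labels})^{\top} + \lambda W, \qquad \nabla_b = \frac{1}{N}\sum_{n}(\mathrm{Prediction} - \mathrm{Labels})_{:,n}$$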

    This explains what the following line of code does:

    DoubleMatrix Delta = Features.mmul(Diff.transpose());
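    One way to convince yourself that Features * Diff^T (plus λW) really is the gradient is a finite-difference check. The sketch below uses plain arrays, one column per sample as in Features, and fixes the last class's score at 0 as getPredictions does. It compares the analytic gradient with a central-difference estimate. All names and numbers are made up for illustration.

```java
// Hypothetical gradient check for the softmax cost (plain arrays instead of jblas).
public class SoftmaxGradCheck {

    /** Probabilities for sample n; x is featureLength x numData, w is featureLength x (catSize-1). */
    static double[] probs(double[][] w, double[] b, double[][] x, int n) {
        int k = b.length;
        double[] s = new double[k + 1];                 // s[k] stays 0: last class's fixed score
        for (int j = 0; j < k; j++) {
            s[j] = b[j];
            for (int i = 0; i < x.length; i++) s[j] += w[i][j] * x[i][n];
        }
        double max = 0.0;                               // s[k] = 0, so the max is at least 0
        for (double v : s) max = Math.max(max, v);
        double sum = 0.0;
        for (int j = 0; j <= k; j++) { s[j] = Math.exp(s[j] - max); sum += s[j]; }
        for (int j = 0; j <= k; j++) s[j] /= sum;
        return s;
    }

    /** Mean cross-entropy plus 0.5 * lambda * ||W||^2. */
    static double cost(double[][] w, double[] b, double[][] x, int[] labels, double lambda) {
        double c = 0.0;
        for (int n = 0; n < labels.length; n++) c -= Math.log(probs(w, b, x, n)[labels[n]]);
        double sq = 0.0;
        for (double[] row : w) for (double v : row) sq += v * v;
        return c / labels.length + 0.5 * lambda * sq;
    }

    /** Analytic gradient: (1/N) * sum_n x[i][n] * (p_j - label_j) + lambda * W[i][j]. */
    static double[][] gradW(double[][] w, double[] b, double[][] x, int[] labels, double lambda) {
        int numData = labels.length, k = b.length;
        double[][] g = new double[w.length][k];
        for (int n = 0; n < numData; n++) {
            double[] p = probs(w, b, x, n);
            for (int j = 0; j < k; j++) {              // only the classes that have weights
                double diff = (p[j] - (labels[n] == j ? 1.0 : 0.0)) / numData;
                for (int i = 0; i < w.length; i++) g[i][j] += x[i][n] * diff;
            }
        }
        for (int i = 0; i < w.length; i++)
            for (int j = 0; j < k; j++) g[i][j] += lambda * w[i][j];
        return g;
    }

    public static void main(String[] args) {
        double[][] w = {{0.2, -0.1}, {0.4, 0.3}};               // 2 features, 3 classes
        double[] b = {0.05, -0.2};
        double[][] x = {{1.0, -0.5, 0.3}, {0.7, 0.2, -1.0}};   // 3 samples (columns)
        int[] labels = {0, 2, 1};
        double lambda = 0.01, eps = 1e-6;
        double analytic = gradW(w, b, x, labels, lambda)[1][0];
        w[1][0] += eps;
        double plus = cost(w, b, x, labels, lambda);
        w[1][0] -= 2 * eps;
        double minus = cost(w, b, x, labels, lambda);
        w[1][0] += eps;                                         // restore the weight
        System.out.println("analytic " + analytic + ", numeric " + (plus - minus) / (2 * eps));
    }
}
```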

     

    RAECost

    First, the implementation:

    @Override
    public double valueAt(double[] x)
    {
        // Skip recomputation if x has already been evaluated
        if(!requiresEvaluation(x))
            return value;

        // Unpack x twice: Theta1 for the RAE (feature) cost, and
        // Theta2 (FineTunableTheta, which also depends on catSize) for the classification cost
        Theta Theta1 = new Theta(x,hiddenSize,visibleSize,dictionaryLength);
        FineTunableTheta Theta2 = new FineTunableTheta(x,hiddenSize,visibleSize,catSize,dictionaryLength);
        // Word embeddings used by Theta2: the We part of x plus WeOrig
        Theta2.setWe( Theta2.We.add(WeOrig) );

        final RAEClassificationCost classificationCost = new RAEClassificationCost(
                catSize, AlphaCat, Beta, dictionaryLength, hiddenSize, Lambda, f, Theta2);
        final RAEFeatureCost featureCost = new RAEFeatureCost(
                AlphaCat, Beta, dictionaryLength, hiddenSize, Lambda, f, WeOrig, Theta1);

        // For every labeled sample (in parallel): build its recursive tree while
        // accumulating the RAE cost/gradient, then accumulate the classification
        // cost/gradient computed on that tree
        Parallel.For(DataCell,
            new Parallel.Operation<LabeledDatum<Integer,Integer>>() {
                public void perform(int index, LabeledDatum<Integer,Integer> Data)
                {
                    try {
                        LabeledRAETree Tree = featureCost.Compute(Data);
                        classificationCost.Compute(Data, Tree);
                    } catch (Exception e) {
                        // Errors are only logged, not rethrown
                        System.err.println(e.getMessage());
                    }
                }
        });

        double costRAE = featureCost.getCost();
        double[] gradRAE = featureCost.getGradient().clone();

        double costSUP = classificationCost.getCost();
        gradient = classificationCost.getGradient();

        // Total cost and gradient = RAE part + supervised part
        value = costRAE + costSUP;
        for(int i=0; i<gradRAE.length; i++)
            gradient[i] += gradRAE[i];

        // Explicit GC requests (System.gc() is only a hint to the JVM)
        System.gc();    System.gc();
        System.gc();    System.gc();
        System.gc();    System.gc();
        System.gc();    System.gc();

        return value;
    }

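    The cost has two parts: featureCost and classificationCost. The program iterates over every sample (in parallel, via Parallel.For). For each sample, featureCost.Compute(Data) builds a recursive tree and at the same time accumulates cost and gradient. Then classificationCost.Compute(Data, Tree) computes cost and gradient from that tree and accumulates them. The key classes are therefore RAEFeatureCost and RAEClassificationCost.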
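    The accumulate-then-sum pattern of RAECost.valueAt can be sketched independently of the RAE details. Each sample adds to two thread-safe accumulators in a parallel loop. Afterwards the two costs are added and the two gradients are summed element by element. The two per-sample costs below are toy stand-ins chosen so the result is easy to check by hand; they are not what RAEFeatureCost or RAEClassificationCost compute. A parallel IntStream takes the place of jrae's Parallel.For.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical sketch of RAECost's two-part accumulation; the per-sample costs are toys.
public class TwoPartCost {

    /** Thread-safe accumulator for a scalar cost and a gradient vector. */
    static final class Accumulator {
        private double cost;
        private final double[] grad;

        Accumulator(int dim) { grad = new double[dim]; }

        // synchronized: worker threads add per-sample results concurrently
        synchronized void add(double c, double[] g) {
            cost += c;
            for (int i = 0; i < g.length; i++) grad[i] += g[i];
        }

        synchronized double getCost() { return cost; }

        synchronized double[] getGradient() { return grad.clone(); }
    }

    /** Returns the total cost and writes the summed gradient into gradOut. */
    static double valueAt(double[] theta, List<double[]> data, double[] gradOut) {
        Accumulator feature = new Accumulator(theta.length);
        Accumulator classification = new Accumulator(theta.length);

        IntStream.range(0, data.size()).parallel().forEach(n -> {
            double[] d = data.get(n);
            // Toy part 1: 0.5 * ||theta - d||^2, gradient (theta - d)
            double c1 = 0.0;
            double[] g1 = new double[theta.length];
            for (int i = 0; i < theta.length; i++) {
                double r = theta[i] - d[i];
                c1 += 0.5 * r * r;
                g1[i] = r;
            }
            feature.add(c1, g1);
            // Toy part 2: 0.5 * (theta . d)^2, gradient (theta . d) * d
            double dot = 0.0;
            for (int i = 0; i < theta.length; i++) dot += theta[i] * d[i];
            double[] g2 = new double[theta.length];
            for (int i = 0; i < theta.length; i++) g2[i] = dot * d[i];
            classification.add(0.5 * dot * dot, g2);
        });

        // Total = feature part + classification part, for both cost and gradient
        double[] gClass = classification.getGradient();
        double[] gFeature = feature.getGradient();
        for (int i = 0; i < gradOut.length; i++) gradOut[i] = gClass[i] + gFeature[i];
        return feature.getCost() + classification.getCost();
    }

    public static void main(String[] args) {
        double[] theta = {0.5, -1.0};
        List<double[]> data = List.of(new double[]{1.0, 0.0}, new double[]{0.0, 2.0});
        double[] grad = new double[theta.length];
        double value = valueAt(theta, data, grad);
        System.out.println(value + " " + Arrays.toString(grad));
    }
}
```

    Keeping the two parts in separate accumulators mirrors the original, which reads featureCost and classificationCost independently before adding them.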

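    In its Compute method, RAEFeatureCost calls RAEPropagation's ForwardPropagate to build a tree, then calls BackPropagate to compute and accumulate the gradient. The detailed algorithm is broken down in the next post.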

  • Original post: https://www.cnblogs.com/mfrbuaa/p/5344125.html