This post details the RAECost and SoftmaxCost classes introduced in the previous one.
SoftmaxCost
As we already know, given the features and labels (with the hyperparameters fixed), the SoftmaxCost class measures the cost of a given set of weights:
```java
@Override
public double valueAt(double[] x) {
    if (!requiresEvaluation(x))
        return value;                        // cached value for this x
    int numDataItems = Features.columns;     // one data item per column
    int[] requiredRows = ArraysHelper.makeArray(0, CatSize - 2);
    ClassifierTheta Theta = new ClassifierTheta(x, FeatureLength, CatSize);
    DoubleMatrix Prediction = getPredictions(Theta, Features);

    double MeanTerm = 1.0 / (double) numDataItems;
    double Cost = getLoss(Prediction, Labels).sum() * MeanTerm;
    double RegularisationTerm = 0.5 * Lambda * DoubleMatrixFunctions.SquaredNorm(Theta.W);

    // Back-propagation: Diff = (Prediction - Labels) / numDataItems
    DoubleMatrix Diff = Prediction.sub(Labels).muli(MeanTerm);
    DoubleMatrix Delta = Features.mmul(Diff.transpose());
    DoubleMatrix gradW = Delta.getColumns(requiredRows);
    DoubleMatrix gradb = ((Diff.rowSums()).getRows(requiredRows));

    // Regularizing. Bias does not have one.
    gradW = gradW.addi(Theta.W.mul(Lambda));
    Gradient = new ClassifierTheta(gradW, gradb);
    value = Cost + RegularisationTerm;
    gradient = Gradient.Theta;
    return value;
}

public DoubleMatrix getPredictions(ClassifierTheta Theta, DoubleMatrix Features) {
    int numDataItems = Features.columns;
    DoubleMatrix Input = ((Theta.W.transpose()).mmul(Features)).addColumnVector(Theta.b);
    // Append a fixed zero logit for the last category (see note below).
    Input = DoubleMatrix.concatVertically(Input, DoubleMatrix.zeros(1, numDataItems));
    return Activation.valueAt(Input);
}
```
This is a typical two-layer neural network with no hidden layer: it predicts the labels from the features, normalizes the predictions with softmax, and then back-propagates the error to compute the weight gradients. Note the zero row appended in getPredictions: with CatSize categories only CatSize-1 weight vectors are needed, since softmax is invariant to a constant shift and the last logit can be fixed at 0, which is why requiredRows spans only indices 0 through CatSize-2.
In this network, each label is a one-hot column vector: the target label's entry is 1 and all others are 0. The transfer function is softmax, so the output is a probability for each label.
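The softmax itself is hidden behind Activation.valueAt above. As a reference, here is a minimal column-wise softmax sketch in jblas (the matrix library the code appears to use); this is an assumption based on the description, not the library's verified implementation:

```java
import org.jblas.DoubleMatrix;
import org.jblas.MatrixFunctions;

public class SoftmaxSketch {
    // Column-wise softmax: each column of the input holds the logits
    // for one data item; each output column sums to 1.
    public static DoubleMatrix valueAt(DoubleMatrix input) {
        // Shift each column by its max for numerical stability;
        // softmax is invariant to a constant shift per column.
        DoubleMatrix shifted = input.subRowVector(input.columnMaxs());
        DoubleMatrix exp = MatrixFunctions.exp(shifted);
        return exp.divRowVector(exp.columnSums());
    }
}
```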
The cost is computed by getLoss: if the predicted probability of the target label is \(p\), the loss for that sample is the cross-entropy \(-\log p\).
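getLoss is likewise not listed above; a plausible sketch, assuming it computes the element-wise cross-entropy \(-y \odot \log p\) against the one-hot labels:

```java
import org.jblas.DoubleMatrix;
import org.jblas.MatrixFunctions;

public class LossSketch {
    // Hypothetical sketch of getLoss: loss = -y .* log(p).
    // With one-hot labels, summing the result yields -log(p_target)
    // per sample, matching the description above.
    public static DoubleMatrix getLoss(DoubleMatrix Prediction, DoubleMatrix Labels) {
        return MatrixFunctions.log(Prediction).muli(Labels).negi();
    }
}
```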
Applying the back-propagation algorithm described earlier, we obtain \(\partial J/\partial W = \frac{1}{m}\,X\,(P - Y)^{\top}\) plus the regularization term \(\lambda W\), where \(X\) is the feature matrix, \(P\) the predictions, \(Y\) the one-hot labels, and \(m\) the number of data items.
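For reference, this is the standard softmax cross-entropy derivation, written in the notation just introduced. For one sample with one-hot label \(y\), logits \(z = W^{\top}x + b\), and prediction \(p = \mathrm{softmax}(z)\), the loss is \(J = -\sum_k y_k \log p_k\), and:

\[
\frac{\partial J}{\partial z} = p - y, \qquad
\frac{\partial J}{\partial W} = x\,(p - y)^{\top}, \qquad
\frac{\partial J}{\partial b} = p - y .
\]

Averaging over \(m\) samples gives exactly the Diff and Delta matrices computed in the code.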
This explains the meaning of the following line of code:
```java
DoubleMatrix Delta = Features.mmul(Diff.transpose());
```
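A quick way to convince oneself that this gradient is right is a finite-difference check. A minimal sketch, assuming a getGradient() accessor that returns the gradient stored by the last valueAt call (this accessor is an assumption, modeled on featureCost.getGradient() used later), and assuming valueAt recomputes when the contents of x change:

```java
// Finite-difference gradient check (sketch). Compares the analytic
// gradient against a central-difference approximation, one dimension
// at a time.
public static void checkGradient(SoftmaxCost cost, double[] x) {
    final double eps = 1e-6;
    cost.valueAt(x);
    double[] analytic = cost.getGradient().clone();
    for (int i = 0; i < x.length; i++) {
        double orig = x[i];
        x[i] = orig + eps;
        double plus = cost.valueAt(x);
        x[i] = orig - eps;
        double minus = cost.valueAt(x);
        x[i] = orig;
        double numeric = (plus - minus) / (2 * eps);  // central difference
        System.out.printf("dim %d: analytic=%.8f numeric=%.8f%n",
                          i, analytic[i], numeric);
    }
}
```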
RAECost
Let's start with the implementation:
```java
@Override
public double valueAt(double[] x) {
    if (!requiresEvaluation(x))
        return value;   // cached value for this x
    Theta Theta1 = new Theta(x, hiddenSize, visibleSize, dictionaryLength);
    FineTunableTheta Theta2 = new FineTunableTheta(x, hiddenSize, visibleSize,
                                                   catSize, dictionaryLength);
    Theta2.setWe(Theta2.We.add(WeOrig));

    final RAEClassificationCost classificationCost = new RAEClassificationCost(
            catSize, AlphaCat, Beta, dictionaryLength, hiddenSize, Lambda, f, Theta2);
    final RAEFeatureCost featureCost = new RAEFeatureCost(
            AlphaCat, Beta, dictionaryLength, hiddenSize, Lambda, f, WeOrig, Theta1);

    // Process every labeled datum in parallel: build the RAE tree,
    // then run the classification cost on that tree.
    Parallel.For(DataCell, new Parallel.Operation<LabeledDatum<Integer, Integer>>() {
        public void perform(int index, LabeledDatum<Integer, Integer> Data) {
            try {
                LabeledRAETree Tree = featureCost.Compute(Data);
                classificationCost.Compute(Data, Tree);
            } catch (Exception e) {
                System.err.println(e.getMessage());
            }
        }
    });

    double costRAE = featureCost.getCost();
    double[] gradRAE = featureCost.getGradient().clone();
    double costSUP = classificationCost.getCost();
    gradient = classificationCost.getGradient();

    // Total cost and gradient are the sums of the two parts.
    value = costRAE + costSUP;
    for (int i = 0; i < gradRAE.length; i++)
        gradient[i] += gradRAE[i];

    System.gc(); System.gc(); System.gc(); System.gc();
    System.gc(); System.gc(); System.gc(); System.gc();
    return value;
}
```
The cost consists of two parts, featureCost and classificationCost. The program iterates over every sample: featureCost.Compute(Data) builds a recursive tree while accumulating its cost and gradient, and classificationCost.Compute(Data, Tree) then computes and accumulates its own cost and gradient from the generated tree. The key classes are therefore RAEFeatureCost and RAEClassificationCost.
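To place valueAt in context: a cost object like this is meant to be driven by a numerical optimizer. A minimal sketch with plain gradient descent follows; the getGradient() accessor and the learning-rate choice are illustrative assumptions, and the actual training code may well use a more sophisticated minimizer:

```java
// Sketch: driving RAECost with plain gradient descent.
// getGradient() is assumed to expose the gradient field set by valueAt.
static double[] minimize(RAECost raeCost, double[] theta,
                         double learningRate, int maxIter) {
    for (int iter = 0; iter < maxIter; iter++) {
        double cost = raeCost.valueAt(theta);    // forward + backward pass
        double[] grad = raeCost.getGradient();
        for (int i = 0; i < theta.length; i++)
            theta[i] -= learningRate * grad[i];  // descent step
        System.out.println("iter " + iter + ": cost = " + cost);
    }
    return theta;
}
```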
In its Compute method, RAEFeatureCost calls RAEPropagation's ForwardPropagate to build a tree, then calls BackPropagate to compute and accumulate the gradients. The detailed algorithm is the subject of the next post.