  • jrae Source Code Walkthrough (Part 2)

    This post looks in detail at the two classes introduced in the previous post: RAECost and SoftmaxCost.

    SoftmaxCost

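    As we already know, given the features and labels (with the hyperparameters fixed), the SoftmaxCost class measures the error $cost$ of a given weight matrix ($hidden \times catSize$) and reports the gradient with respect to the current weights. Here is the code.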

    @Override
    	public double valueAt(double[] x) 
    	{
    		if( !requiresEvaluation(x) )
    			return value;
    		int numDataItems = Features.columns;
    		
    		int[] requiredRows = ArraysHelper.makeArray(0, CatSize-2);
    		ClassifierTheta Theta = new ClassifierTheta(x,FeatureLength,CatSize);
    		DoubleMatrix Prediction = getPredictions (Theta, Features);
    		
    		double MeanTerm = 1.0 / (double) numDataItems;
    		double Cost = getLoss (Prediction, Labels).sum() * MeanTerm; 
    		double RegularisationTerm = 0.5 * Lambda * DoubleMatrixFunctions.SquaredNorm(Theta.W);
    		
    		DoubleMatrix Diff = Prediction.sub(Labels).muli(MeanTerm);
    	    DoubleMatrix Delta = Features.mmul(Diff.transpose());
    	
    	    DoubleMatrix gradW = Delta.getColumns(requiredRows);
    	    DoubleMatrix gradb = ((Diff.rowSums()).getRows(requiredRows));
    	    
    	    //Regularizing. Bias does not have one.
    	    gradW = gradW.addi(Theta.W.mul(Lambda));
    	    
    	    Gradient = new ClassifierTheta(gradW,gradb);
    	    value = Cost + RegularisationTerm;
    	    gradient = Gradient.Theta;
    		return value; 
    	}

    public DoubleMatrix getPredictions (ClassifierTheta Theta, DoubleMatrix Features)
        {
            int numDataItems = Features.columns;
            DoubleMatrix Input = ((Theta.W.transpose()).mmul(Features)).addColumnVector(Theta.b);
            Input = DoubleMatrix.concatVertically(Input, DoubleMatrix.zeros(1,numDataItems));
            return Activation.valueAt(Input);
        }

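    This is a typical two-layer neural network with no hidden layer. It first predicts the labels from the features, normalizes the predictions with softmax, and then backpropagates the error to obtain the weight gradient.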
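    The forward pass in getPredictions can be sketched with plain arrays. This is a minimal illustration, not the jrae code: the class and method names are hypothetical, plain `double[]` arrays stand in for jblas `DoubleMatrix`, and the max-subtraction inside the softmax is an added numerical-stability step that the original `Activation` may or may not perform. Judging by the zero row appended in getPredictions, the sketch assumes W has one column per class except the last, whose net input is fixed at 0.

```java
// Hypothetical sketch: softmax forward pass for one sample.
// w is featureLength x (catSize - 1), b has catSize - 1 entries (assumed layout).
class SoftmaxForwardSketch {
    static double[] predict(double[][] w, double[] b, double[] x) {
        int catSize = b.length + 1;
        // The last entry is never written and stays 0, mirroring the appended zero row.
        double[] net = new double[catSize];
        for (int j = 0; j < catSize - 1; j++) {
            double s = b[j];
            for (int i = 0; i < x.length; i++) {
                s += w[i][j] * x[i];   // (W^T x)_j
            }
            net[j] = s;
        }
        // Softmax; subtracting the maximum does not change the result
        // but avoids overflow in Math.exp (an addition of this sketch).
        double max = net[0];
        for (double v : net) {
            max = Math.max(max, v);
        }
        double sum = 0.0;
        double[] p = new double[catSize];
        for (int j = 0; j < catSize; j++) {
            p[j] = Math.exp(net[j] - max);
            sum += p[j];
        }
        for (int j = 0; j < catSize; j++) {
            p[j] /= sum;
        }
        return p;
    }
}
```

    With all weights and biases at zero, every net input is 0, so the sketch returns the uniform distribution over the classes.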


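    In this network, each label is a column vector with the target class set to 1 and all other entries set to 0. The transfer function is softmax, so the output is the probability of each label.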

    The cost is computed by getLoss. Let $p^*$ be the predicted output for the target label; the per-sample cost, i.e. the error function, is then:

    $$cost=E(p^*)=-\log(p^*)$$
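    As a worked illustration of this formula together with the mean and regularization terms from valueAt, here is a small sketch. The class and method names are hypothetical, each row of `probs` holds one sample (the jrae matrices store samples as columns), and `SquaredNorm` is assumed to be the sum of squared entries of W.

```java
// Hypothetical sketch: per-sample loss and the regularized mean cost.
class SoftmaxLossSketch {
    // E(p*) = -log(p*), where p* is the predicted probability of the target class.
    static double sampleLoss(double[] p, int target) {
        return -Math.log(p[target]);
    }

    // Mean of the per-sample losses plus 0.5 * lambda * ||W||^2,
    // following Cost + RegularisationTerm in valueAt.
    static double totalCost(double[][] probs, int[] targets, double[][] w, double lambda) {
        double sum = 0.0;
        for (int n = 0; n < targets.length; n++) {
            sum += sampleLoss(probs[n], targets[n]);
        }
        double meanLoss = sum / targets.length;
        double squaredNorm = 0.0;   // assumed meaning of SquaredNorm
        for (double[] row : w) {
            for (double v : row) {
                squaredNorm += v * v;
            }
        }
        return meanLoss + 0.5 * lambda * squaredNorm;
    }
}
```

    For example, a sample whose target class is predicted with probability 0.5 contributes $-\log(0.5)=\log 2$ to the sum.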

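    Applying the backpropagation algorithm described earlier, we obtain the following (the intermediate steps take $j$ to be the target label, where $label_j=1$; for the other classes $label_j=0$ and the final form still holds):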

    $$\frac{\partial E}{\partial w_{ij}}=\frac{\partial E}{\partial p_j}\frac{\partial h_j}{\partial net_j}x_i=-\frac{1}{p_j}p_j(1-p_j)x_i=-(1-p_j)x_i=-(label_j-p_j)feature_i$$

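    This explains the following line of code, where Diff holds $p_j-label_j$ scaled by the mean term and is multiplied by the features: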

    DoubleMatrix Delta = Features.mmul(Diff.transpose());
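    The matrix product behind Delta can be spelled out with explicit loops. This is a minimal sketch under assumed layouts, not the jrae implementation: the names are hypothetical, `features` is featureLength x n, and `p` and `y` are catSize x n, so each column is one sample.

```java
// Hypothetical sketch: Diff = (Prediction - Labels) / n and Delta = Features * Diff^T.
class SoftmaxGradientSketch {
    // Element-wise (p - y) scaled by 1/n, where n is the number of samples (columns).
    static double[][] diff(double[][] p, double[][] y) {
        int k = p.length, n = p[0].length;
        double[][] d = new double[k][n];
        for (int j = 0; j < k; j++) {
            for (int t = 0; t < n; t++) {
                d[j][t] = (p[j][t] - y[j][t]) / n;
            }
        }
        return d;
    }

    // delta[i][j] = sum over samples t of features[i][t] * diff[j][t],
    // i.e. the averaged (p_j - label_j) * feature_i from the derivation.
    static double[][] delta(double[][] features, double[][] diff) {
        int d = features.length, n = features[0].length, k = diff.length;
        double[][] out = new double[d][k];
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < k; j++) {
                double s = 0.0;
                for (int t = 0; t < n; t++) {
                    s += features[i][t] * diff[j][t];
                }
                out[i][j] = s;
            }
        }
        return out;
    }
}
```

    In valueAt, only the first CatSize-1 columns of this product are kept as gradW (via requiredRows), and the regularization term Lambda * W is added afterwards.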
    

    RAECost

    First, the implementation:

    @Override
    	public double valueAt(double[] x)
    	{
    		if(!requiresEvaluation(x))
    			return value;
    		
    		Theta Theta1 = new Theta(x,hiddenSize,visibleSize,dictionaryLength);
    		FineTunableTheta Theta2 = new FineTunableTheta(x,hiddenSize,visibleSize,catSize,dictionaryLength);
    		Theta2.setWe( Theta2.We.add(WeOrig) );
    		
    		final RAEClassificationCost classificationCost = new RAEClassificationCost(
    				catSize, AlphaCat, Beta, dictionaryLength, hiddenSize, Lambda, f, Theta2);
    		final RAEFeatureCost featureCost = new RAEFeatureCost(
    				AlphaCat, Beta, dictionaryLength, hiddenSize, Lambda, f, WeOrig, Theta1);
    	
    		Parallel.For(DataCell, 
    			new Parallel.Operation<LabeledDatum<Integer,Integer>>() {
    				public void perform(int index, LabeledDatum<Integer,Integer> Data)
    				{
    					try {
    						LabeledRAETree Tree = featureCost.Compute(Data);
    						classificationCost.Compute(Data, Tree);					
    					} catch (Exception e) {
    						System.err.println(e.getMessage());
    					}
    				}
    		});
    		
    		double costRAE = featureCost.getCost();
    		double[] gradRAE = featureCost.getGradient().clone();
    			
    		double costSUP = classificationCost.getCost();
    		gradient = classificationCost.getGradient();
    			
    		value = costRAE + costSUP;
    		for(int i=0; i<gradRAE.length; i++)
    			gradient[i] += gradRAE[i];
    		
    		System.gc();	System.gc();
    		System.gc();	System.gc();
    		System.gc();	System.gc();
    		System.gc();	System.gc();
    		
    		return value;
    	}
    

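    The cost has two parts: featureCost and classificationCost. The program iterates over every sample. For each one, featureCost.Compute(Data) builds a recursive tree while accumulating its cost and gradient, and classificationCost.Compute(Data, Tree) then computes and accumulates a cost and gradient from that tree. The key classes are therefore RAEFeatureCost and RAEClassificationCost.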
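    The accumulate-then-combine pattern described above can be sketched as follows. This is an illustration only: `CostAccumulatorSketch` is a hypothetical class, and the `synchronized` methods are an assumption about how per-sample results could be added safely from the parallel loop. How RAEFeatureCost and RAEClassificationCost actually guard their accumulators is not shown in this post.

```java
// Hypothetical sketch: accumulate per-sample cost and gradient, then combine two parts.
class CostAccumulatorSketch {
    private double cost;
    private final double[] gradient;

    CostAccumulatorSketch(int size) {
        gradient = new double[size];
    }

    // Add one sample's contribution; synchronized so parallel workers do not interleave.
    synchronized void add(double sampleCost, double[] sampleGradient) {
        cost += sampleCost;
        for (int i = 0; i < gradient.length; i++) {
            gradient[i] += sampleGradient[i];
        }
    }

    synchronized double getCost() {
        return cost;
    }

    // Return a copy so callers cannot modify the accumulator's state.
    synchronized double[] getGradient() {
        return gradient.clone();
    }

    // Element-wise sum of two gradient vectors, as in the loop at the end of valueAt.
    static double[] sum(double[] a, double[] b) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }
}
```

    With one accumulator per cost part, the final value is the sum of the two costs and the final gradient is the element-wise sum of the two gradient vectors, which is what valueAt does with costRAE, costSUP, and gradRAE.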

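    In its Compute function, the RAEFeatureCost class calls RAEPropagation's ForwardPropagate function to build a tree, then calls BackPropagate to compute the gradient and accumulate it. The details of the algorithm are covered in the next post.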

  • Original article: https://www.cnblogs.com/wuseguang/p/4110351.html