  • (6) 6.10 Neurons Networks: implementation of softmax regression

    Softmax regression can be viewed as a neural network with only an input layer and an output layer.

    With an intercept term the model would have k*(n+1) parameters, but this implementation omits the intercept, so the parameters form a k*n matrix.

    The cost function J(θ) has the form below, where 1{·} is the indicator function: 1 if the statement inside is true, 0 otherwise, so for each example only the log-probability of its correct class enters the sum:

    
\begin{align}
J(\theta) = - \frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} 1\left\{y^{(i)} = j\right\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^k e^{\theta_l^T x^{(i)}}} \right]
              + \frac{\lambda}{2} \sum_{i=1}^k \sum_{j=0}^n \theta_{ij}^2
\end{align}

    Algorithm steps:

    First, load the dataset {x^(1), x^(2), x^(3), ..., x^(m)}, stored as an n*m matrix, then initialize the parameter θ as a k*n matrix (ignoring the intercept term):

         
\theta = \begin{bmatrix}
\mbox{---} \theta_1^T \mbox{---} \\
\mbox{---} \theta_2^T \mbox{---} \\
\vdots \\
\mbox{---} \theta_k^T \mbox{---} \\
\end{bmatrix}

    First compute the score matrix θ·x, whose (j, i) entry is θ_j^T x^(i); this matrix is k*m.

    Then compute the hypothesis:

    
\begin{align}
h(x^{(i)}) =
\frac{1}{ \sum_{j=1}^{k} e^{\theta_j^T x^{(i)}} }
\begin{bmatrix}
e^{\theta_1^T x^{(i)}} \\
e^{\theta_2^T x^{(i)}} \\
\vdots \\
e^{\theta_k^T x^{(i)}} \\
\end{bmatrix}
\end{align}

    This function is unchanged if an arbitrary constant is added to or subtracted from every score θ_j^T x^(i). To keep the exponentials from overflowing during computation, we therefore subtract a constant first:

    
\begin{align}
h(x^{(i)}) &=
\frac{1}{ \sum_{j=1}^{k} e^{\theta_j^T x^{(i)}} }
\begin{bmatrix}
e^{\theta_1^T x^{(i)}} \\
e^{\theta_2^T x^{(i)}} \\
\vdots \\
e^{\theta_k^T x^{(i)}} \\
\end{bmatrix} \\
&=
\frac{ e^{-\alpha} }{ e^{-\alpha} \sum_{j=1}^{k} e^{\theta_j^T x^{(i)}} }
\begin{bmatrix}
e^{\theta_1^T x^{(i)}} \\
e^{\theta_2^T x^{(i)}} \\
\vdots \\
e^{\theta_k^T x^{(i)}} \\
\end{bmatrix} \\
&=
\frac{ 1 }{ \sum_{j=1}^{k} e^{\theta_j^T x^{(i)} - \alpha} }
\begin{bmatrix}
e^{\theta_1^T x^{(i)} - \alpha} \\
e^{\theta_2^T x^{(i)} - \alpha} \\
\vdots \\
e^{\theta_k^T x^{(i)} - \alpha} \\
\end{bmatrix}
\end{align}

    Here α is chosen as the maximum score in each column: for the i-th column, α = max_j θ_j^T x^(i).

    Dividing each column of the matrix above by its column sum yields the normalized probabilities, denoted P (taking the log of P gives the log-probabilities used in the cost). Each column of P contains the probabilities that the corresponding training example belongs to each of the k classes, and every column sums to 1.
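
    In MATLAB this can be computed as follows (a minimal sketch; theta and data are illustrative names for the k*n parameter matrix and the n*m data matrix):

    % Minimal sketch of the stabilized probability matrix P (k*m).
    % Assumes theta is k*n and data is n*m; variable names are illustrative.
    M = theta * data;                         % score matrix, M(j,i) = theta_j' * x^(i)
    M = bsxfun(@minus, M, max(M, [], 1));     % subtract each column's max (the alpha above)
    expM = exp(M);                            % safe to exponentiate now
    P = bsxfun(@rdivide, expM, sum(expM, 1)); % each column sums to 1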

    Next, compute the Ground Truth matrix, which encodes the indicator term 1{y^(i) = j} in the cost function. The Ground Truth matrix is k*m; each column corresponds to one label and contains a single 1, in the row k given by that column's label value, with all other entries 0. For K = 4 classes, four samples would give a 4*4 matrix of this form.

    Denote this matrix by G. Extending the indicator values to the full k*m matrix in this way, the cost function can be written compactly in terms of G and log P; this is exactly the vectorized form computed in the code below.
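
    For instance (a toy sketch; the labels below are hypothetical, since the example figure from the original post is not reproduced here):

    % Toy sketch: Ground Truth matrix for K = 4 classes and 4 samples.
    % The labels are illustrative, not taken from the original example.
    labels = [1; 3; 4; 2];
    G = full(sparse(labels, 1:4, 1, 4, 4));
    % G = [ 1 0 0 0
    %       0 0 0 1
    %       0 1 0 0
    %       0 0 1 0 ]
    % With G and the probability matrix P, the vectorized cost is
    % cost = -1/m * (G(:)' * log(P(:))) + lambda/2 * sum(theta(:).^2);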

    Next, differentiate the cost function to obtain the gradient:

    
\begin{align}
\nabla_{\theta_j} J(\theta) = - \frac{1}{m} \sum_{i=1}^{m}{ \left[ x^{(i)} ( 1\{ y^{(i)} = j\} - p(y^{(i)} = j | x^{(i)}; \theta) ) \right] } + \lambda \theta_j
\end{align}

    Stacking these vectors for j = 1, ..., k yields a k*n gradient matrix whose j-th row is the derivative of J with respect to θ_j.
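
    In vectorized form this is a single line of MATLAB (a sketch using the same illustrative names as above, with G the Ground Truth matrix and P the probability matrix):

    % Vectorized gradient: (G - P) is k*m and data' is m*n, so thetagrad is k*n.
    thetagrad = -1/m * (G - P) * data' + lambda * theta;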

    Next comes gradient checking to verify that this gradient is correct; once it checks out, L-BFGS is used to find the optimal parameters, which are then used directly for prediction. The MATLAB code follows:

    %% STEP 0: Initialise constants and parameters
    %
    %  Here we define and initialise some constants which allow your code
    %  to be used more generally on any arbitrary input. 
    %  We also initialise some parameters used for tuning the model.
    
    inputSize = 28 * 28; % Size of input vector (MNIST images are 28x28)
    numClasses = 10;     % Number of classes (MNIST images fall into 10 classes)
    
    lambda = 1e-4; % Weight decay parameter
    %%======================================================================
    %% STEP 1: Load data
    %
    %  In this section, we load the input and output data.
    %  For softmax regression on MNIST pixels, 
    %  the input data is the images, and 
    %  the output data is the labels.
    %
    
    % Change the filenames if you've saved the files under different names
    % On some platforms, the files might be saved as 
    % train-images.idx3-ubyte / train-labels.idx1-ubyte
    
    images = loadMNISTImages('mnist/train-images-idx3-ubyte');
    labels = loadMNISTLabels('mnist/train-labels-idx1-ubyte');
    labels(labels==0) = 10; % the class labels here run from 1 to 10, so remap 0 to 10
    
    inputData = images;
    
    % For debugging purposes, you may wish to reduce the size of the input data
    % in order to speed up gradient checking. 
    % Here, we create a synthetic dataset of random data for testing.
    
    DEBUG = false; % Set DEBUG to true while gradient checking; false to train on MNIST.
    if DEBUG
        inputSize = 8;
        inputData = randn(8, 100); % randn gives an 8*100 matrix of standard-normal entries
        labels = randi(10, 100, 1); % 100 random integer labels, each in 1-10
    end
    
    % Randomly initialise theta
    theta = 0.005 * randn(numClasses * inputSize, 1);
    
    %%======================================================================
    %% STEP 2: Implement softmaxCost
    %
    %  Implement softmaxCost in softmaxCost.m. 
    
    [cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, inputData, labels);
                                         
    %%======================================================================
    %% STEP 3: Gradient checking
    %
    %  As with any learning algorithm, you should always check that your
    %  gradients are correct before learning the parameters.
    % 
    
    % Anonymous-function syntax: h = @(x) scale * kernel(scale * x)
    % builds a function h whose argument is x; the other names (scale, kernel)
    % are captured from the enclosing workspace at definition time.
    if DEBUG
        numGrad = computeNumericalGradient( @(x) softmaxCost(x, numClasses, ...
                                        inputSize, lambda, inputData, labels), theta);
    
        % Use this to visually compare the gradients side by side
        disp([numGrad grad]); 
    
        % Compare numerically computed gradients with those computed analytically
        diff = norm(numGrad-grad)/norm(numGrad+grad);
        disp(diff); 
        % The difference should be small. 
        % In our implementation, these values are usually less than 1e-7.
    
        % When your gradients are correct, congratulations!
    end
    
    %%======================================================================
    %% STEP 4: Learning parameters
    %
    %  Once you have verified that your gradients are correct, 
    %  you can start training your softmax regression code using softmaxTrain
    %  (which uses minFunc).
    
    options.maxIter = 100;
    softmaxModel = softmaxTrain(inputSize, numClasses, lambda, ...
                                inputData, labels, options);
                              
    % Although we only use 100 iterations here to train a classifier for the 
    % MNIST data set, in practice, training for more iterations is usually
    % beneficial.
    
    %%======================================================================
    %% STEP 5: Testing
    %
    %  You should now test your model against the test images.
    %  To do this, you will first need to write softmaxPredict
    %  (in softmaxPredict.m), which should return predictions
    %  given a softmax model and the input data.
    
    images = loadMNISTImages('mnist/t10k-images-idx3-ubyte');
    labels = loadMNISTLabels('mnist/t10k-labels-idx1-ubyte');
    labels(labels==0) = 10; % Remap 0 to 10
    
    inputData = images;
    
    % You will have to implement softmaxPredict in softmaxPredict.m
    [pred] = softmaxPredict(softmaxModel, inputData);
    
    acc = mean(labels(:) == pred(:));
    fprintf('Accuracy: %0.3f%%\n', acc * 100);
    
    % Accuracy is the proportion of correctly classified images
    % After 100 iterations, the results for our implementation were:
    %
    % Accuracy: 92.200%
    %
    % If your values are too low (accuracy less than 0.91), you should check 
    % your code for errors, and make sure you are training on the 
    % entire data set of 60000 28x28 training images 
    % (unless you modified the loading code, this should be the case)
    
                              
    %%%% Corresponds to STEP 2: Implement softmaxCost
    function [cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, data, labels)
    % numClasses - the number of classes 
    % inputSize - the size N of the input vector
    % lambda - weight decay parameter
    % data - the N x M input matrix, where each column data(:, i) corresponds to
    %        a single training example
    % labels - an M x 1 matrix containing the labels corresponding to the input data
    
    theta = reshape(theta, numClasses, inputSize); % reshape into the k*n parameter matrix
    
    numCases = size(data, 2); % number of columns of data, i.e. the number of examples
    % M = sparse(r, c, v) creates a sparse matrix such that M(r(i), c(i)) = v(i) for all i.
    % That is, the vectors r and c give the positions of the elements whose values we wish
    % to set, and v the corresponding values of the elements.
    % With labels = (1,3,4,10,...)^T and 1:numCases = (1,2,3,4,...,M),
    % sparse(labels, 1:numCases, 1) produces a sparse matrix with entries
    % (1,1)  1
    % (3,2)  1
    % (4,3)  1
    % (10,4) 1
    % When filled in, each column contains exactly one 1, located in the row
    % equal to that column's label k:
    % 1 0 0 ...
    % 0 0 0 ...
    % 0 1 0 ...
    % 0 0 1 ...
    % 0 0 0 ...
    % . . .
    % This 10*M matrix is the groundTruth matrix.
    groundTruth = full(sparse(labels, 1:numCases, 1));
    cost = 0;
    % matrix of partial derivatives, one entry per parameter
    thetagrad = zeros(numClasses, inputSize);
    
    % theta (k*n) times data (n*m) gives a k*m matrix whose (j,i) entry is theta_j^T x^(i).
    % max(A, [], 1) returns a row vector whose elements are the column maxima, i.e. the
    % largest of the k scores for each of the m examples.
    
    M = bsxfun(@minus, theta*data, max(theta*data, [], 1)); % subtract each column's max for numerical stability
    M = exp(M); % exponentiate
    p = bsxfun(@rdivide, M, sum(M)); % sum(M) sums each column; divide each column by its sum
    cost = -1/numCases * groundTruth(:)' * log(p(:)) + lambda/2 * sum(theta(:) .^ 2); % cost value
    % groundTruth is k*m and data' is m*n, so the product is k*n, matching theta:
    % n is the input dimension and k the number of classes, i.e. a neural network
    % with n inputs, k outputs and no hidden layer.
    thetagrad = -1/numCases * (groundTruth - p) * data' + lambda * theta; % gradient, a k*n matrix
    
    % ------------------------------------------------------------------
    % Unroll the gradient matrices into a vector for minFunc
    grad = [thetagrad(:)];
    end
    
    %%%% Corresponds to STEP 3: computeNumericalGradient
    % The actual argument is J = @(x) softmaxCost(x, numClasses, inputSize, lambda, inputData, labels),
    % i.e. J is a function of x alone; all other variables keep their current values.
    function numgrad = computeNumericalGradient(J, theta)
    % theta: the parameter vector (a vector or a scalar)
    % J: a real-valued function; calling y = J(theta) returns the value of J at theta
    
    % numgrad is initialized to zeros, with the same size as theta
    numgrad = zeros(size(theta));
    EPSILON = 1e-4;
    % theta is a column vector; size(theta,1) gives its length
    n = size(theta,1);
    % n*n identity matrix
    E = eye(n);
    for i = 1:n
    	% E(:,i) is the i-th column of the identity matrix, so delta is zero
    	% everywhere except for EPSILON in the i-th entry
        delta = E(:,i)*EPSILON;
    	% centered-difference approximation of the i-th partial derivative
        numgrad(i) = (J(theta+delta)-J(theta-delta))/(EPSILON*2.0);
    end
    end
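
    As a quick sanity check of computeNumericalGradient itself (a toy example, not part of the exercise script above), it can be compared against a function whose gradient is known in closed form:

    % Toy check: J(x) = x(1)^2 + 3*x(1)*x(2) has gradient
    % [2*x(1) + 3*x(2); 3*x(1)], which at x = [4; 10] equals [38; 12].
    J = @(x) x(1)^2 + 3*x(1)*x(2);
    numgrad = computeNumericalGradient(J, [4; 10]); % should be very close to [38; 12]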
    
    
    %%%% Corresponds to STEP 4: softmaxTrain
    function [softmaxModel] = softmaxTrain(inputSize, numClasses, lambda, inputData, labels, options)
    %softmaxTrain Train a softmax model with the given parameters on the given
    % data. Returns softmaxOptTheta, a vector containing the trained parameters
    % for the model.
    %
    % inputSize: the size of an input vector x^(i)
    % numClasses: the number of classes 
    % lambda: weight decay parameter
    % inputData: an N by M matrix containing the input data, such that
    %            inputData(:, i) is the ith input
    % labels: M by 1 matrix containing the class labels for the
    %            corresponding inputs. labels(c) is the class label for
    %            the cth input
    % options (optional): options
    %   options.maxIter: number of iterations to train for
    
    if ~exist('options', 'var')
        options = struct;
    end
    
    if ~isfield(options, 'maxIter')
        options.maxIter = 400;
    end
    
    % initialize parameters; randn(M,1) draws a length-M vector of standard-normal entries (mean 0, variance 1)
    theta = 0.005 * randn(numClasses * inputSize, 1);
    
    % Use minFunc to minimize the function
    addpath minFunc/
    options.Method = 'lbfgs'; % Here, we use L-BFGS to optimize our cost
                              % function. Generally, for minFunc to work, you
                              % need a function pointer with two outputs: the
                              % function value and the gradient. In our problem,
                              % softmaxCost.m satisfies this.
    minFuncOptions.display = 'on';
    
    [softmaxOptTheta, cost] = minFunc( @(p) softmaxCost(p, ...
                                       numClasses, inputSize, lambda, ...
                                       inputData, labels), ...                                   
                                  theta, options);
    
    % Fold softmaxOptTheta into a nicer format
    softmaxModel.optTheta = reshape(softmaxOptTheta, numClasses, inputSize);
    softmaxModel.inputSize = inputSize;
    softmaxModel.numClasses = numClasses;
                              
    end
    
    %%%% Corresponds to STEP 5: softmaxPredict
    function [pred] = softmaxPredict(softmaxModel, data)
    
    % softmaxModel - model trained using softmaxTrain
    % data - the N x M input matrix, where each column data(:, i) corresponds to
    %        a single test example
    %
    % Your code should produce the prediction matrix 
    % pred, where pred(i) is argmax_c P(y(c) | x(i)).
     
    % Unroll the parameters from theta
    theta = softmaxModel.optTheta;  % this provides a numClasses x inputSize matrix
    pred = zeros(1, size(data, 2));
    
    % C = max(A):
    % if A is a vector, max(A) returns its largest element;
    % if A is a matrix, max(A) treats each column as a vector and returns a row
    % vector containing the maximum of each column. The second output gives the
    % row index of each maximum, i.e. the predicted class for each example.
    [nop, pred] = max(theta * data);
    
    end
                 
    

      


  • Original post: https://www.cnblogs.com/ooon/p/5340759.html