  • UFLDL Tutorial (1): sparseae_exercise

    Below, the functions used in the sparseae_exercise assignment of the UFLDL tutorial are listed together with their annotations.

    First, the call structure of the functions:

    Main script: train.m

    (1) Calls sampleIMAGES to extract a set of image patches from the given images

    (2) Calls display_network to show a random selection of the extracted patches in a grid

    (3) Gradient checking, whose purpose is to verify that the cost function is implemented correctly; it can be carried out by the separate function checkSparseAutoencoderCost

           ① Use sparseAutoencoderCost to compute the network's cost and gradient

           ② Use computeNumericalGradient to compute a numerical estimate of the gradient (here, checkNumericalGradient should first be used to verify that the numerical-gradient routine itself is correct)

           ③ Compare the gradients from ① and ② to judge whether the hand-written sparseAutoencoderCost is correct

           Once sparseAutoencoderCost has been verified, checkSparseAutoencoderCost does not need to be run during actual training

    (4) Train the network with L-BFGS to obtain the optimal weights and bias terms

    (5) Visualize the training result

    Next, each function is given with its annotations.

    train.m

    %% CS294A/CS294W Programming Assignment Starter Code
    
    addpath ../common
    
    %%======================================================================
    %% STEP 0: Here we provide the relevant parameters values that will
    %  allow your sparse autoencoder to get good filters; you do not need to change the parameters below.
    visibleSize = 8*8;   % number of input units
    hiddenSize = 25;     % number of hidden units
    sparsityParam = 0.01;   % desired average activation of the hidden units.
    % (This was denoted by the Greek alphabet rho, which looks like a lower-case "p", in the lecture notes).
    lambda = 0.0001;     % weight decay parameter
    beta = 3;            % weight of sparsity penalty term
    
    
    %%======================================================================
    %% STEP 1: Implement sampleIMAGES 
    %  After implementing sampleIMAGES, the display_network command should display a random sample of 200 patches from the dataset
    % Extract image patches; each extracted patch is stored as one column of patches
    patches = sampleIMAGES; 
    % Randomly pick 200 columns of patches and display the corresponding patches
    IMG=patches(:,randi(size(patches,2),200,1)); 
    display_network(IMG,8);
    
    %%======================================================================
    %% STEP 2 and STEP 3:Implement sparseAutoencoderCost and Gradient Checking
    checkSparseAutoencoderCost()
    
    %%======================================================================
    %% STEP 4: After verifying that your implementation of sparseAutoencoderCost is correct, you can start training the sparse autoencoder with minFunc (L-BFGS).
    
    %  Randomly initialize the parameters
    theta = initializeParameters(hiddenSize, visibleSize);
    
    %  Use minFunc to minimize the function
    addpath minFunc/
    options.Method = 'lbfgs'; % Here, we use L-BFGS to optimize our cost function
    % Generally, for minFunc to work, you need a function pointer with two outputs: the function value and the gradient. 
    % In our problem, sparseAutoencoderCost.m satisfies this.
    options.maxIter = 400;      % Maximum number of iterations of L-BFGS to run
    options.display = 'on';
    
    % opttheta is the vector containing all weights and bias terms of the network
    [opttheta, cost] = minFunc( @(p) sparseAutoencoderCost(p, ...
        visibleSize, hiddenSize, ...
        lambda, sparsityParam, ...
        beta, patches), ...
        theta, options);
    
    %%======================================================================
    %% STEP 5: Visualization
    W1 = reshape(opttheta(1:hiddenSize*visibleSize), hiddenSize, visibleSize); % weight matrix of the first layer
    display_network(W1', 12);
    
    print -djpeg weights.jpg   % save the visualization to a file
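
    train.m also calls sampleIMAGES, display_network and initializeParameters, which are not listed in this post. For orientation, here is a minimal sketch of the patch sampler; it assumes the course file IMAGES.mat (containing a 512x512x10 array named IMAGES) is available, and the normalization constants follow the starter code's normalizeData. It is only a sketch, not the exact starter implementation.

    sampleIMAGES.m (sketch)

    function patches = sampleIMAGES()
    % Sample 10000 random 8x8 patches from the natural images shipped with
    % the exercise (assumed to be stored in IMAGES.mat).
    s = load('IMAGES.mat');            % loads the 512x512x10 array IMAGES
    IMAGES = s.IMAGES;
    patchsize  = 8;
    numpatches = 10000;
    patches = zeros(patchsize*patchsize, numpatches);
    for i = 1:numpatches
        img = randi(size(IMAGES,3));                   % pick an image at random
        r   = randi(size(IMAGES,1) - patchsize + 1);   % top-left row of the patch
        c   = randi(size(IMAGES,2) - patchsize + 1);   % top-left column of the patch
        patch = IMAGES(r:r+patchsize-1, c:c+patchsize-1, img);
        patches(:,i) = patch(:);
    end
    % Normalize: remove the per-patch mean, clip to +/-3 standard deviations,
    % then squash into [0.1, 0.9] so a sigmoid output layer can reproduce it.
    patches = bsxfun(@minus, patches, mean(patches));
    pstd = 3 * std(patches(:));
    patches = max(min(patches, pstd), -pstd) / pstd;
    patches = (patches + 1) * 0.4 + 0.1;
    end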

    checkSparseAutoencoderCost.m

    %% The purpose of this function is to check whether sparseAutoencoderCost is implemented correctly
    function checkSparseAutoencoderCost()
    
    %% Set up a sparse autoencoder (the settings may match the main script or be chosen afresh)
    visibleSize = 8*8;   % number of input units
    hiddenSize = 25;     % number of hidden units
    sparsityParam = 0.01;   % desired average activation of the hidden units.
    % (This was denoted by the Greek alphabet rho, which looks like a lower-case "p", in the lecture notes).
    lambda = 0.0001;     % weight decay parameter
    beta = 3;            % weight of sparsity penalty term
    
    patches = sampleIMAGES; 
    
    %  Obtain random parameters theta
    theta = initializeParameters(hiddenSize, visibleSize);
    
    %% Compute the cost and the analytic gradient
    [cost, grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, lambda, ...
            sparsityParam, beta, patches(:,1:10));
        
    %% Compute the gradient numerically (this calls the autoencoder cost routine)
        numgrad = computeNumericalGradient( @(x) sparseAutoencoderCost(x, visibleSize, ...
            hiddenSize, lambda, ...
            sparsityParam, beta, ...
            patches(:,1:10)), theta);
    
    %% Compare the gradient computed by the cost function with the numerically approximated gradient
    % Use this to visually compare the gradients side by side
    disp([numgrad grad]);
        
    % Compare numerically computed gradients with the ones obtained from backpropagation
    diff = norm(numgrad-grad)/norm(numgrad+grad);
    disp(diff); % Should be small. In our implementation, these values are usually less than 1e-9.
        
    end
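
    Both train.m and checkSparseAutoencoderCost call initializeParameters, which is also not listed in this post. A minimal sketch is given below; the uniform [-r, r] initialization is one common choice, and the unrolling order (W1, W2, b1, b2) matches what sparseAutoencoderCost expects.

    initializeParameters.m (sketch)

    function theta = initializeParameters(hiddenSize, visibleSize)
    % Randomly initialize the weights in [-r, r] and the biases to zero,
    % then unroll everything into a single parameter vector.
    r  = sqrt(6) / sqrt(hiddenSize + visibleSize + 1);
    W1 = rand(hiddenSize, visibleSize) * 2 * r - r;
    W2 = rand(visibleSize, hiddenSize) * 2 * r - r;
    b1 = zeros(hiddenSize, 1);
    b2 = zeros(visibleSize, 1);
    theta = [W1(:) ; W2(:) ; b1(:) ; b2(:)];
    end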

    sparseAutoencoderCost.m

    %% Compute the network's cost function and gradient
    function [cost,grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, ...
                                                                           lambda, sparsityParam, beta, data)
    
    % visibleSize: the number of input units (probably 64)
    % hiddenSize: the number of hidden units (probably 25)
    % lambda: weight decay parameter
    % sparsityParam: The desired average activation for the hidden units (denoted in the lecture
    %                           notes by the greek alphabet rho, which looks like a lower-case "p").
    % beta: weight of sparsity penalty term
    % data: Our 64x10000 matrix containing the training data.  So, data(:,i) is the i-th training example.
    
    % The input theta is a vector (because minFunc expects the parameters to be a vector).
    % We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this
    % follows the notation convention of the lecture notes.
    
    W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
    W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize);
    b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
    b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);
    
    % Cost and gradient variables (your code needs to compute these values).
    % Here, we initialize them to zeros.
    cost = 0;
    
    m=size(data,2);
    
    %% ---------- YOUR CODE HERE --------------------------------------
    %  Instructions: Compute the cost/optimization objective J_sparse(W,b) for the Sparse Autoencoder,
    %                and the corresponding gradients W1grad, W2grad, b1grad, b2grad.
    %
    % W1grad, W2grad, b1grad and b2grad should be computed using backpropagation.
    % Note that W1grad has the same dimensions as W1, b1grad has the same dimensions
    % as b1, etc.  Your code should set W1grad to be the partial derivative of J_sparse(W,b) with
    % respect to W1.  I.e., W1grad(i,j) should be the partial derivative of J_sparse(W,b)
    % with respect to the input parameter W1(i,j).  Thus, W1grad should be equal to the term
    % [(1/m) Delta W^{(1)} + lambda W^{(1)}] in the last block of pseudo-code in Section 2.2
    % of the lecture notes (and similarly for W2grad, b1grad, b2grad).
    %
    % Stated differently, if we were using batch gradient descent to optimize the parameters,
    % the gradient descent update to W1 would be W1 := W1 - alpha * W1grad, and similarly for W2, b1, b2.
    %
    
    
    %% Forward propagation
    a1=data;
    z2=bsxfun(@plus,W1*a1,b1);
    a2=sigmoid(z2);
    z3=bsxfun(@plus,W2*a2,b2);
    a3=sigmoid(z3);
    
    %% Compute the network cost
    % Reconstruction term J1 = mean of the per-sample costs
    y=data; % desired output of the network (the autoencoder reconstructs its input)
    Ei=sum((a3-y).^2)/2; % cost of each individual sample
    J1=sum(Ei)/m;
    % Weight-decay term J2 = sum of squares of all weights
    J2=sum(W1(:).^2)+sum(W2(:).^2);
    % Sparsity term J3 = sum of the KL divergences of all hidden units
    rho_hat=sum(a2,2)/m; 
    KL=sum(sparsityParam*log(sparsityParam./rho_hat)+...
          (1-sparsityParam)*log((1-sparsityParam)./(1-rho_hat)));
    J3=KL;
    % Total cost of the network
    cost=J1+lambda*J2/2+beta*J3;
    
    
    %% Backpropagation: compute the error term (delta) of each layer
    delta3=-(data-a3).*dsigmoid(z3);
    sparse_delta=beta*(-sparsityParam./rho_hat+(1-sparsityParam)./(1-rho_hat));
    delta2=bsxfun(@plus,W2'*delta3,sparse_delta).*dsigmoid(z2); % the sparsity term enters here
    
    %% Gradients of the cost with respect to the weights and bias terms of each layer
    W1grad=delta2*a1'/m+lambda*W1;
    W2grad=delta3*a2'/m+lambda*W2;
    b1grad=sum(delta2,2)/m;
    b2grad=sum(delta3,2)/m;
    
    %-------------------------------------------------------------------
    % After computing the cost and gradient, we will convert the gradients back
    % to a vector format (suitable for minFunc).  Specifically, we will unroll
    % your gradient matrices into a vector.
    
    grad = [W1grad(:) ; W2grad(:) ; b1grad(:) ; b2grad(:)];
    %
    end
    
    %-------------------------------------------------------------------
    % Here's an implementation of the sigmoid function, which you may find useful
    % in your computation of the costs and the gradients.  This inputs a (row or
    % column) vector (say (z1, z2, z3)) and returns (f(z1), f(z2), f(z3)).
    
    function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
    end
    
    %% Derivative of the sigmoid function (be careful with this formula; it is an easy place to make mistakes)
    function dsigm = dsigmoid(x)
    sigx = sigmoid(x);
    dsigm=sigx.*(1-sigx);
    end
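
    For reference, in the notation of the lecture notes the quantities computed above are (with ρ = sparsityParam and ρ̂_j = rho_hat(j) over m training samples):

    $$
    J_{\mathrm{sparse}}(W,b)=\underbrace{\frac{1}{m}\sum_{i=1}^{m}\tfrac{1}{2}\big\|a^{(3),i}-x^{(i)}\big\|^2}_{J_1}
    +\frac{\lambda}{2}\underbrace{\big(\|W^{(1)}\|_F^2+\|W^{(2)}\|_F^2\big)}_{J_2}
    +\beta\underbrace{\sum_{j}\mathrm{KL}\!\left(\rho\,\big\|\,\hat\rho_j\right)}_{J_3},
    $$

    $$
    \mathrm{KL}\!\left(\rho\,\big\|\,\hat\rho_j\right)=\rho\log\frac{\rho}{\hat\rho_j}+(1-\rho)\log\frac{1-\rho}{1-\hat\rho_j},
    \qquad
    \hat\rho_j=\frac{1}{m}\sum_{i=1}^{m}a^{(2),i}_j,
    $$

    and the returned gradients are

    $$
    \nabla_{W^{(l)}}J=\frac{1}{m}\,\delta^{(l+1)}\big(a^{(l)}\big)^{\!\top}+\lambda W^{(l)},
    \qquad
    \nabla_{b^{(l)}}J=\frac{1}{m}\sum_{i=1}^{m}\delta^{(l+1),i},\quad l=1,2.
    $$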

    The gradient-checking functions are covered in a separate post.
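
    For completeness, here is a minimal sketch of what computeNumericalGradient does (central differences with EPSILON = 1e-4, as described in the lecture notes); the actual implementation, and checkNumericalGradient, which validates it on a simple quadratic function, are discussed in that post.

    computeNumericalGradient.m (sketch)

    function numgrad = computeNumericalGradient(J, theta)
    % Approximate the gradient of J at theta with central differences:
    % numgrad(i) is close to (J(theta + eps*e_i) - J(theta - eps*e_i)) / (2*eps)
    EPSILON = 1e-4;
    numgrad = zeros(size(theta));
    for i = 1:numel(theta)
        e = zeros(size(theta));
        e(i) = EPSILON;
        numgrad(i) = (J(theta + e) - J(theta - e)) / (2 * EPSILON);
    end
    end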