  • Machine Learning (Coursera) [weeks 4-6]

    Week 4: Neural Networks I

    4.1 Definitions of neural networks

    4.1.1 Dimensions of neural networks (how to define the hypothesis of a neural network)

    If the input layer (layer 1) has 3 neurons and the hidden layer (layer 2) has 4 neurons, then the weight matrix Theta(1) has dimensions S2 x (S1 + 1): here layer 1 has S1 = 3 neurons and layer 2 has S2 = 4 neurons, so Theta(1) is 4 x 4.

    Redefine the z function and fold it into the hypothesis h, as in the sketch below.
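    A minimal Octave sketch, assuming a 3-unit input layer and a 4-unit hidden layer with placeholder weights, showing how the dimensions line up and how z is folded into the activation:

    % Hypothetical example: layer 1 has s1 = 3 units, layer 2 has s2 = 4 units,
    % so Theta1 is s2 x (s1 + 1) = 4 x 4 (the extra column multiplies the bias unit).
    x = [0.5; -1.2; 2.0];          % one training example, 3 features
    Theta1 = rand(4, 3 + 1);       % 4 x 4 weight matrix (random placeholder values)

    a1 = [1; x];                   % add the bias unit a0 = 1, giving a 4 x 1 vector
    z2 = Theta1 * a1;              % 4 x 1: one z value per hidden unit
    a2 = 1 ./ (1 + exp(-z2));      % sigmoid activation of the hidden layer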

    Truth tables in neural networks

    4.1.2 Applications of neural networks

    XNOR operator using a hidden layer: (x1 XNOR x2) = (x1 AND x2) OR ((NOT x1) AND (NOT x2)), where the hidden layer computes the AND and the NOT-AND-NOT terms and the output layer ORs them together.
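    A minimal Octave sketch of this network, using the AND, (NOT x1) AND (NOT x2), and OR weight values shown in the lectures (an illustration only, not graded exercise code):

    % XNOR via one hidden layer: a2(1) = x1 AND x2, a2(2) = (NOT x1) AND (NOT x2),
    % output = a2(1) OR a2(2).
    g = @(z) 1 ./ (1 + exp(-z));            % sigmoid

    Theta1 = [-30  20  20;                   % AND
               10 -20 -20];                  % (NOT x1) AND (NOT x2)
    Theta2 = [-10  20  20];                  % OR

    for x1 = 0:1
      for x2 = 0:1
        a1 = [1; x1; x2];                    % input with bias unit
        a2 = [1; g(Theta1 * a1)];            % hidden layer with bias unit
        h  = g(Theta2 * a2);                 % close to 1 exactly when x1 == x2
        printf("%d XNOR %d = %d\n", x1, x2, round(h));
      endfor
    endfor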

    Model inputs and outputs

    4.2 Homework (week 4)

    4.2.1 Loading the data set

    load('ex3data1.mat'); % training data stored in arrays X, y

    After loading the data set and running the script, the handwritten digit samples shown below are displayed.

    In the vectorized implementation we can use the identity aᵀb = bᵀa, so the per-example sums collapse into matrix products, as in the sketch below.
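    A hypothetical sketch (X, theta, and m are made-up placeholders) showing how a loop over inner products collapses into one matrix product:

    % Hypothetical small example: 5 training examples, 3 features plus an intercept column.
    X = [ones(5,1) rand(5, 3)];   % 5 x 4 design matrix
    theta = rand(4, 1);           % 4 x 1 parameter vector
    m = size(X, 1);

    % Loop version: h(i) = sum_j theta(j) * X(i,j)
    h_loop = zeros(m, 1);
    for i = 1:m
      h_loop(i) = X(i,:) * theta;   % inner product; equals theta' * X(i,:)' since a'b = b'a
    end

    % Vectorized version: all m inner products in one matrix-vector product
    h_vec = X * theta;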

    Week 5: Neural Networks II

    5.1 The difference between the cost functions of logistic regression and neural networks

    5.2 Optimizing the cost function

    The cost function for regularized logistic regression is:
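    In LaTeX, as defined in the course:

    $$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log\big(1-h_\theta(x^{(i)})\big)\Big] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$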

    For neural networks, it is:
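    In LaTeX, as defined in the course:

    $$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[y_k^{(i)}\log\big(h_\Theta(x^{(i)})\big)_k + (1-y_k^{(i)})\log\big(1-(h_\Theta(x^{(i)}))_k\big)\Big] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\big(\Theta_{j,i}^{(l)}\big)^2$$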

    K is the number of output units (classes).

    In the double sum, i indexes the training examples (1 to m) and k indexes the output units.

    In the triple sum, l indexes the layers and i, j index the units connected by each weight Θ^(l)_ji.

    L is the total number of layers in the network.

    s_l is the number of units (not counting the bias unit) in layer l.

    Note:

    • the double sum simply adds up the logistic regression costs calculated for each cell in the output layer
    • the triple sum simply adds up the squares of all the individual Θs in the entire network.
    • the i in the triple sum does not refer to training example i

    5.3 Backpropagation

    Definition: backpropagation is the algorithm used to compute the gradient of the cost function so that it can be minimized.

    Backpropagation algorithm

    To minimize the cost function, we proceed step by step as follows.

    Gradient computation

    Computing the error terms

    Formulas

    The derivative of the activation function at layer l

    The accumulator Δ and the resulting gradient D

    When j = 0, the term corresponds to the bias units (no regularization is applied).

    When j ≠ 0, the term is the gradient of the cost function with respect to the weights, including the regularization term.

    Further interpretation of backpropagation: it ties together the weights, the partial derivatives with respect to the outputs, the predicted probabilities, and so on.

    The error term of the output layer is the difference between the predicted probability and the label: δ(L) = a(L) - y.

    Equivalently, the error term is the partial derivative of the per-example cost with respect to z (the weighted input), where cost = y*log(h) + (1-y)*log(1-h). The equations are collected below.
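    Collecting the pieces above, the backpropagation equations for a sigmoid network, as presented in the course, are:

    $$\delta^{(L)} = a^{(L)} - y$$
    $$\delta^{(l)} = \big((\Theta^{(l)})^T \delta^{(l+1)}\big) \odot g'(z^{(l)}), \qquad g'(z^{(l)}) = a^{(l)} \odot (1 - a^{(l)})$$
    $$\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T$$
    $$D^{(l)}_{ij} = \tfrac{1}{m}\Delta^{(l)}_{ij} + \tfrac{\lambda}{m}\Theta^{(l)}_{ij} \ (j \neq 0), \qquad D^{(l)}_{ij} = \tfrac{1}{m}\Delta^{(l)}_{ij} \ (j = 0)$$

    where ⊙ denotes element-wise multiplication.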

    5.4 Backpropagation in practice: unrolling parameters

    In order to use an optimization function such as fminunc, we need to unroll the matrices Theta1-Theta3 into a single vector (thetaVec), and likewise unroll the gradient matrices D1-D3 into DVec.

    How to unroll the Theta matrices and the D matrices into vectors, and how to reshape thetaVec back into matrices.

    Using forward propagation and backpropagation we can compute J(theta) and the gradient; after unrolling, these are returned to the optimizer as the cost and gradientVec, as sketched below.
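    A sketch of the unrolling and reshaping, assuming the example layer sizes used in the lecture (Theta1 and Theta2 are 10x11, Theta3 is 1x11); the rand calls are placeholders:

    % Hypothetical layer sizes: s1 = 10, s2 = 10, s3 = 1.
    Theta1 = rand(10, 11);  D1 = rand(10, 11);
    Theta2 = rand(10, 11);  D2 = rand(10, 11);
    Theta3 = rand(1, 11);   D3 = rand(1, 11);

    % Unroll the weight and gradient matrices into single column vectors for fminunc.
    thetaVec = [Theta1(:); Theta2(:); Theta3(:)];
    DVec     = [D1(:); D2(:); D3(:)];

    % Inside the cost function, reshape thetaVec back into matrices.
    Theta1 = reshape(thetaVec(1:110),   10, 11);
    Theta2 = reshape(thetaVec(111:220), 10, 11);
    Theta3 = reshape(thetaVec(221:231), 1, 11);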

    5.5 Numerically approximating the derivative of the cost with respect to the weights (gradient checking)

    The numerical derivative of the cost with respect to each weight can be compared against the gradient obtained from the accumulators (DVec) to verify that backpropagation is correct.

    The approximation formulas for the scalar case (two-sided difference) and for an n-dimensional parameter vector are as follows.
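    In LaTeX, as given in the course:

    $$\frac{dJ(\theta)}{d\theta} \approx \frac{J(\theta+\epsilon)-J(\theta-\epsilon)}{2\epsilon}$$
    $$\frac{\partial J(\theta)}{\partial \theta_i} \approx \frac{J(\theta_1,\ldots,\theta_i+\epsilon,\ldots,\theta_n)-J(\theta_1,\ldots,\theta_i-\epsilon,\ldots,\theta_n)}{2\epsilon}$$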

    For n parameters, the Octave code is as follows:

    epsilon = 1e-4;
    for i = 1:n
      thetaPlus = theta;
      thetaPlus(i) += epsilon;        % perturb the i-th parameter upward
      thetaMinus = theta;
      thetaMinus(i) -= epsilon;       % perturb the i-th parameter downward
      gradApprox(i) = (J(thetaPlus) - J(thetaMinus))/(2*epsilon);  % two-sided difference
    end;

    After the loop, gradApprox holds the numerical gradient, which should be close to the gradient obtained from the accumulators (DVec).

    5.6 Summary of key points for neural networks

    5.6.1 Choosing a network architecture

    5.6.2 Training a neural network

    5.7 Homework

    5.7.1 Visualizing the data for the neural network

    5.7.2 Model representation

    Load the weight matrices Theta1 and Theta2 and use them to compute the activations a and the weighted inputs z.

    5.7.3 Feedforward and cost function

    zeros(size(Theta1)) creates a matrix of zeros with the same size as Theta1.

    ones(m,1) creates an m x 1 column vector of ones; for example, ones(8,1) is an 8 x 1 vector of ones.

    eye(n) creates an n x n identity matrix.

    %m = size(X, 1);
    %Part 1: compute the hypothesis
    a1 = [ones(m,1) X]; %add a column of ones to X; equivalently a1 = [ones(size(X,1),1) X]
    z2 = a1 * Theta1'; %5000x25
    a2 = sigmoid(z2);
    a2 = [ones(size(a2,1),1) a2];
    z3 = a2 * Theta2';
    a3 = sigmoid(z3);
    h = a3; %5000x10

    %y is m x 1 and needs to be converted to m x 10
    %build a num_labels x num_labels identity matrix, then index its rows by y
    y_mn = eye(num_labels);
    y = y_mn(y,:);

    %cost function without the regularization term
    %element-wise products of vectors use .*
    J = (-1/m) * sum(sum(y.*log(h)+(1-y).*log(1-h)));

    The computed result is:

    Feedforward Using Neural Network ...
    Cost at parameters (loaded from ex4weights): 0.287629
    (this value should be about 0.287629)

    5.7.4 Regularized cost function

    When to use the element-wise product " .* ": when the result should be a matrix, i.e. the operation acts on corresponding elements of the two operands.

    When to use the ordinary product " * ": when the result should be a single number (an inner product).

    After applying sum twice, the result is a single number.

    For the regularized cost function here, there are two approaches: the first computes the sums row by row without vectorizing over Theta; the second vectorizes over Theta.

    %Method 1: loop over rows, subtracting the bias-column term
    sum1 = 0;
    sum2 = 0;
    for i = 1:size(Theta1,1)
      sum1 += Theta1(i,:) * Theta1(i,:)' - Theta1(i,1)^2;
    endfor
    for j = 1:size(Theta2,1)
      sum2 += Theta2(j,:) * Theta2(j,:)' - Theta2(j,1)^2;
    endfor
    J += lambda/(2*m) * (sum1+sum2);
    
    %Method 2: fully vectorized, excluding the bias column
    ##regularization = lambda/(2*m) * (sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)));
    ##J +=  regularization;

    Backpropagation

    5.7.5 Sigmoid gradient

    First, the sigmoid gradient function needs to be implemented.

    g = sigmoid(z) .* (1 - sigmoid(z));

    Evaluating it at five points gives a vector of five sigmoid-gradient values:

    Sigmoid gradient evaluated at [-1 -0.5 0 0.5 1]:
    0.196612 0.235004 0.250000 0.235004 0.196612

    5.7.6 Random initialization

    Choose an initial epsilon value and substitute it into the equation to obtain the randomly initialized weight matrix:

    epsilon_init = 0.12;
    W = rand(L_out, 1 + L_in) * 2 * epsilon_init -  epsilon_init;
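    A usage sketch, assuming the snippet above is the body of the exercise's randInitializeWeights(L_in, L_out) helper and the ex4 layer sizes (400 inputs, 25 hidden units, 10 output labels):

    % Initialize both weight matrices and unroll them into the single
    % parameter vector expected by the optimizer.
    initial_Theta1 = randInitializeWeights(400, 25);   % 25 x 401
    initial_Theta2 = randInitializeWeights(25, 10);    % 10 x 26
    initial_nn_params = [initial_Theta1(:); initial_Theta2(:)];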

    5.7.7 Implementing backpropagation

    There are five steps in this part:

    1. Add the bias unit to a1 so that its size matches Theta1;

    2. compute delta3 for the output layer;

    3. compute delta2 for the hidden layer from delta3 (the output layer);

    4. keep only the delta components from index 2 onwards, skipping delta0 (the bias unit's error);

    5. obtain the unregularized gradient of the neural network cost function by dividing the accumulated values by m, the number of training examples.

    X = [ones(m,1) X]; %5000x401, add the bias column
    for t = 1:m
      a1 = X(t,:); %1x401
      a1 = a1'; %401x1
      z2 = Theta1 * a1;  %25x1
      a2 = sigmoid(z2); 
      a2 = [1 ; a2];  %26x1
      z3 = Theta2 * a2; %10x1
      a3 = sigmoid(z3); %10x1
      h = a3; %10x1
    
      %y(t) is a scalar label in 1..10; convert it to a 10x1 one-hot vector
      y_temp = zeros(num_labels,1); %10x1
      y_temp(y(t)) = 1; 
      J += (-1/m) * sum(sum(y_temp.*log(h)+(1-y_temp).*log(1-h)));
    
    %backpropagation
      delta3 = a3 - y_temp; %10x1
      delta2 = Theta2(:,2:end)' * delta3 .* sigmoidGradient(z2); %(25x10 * 10x1) .* 25x1 = 25x1
      Theta1_grad += delta2 * a1'; %25x401
      Theta2_grad += delta3 * a2'; %10x26
    endfor
    
    Theta1_grad = Theta1_grad / m;
    Theta2_grad = Theta2_grad / m;

    5.7.8 Gradient checking

    5.7.9 Regularized neural networks

    Theta1(:,1) = 0;   %zero out the bias columns so they are not regularized
    Theta2(:,1) = 0;
    Theta1_grad += lambda/m * Theta1;
    Theta2_grad += lambda/m * Theta2;

    Week 6: How to optimize a machine learning algorithm

    6.1 Evaluating a learning algorithm

    In order to evaluate the algorithm, we split the data into a training set (70%) and a test set (30%).

    On the test set, we measure the test error.

    For linear regression, it is:
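    In LaTeX, as defined in the course:

    $$J_{test}(\theta) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \big(h_\theta(x^{(i)}_{test}) - y^{(i)}_{test}\big)^2$$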

    For classification/multi-class classification, it is the misclassification error:
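    In LaTeX, as defined in the course:

    $$\text{Test error} = \frac{1}{m_{test}} \sum_{i=1}^{m_{test}} \mathrm{err}\big(h_\theta(x^{(i)}_{test}),\, y^{(i)}_{test}\big)$$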

    and err(h_theta(x), y) is defined by two cases:
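    In LaTeX, the two cases are:

    $$\mathrm{err}(h_\theta(x), y) = \begin{cases} 1 & \text{if } h_\theta(x) \ge 0.5 \text{ and } y = 0, \ \text{or } h_\theta(x) < 0.5 \text{ and } y = 1 \\ 0 & \text{otherwise} \end{cases}$$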

    6.2 Model selection and train/validation/test sets

    Cross-validation (CV)

    6.3 Distinguishing high bias from high variance

    If J_train is much lower than J_cv, the algorithm is likely suffering from high variance (overfitting); getting more training data, or making the model less complex (for example, fewer hidden units or a larger lambda), is likely to help. If instead J_train and J_cv are both high and close to each other, the algorithm is suffering from high bias (underfitting); in that case, getting more training data or making the model even less complex is unlikely to help much.

    6.4 Debugging a learning algorithm

    Reducing the algorithm's error

    Getting more training examples, trying smaller sets of features, using a neural network with fewer parameters, and increasing lambda all push the model toward being less complex (remedies for overfitting/high variance) and are computationally cheaper;

    adding features, adding polynomial features, using a larger neural network with more parameters, and decreasing lambda all push the model toward being more complex (remedies for underfitting/high bias) and are computationally more expensive.

    6.5 Homework

    6.5.1 Linear regression cost function

    regularization = lambda/(2*m) * sum(theta(2:end).^2);
    J = 1/(2*m) * sum((X*theta-y)'*(X*theta-y)) + regularization;

    With the initial theta, the cost is about 303.99, and the plotted data is shown below.

    6.5.2 Regularized gradient

    In this part, the regularized gradient should be implemented.

    The gradient is computed as:

    grad = (X'*(X*theta-y))/m;
    grad(2:end) = grad(2:end) + lambda*theta(2:end)/m;

    The result is: Gradient at theta = [1 ; 1]: [-15.303016; 598.250744]

    6.5.3 Fitting linear regression

    Visualize the fitted line; in the next section, a function will be used to generate learning curves.

    6.5.4 Bias-variance

    High bias causes underfitting and high variance causes overfitting.

    Use the code below to compute the training and cross-validation errors for training-set sizes from 1 to m and plot the learning curves:

    for i = 1:m
      theta = trainLinearReg(X(1:i,:),y(1:i),0);                         %train on the first i examples
      error_train(i) = linearRegCostFunction(X(1:i,:),y(1:i),theta,0);   %training error (lambda = 0)
      error_val(i) = linearRegCostFunction(Xval,yval,theta,0);           %error on the full CV set (lambda = 0)
    endfor

    6.5.5 Polynomial features

    Put the higher-order features into X_poly, so that we get an [m x p] matrix instead of [m x 1]. This allows us to feed more features into X rather than just a single value.

    The function maps the original training set X of size m x 1 into its higher powers. Specifically, when a training set X of size m x 1 is passed into the function, it should return an m x p matrix X_poly,

    where column 1 holds the original values of X, column 2 holds the values of X.^2, column 3 holds the values of X.^3, and so on.

    for i=1:p
      X_poly(:,i) = X.^i;
    endfor

    The difference between X(i).^2 and X.^2:

    X(i).^2 squares only the i-th element (row) of X;

    X.^2 squares every element of the vector element-wise, as in the example below.
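    A tiny Octave example of the difference:

    X = [1; 2; 3];
    X.^2      % [1; 4; 9]: every element squared
    X(2).^2   % 4: only the second element squared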

    6.5.6 Learning polynomial regression

    After the parameters are set, overfitting appears here: the training error is low but the cross-validation error is high, so the model is in a high-variance situation.

    Polynomial fit

    Polynomial learning curve

    6.5.7 Adjusting the regularization parameter

    When lambda is 1, the fit matches the data well,

    and the cross-validation error decreases noticeably.

    When lambda is 100, the curve underfits the data and the model is in a high-bias regime.

    6.5.8 Selecting lambda using a cross-validation set

    When calling trainLinearReg here, do not write theta = trainLinearReg(X(1:i,:),y(1:i),lambda);

    that would pass in only part of the data. Here the whole training set should be passed in; the partial slices X(1:i,:), y(1:i) belong to the learning-curve case, where processing is done on growing subsets.

    In other words, for the learning curves we train on an accumulating subset of the data, while the cross-validation error is always computed on the full validation set.

    Regularization is used to deal with overfitting; therefore, when computing the cost function for train_error/validation_error, we do not regularize again, in order to keep the comparison controlled.

    We want to see the effect of lambda on the curves; if the reported errors also included the regularization term, two factors would influence the curves, so regularization is turned off there by setting lambda = 0.

    for i = 1:length(lambda_vec)
      lambda = lambda_vec(i);
      theta = trainLinearReg(X,y,lambda);                     %train on the full set with this lambda
      error_train(i) = linearRegCostFunction(X,y,theta,0);    %errors reported without regularization
      error_val(i) = linearRegCostFunction(Xval,yval,theta,0);
    endfor

    The selected parameter is lambda = 3, which gives the minimum cross-validation error.

    6.6 Building a spam classifier

    6.6.1 Prioritizing what to work on

    The recommended approach to solving an ML problem:

    Start with a quick-and-dirty implementation to get an initial result;

    then optimization can be done in a better-informed direction (for example, guided by error analysis).

    6.6.2 Error metrics for skewed classes

    Precision and recall are used to evaluate and improve the algorithm even when the classes are extremely skewed.

    6.6.3 Trading off precision and recall

    Pick a single summary metric: the F1 score, F1 = 2PR / (P + R), rather than the simple average (P + R) / 2.

    The F score is used to judge how good the algorithm is; in this exercise the precision comes out slightly higher and the recall slightly lower.

    6.7 Using large data sets

    Designing a high-accuracy learning system

    Large-data rationale: there are three points to consider when optimizing.

    6.8 Exercise

    For computing the accuracy, precision, and recall of predicting class 1, there are exercises and formulas: accuracy = (TP + TN) / total, precision = TP / (TP + FP), recall = TP / (TP + FN).
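    A small Octave sketch of these formulas; pred and yval are hypothetical 0/1 vectors of predictions and ground-truth labels:

    pred = [1 1 0 1 0 0 1]';               % hypothetical predictions
    yval = [1 0 0 1 0 1 1]';               % hypothetical true labels

    tp = sum((pred == 1) & (yval == 1));   % true positives
    fp = sum((pred == 1) & (yval == 0));   % false positives
    fn = sum((pred == 0) & (yval == 1));   % false negatives

    accuracy  = mean(pred == yval);        % fraction of correct predictions
    precision = tp / (tp + fp);            % of the predicted 1s, how many are actually 1
    recall    = tp / (tp + fn);            % of the actual 1s, how many were found
    F1        = 2 * precision * recall / (precision + recall);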

  • Original article: https://www.cnblogs.com/yuyukun/p/12273482.html