  • Machine Learning - Andrew Ng: Study Notes (8)

    Neural Networks: Representation

    Non-linear hypotheses


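    1. Consider this supervised classification problem, for which we already have a training set.

    [Figure: Non-linear Hypotheses - example]

    To solve it with logistic regression, we first have to construct a hypothesis containing many non-linear terms, for example \(g(\Theta_0 + \Theta_1x_1 + \Theta_2x_2 + \Theta_3x_1x_2 + \Theta_4x_1^2x_2 + \Theta_5x_1^3x_2 + \Theta_6x_1x_2^2 + \ldots)\). With enough polynomial terms, we may obtain a decision boundary that separates the positive examples from the negative ones.

    [Figure: Non-linear Hypotheses - example2]


    2. Many complex machine learning problems, however, involve far more than two features. When the number of features \(n\) is large, the computation becomes too expensive, and it is also hard to decide which extra terms to add when building the classifier.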
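    To get a feel for how quickly the number of polynomial terms grows with the number of features, here is a minimal sketch. It only counts monomials of a fixed degree; the function name `num_monomials` and the sample values of n are chosen for illustration and do not come from the notes.

```python
from math import comb


def num_monomials(n: int, degree: int) -> int:
    """Count the distinct monomials of exactly `degree` in n variables.

    A monomial is a multiset of `degree` variables chosen from n,
    so the count is C(n + degree - 1, degree).
    """
    return comb(n + degree - 1, degree)


# Illustrative feature counts (assumed, not from the notes).
for n in (2, 100, 2500):
    print(n, num_monomials(n, 2), num_monomials(n, 3))
```

    With 100 features there are already 5050 quadratic terms, which is why hand-built polynomial features stop being practical for large n.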

    Neurons and the brain



    The "one learning algorithm" hypothesis



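    1. If the signal from the optic nerve is rerouted to the auditory cortex, the auditory cortex learns to "see".
    2. If the signal from the optic nerve is rerouted to the somatosensory cortex, the somatosensory cortex learns to "see".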


    Model representation I


    Neuron in the brain


    [Figure: Neuron in the brain - Neurons]


    [Figure: Neuron in the brain - Neurons2]


    Neuron model: Logistic unit



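    Usually only the nodes \(x_1, x_2, x_3\) are drawn. Sometimes an extra node \(x_0\) (the bias unit, or bias neuron) is added; because \(x_0 \equiv 1\), it can be left out when it adds nothing to the example.

    Here \(h_\Theta(x) = \frac{1}{1 + e^{-\Theta^Tx}}\), where \(x\) is the input vector and \(\Theta\) is the parameter (weight) vector: \(x = \left[ \begin{matrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{matrix} \right]\), \(\Theta = \left[ \begin{matrix} \Theta_0 \\ \Theta_1 \\ \Theta_2 \\ \Theta_3 \end{matrix} \right]\).

    [Figure: Logistic unit - Neuron model]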
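    The logistic unit can be sketched in a few lines of plain Python. This is only an illustration of the formula for the hypothesis: the weight and input values are hypothetical, and `logistic_unit` is a name introduced here.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_unit(theta, x):
    """Compute g(theta^T x); x[0] is the bias input and should equal 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)


theta = [-1.0, 2.0, 0.5, -0.5]  # hypothetical weights Theta_0..Theta_3
x = [1.0, 1.0, 2.0, 0.0]        # x_0 = 1 (bias), then x_1..x_3
print(logistic_unit(theta, x))  # g(2), since theta^T x = -1 + 2 + 1 + 0
```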

    Neural Network



    [Figure: Neural Network - Neuron model2]



    1. \(a_i^{(j)}\) = "activation" of unit \(i\) in layer \(j\), that is, the value that this particular neuron computes and outputs.
    2. \(\Theta^{(j)}\) = matrix of weights controlling the function mapping from layer \(j\) to layer \(j + 1\).


    \[a_1^{(2)} = g(\Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3) \\
    a_2^{(2)} = g(\Theta_{20}^{(1)}x_0 + \Theta_{21}^{(1)}x_1 + \Theta_{22}^{(1)}x_2 + \Theta_{23}^{(1)}x_3) \\
    a_3^{(2)} = g(\Theta_{30}^{(1)}x_0 + \Theta_{31}^{(1)}x_1 + \Theta_{32}^{(1)}x_2 + \Theta_{33}^{(1)}x_3) \\
    h_{\Theta}(x) = a_1^{(3)} = g(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{(2)} + \Theta_{12}^{(2)}a_2^{(2)} + \Theta_{13}^{(2)}a_3^{(2)}) \]

    If the network has \(s_j\) units in layer \(j\) and \(s_{j+1}\) units in layer \(j + 1\), then \(\Theta^{(j)}\) has dimension \(s_{j+1} \times (s_j + 1)\). The extra column accounts for the bias node \(x_0\). For example:

    \[X = \left[ \begin{matrix} x_1 \\ x_2 \\ x_3 \end{matrix} \right], a = \left[ \begin{matrix} a_1 \\ a_2 \\ a_3 \end{matrix} \right], \Theta = \left[ \begin{matrix} \Theta_{10} & \Theta_{11} & \Theta_{12} & \Theta_{13} \\ \Theta_{20} & \Theta_{21} & \Theta_{22} & \Theta_{23} \\ \Theta_{30} & \Theta_{31} & \Theta_{32} & \Theta_{33} \end{matrix} \right]. \]
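    The dimension rule can be checked mechanically. The sketch below derives the shape of every weight matrix from a list of layer sizes; `theta_shapes` is a helper name introduced here, and the layer sizes are only an example.

```python
def theta_shapes(layer_sizes):
    """Return the (rows, cols) shape of each weight matrix.

    layer_sizes lists the number of units per layer, bias units excluded.
    The matrix between two layers has s_next rows and s_cur + 1 columns,
    the extra column being the weight on the bias unit.
    """
    return [(s_next, s_cur + 1)
            for s_cur, s_next in zip(layer_sizes, layer_sizes[1:])]


# Three inputs, three hidden units, one output unit.
print(theta_shapes([3, 3, 1]))  # [(3, 4), (1, 4)]
```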

    Model representation II

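    1. How to carry out the computation efficiently, using a vectorized implementation.
    2. Why this representation of a neural network is a good one, and how it helps us learn complex non-linear hypotheses.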

    Forward propagation: Vectorized implementation


    \[z_1^{(2)} = \Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3 \\
    z_2^{(2)} = \Theta_{20}^{(1)}x_0 + \Theta_{21}^{(1)}x_1 + \Theta_{22}^{(1)}x_2 + \Theta_{23}^{(1)}x_3 \\
    z_3^{(2)} = \Theta_{30}^{(1)}x_0 + \Theta_{31}^{(1)}x_1 + \Theta_{32}^{(1)}x_2 + \Theta_{33}^{(1)}x_3 \]


    Let \(x = \left[ \begin{matrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{matrix} \right]\) and \(z^{(2)} = \left[ \begin{matrix} z_1^{(2)} \\ z_2^{(2)} \\ z_3^{(2)} \end{matrix} \right]\). Then \(z^{(2)} = \Theta^{(1)}x\) and \(a^{(2)} = g(z^{(2)})\). After adding the bias unit \(a_0^{(2)} = 1\),

    \(z^{(3)} = \Theta^{(2)}a^{(2)}\) and \(h_{\Theta}(x) = a^{(3)} = g(z^{(3)})\).

    This process of computing \(h_{\Theta}(x)\) from the input \(x\) is called forward propagation.
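    As a rough sketch of forward propagation, the loop below repeats the same two steps for every layer: prepend the bias unit, then apply the weight matrix followed by the sigmoid. It uses plain Python lists rather than a matrix library, and all weight values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(thetas, x):
    """Forward propagation through a fully connected sigmoid network.

    thetas: list of weight matrices, each a list of rows; column 0 of
            every row is the weight on the bias unit.
    x:      input features without the bias entry.
    """
    a = list(x)
    for theta in thetas:
        a = [1.0] + a  # add the bias unit a_0 = 1
        z = [sum(w * ai for w, ai in zip(row, a)) for row in theta]
        a = [sigmoid(zi) for zi in z]
    return a


# Hypothetical weights for a 3-3-1 network.
theta1 = [[0.1, 0.2, -0.3, 0.4],
          [0.0, -0.5, 0.5, 0.1],
          [0.3, 0.1, 0.1, -0.2]]
theta2 = [[-0.2, 0.7, -0.4, 0.6]]
h = forward([theta1, theta2], [1.0, 0.0, 2.0])
print(h)  # a one-element list holding h_Theta(x)
```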

    Neural Network learning its own features


    [Figure: Neural Network learning its own features - Model]

    Other network architectures



    [Figure: Other network architectures - Model]

    Examples and intuitions I

    Simple example: AND

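    \(x_1, x_2 \in \{0,1\}\), \(y = x_1\) AND \(x_2\).

    1. Build the neural network

      [Figure: Examples and intuitions - AND]

    2. Assign the weights (parameters)

      [Figure: Examples and intuitions - AND2]

    3. Write out the hypothesis: \(h_{\Theta}(x) = g(-30 + 20x_1 + 20x_2)\)

    4. Inspect the results

      [Figure: Examples and intuitions - AND3]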
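    The result can be checked by evaluating the hypothesis g(-30 + 20x_1 + 20x_2) on all four input combinations. The short sketch below does that; `h_and` is a name introduced here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def h_and(x1, x2):
    """Hypothesis from the AND example: g(-30 + 20*x1 + 20*x2)."""
    return sigmoid(-30 + 20 * x1 + 20 * x2)


# Print the truth table; rounding maps the sigmoid output to 0 or 1.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(h_and(x1, x2)))
```

    Only the input (1, 1) gives z = 10 and an output close to 1. The other three inputs give z = -30 or z = -10, so the output is close to 0.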

    Example: OR function

    [Figure: Examples and intuitions - OR]

    Examples and intuitions II


    \(h_{\Theta}(x) = g(10 - 20x_1)\)

    [Figure: Examples and intuitions - NOT]

    Putting it together: \(x_1\) XNOR \(x_2\)

    1. Summary of the three operations: [Figure: Examples and intuitions - together]
    2. What XNOR means: \(x_1\) XNOR \(x_2\) = NOT(\(x_1\) XOR \(x_2\))
    3. Implementation: [Figure: Examples and intuitions - together2]
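    One way to realize XNOR with a single hidden layer is sketched below. The AND weights are the ones given in the AND example. The weights for the (NOT x_1) AND (NOT x_2) unit and for the OR unit are assumptions, since the corresponding figures are not preserved in these notes. The hidden layer computes AND and "neither", and the output unit takes their OR.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def unit(weights, inputs):
    """One logistic unit; weights[0] is the weight on the bias unit."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], inputs)))


AND = [-30, 20, 20]       # weights from the AND example
NEITHER = [10, -20, -20]  # (NOT x1) AND (NOT x2); assumed weights
OR = [-10, 20, 20]        # assumed weights


def xnor(x1, x2):
    a1 = unit(AND, [x1, x2])      # hidden unit 1
    a2 = unit(NEITHER, [x1, x2])  # hidden unit 2
    return unit(OR, [a1, a2])     # output unit


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xnor(x1, x2)))
```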

    Neural Network intuition

    With multiple layers, the second layer computes relatively simple functions of the inputs. The third layer can build on those to compute more complex functions, and the layer after that can compute functions that are more complex still.

    [Figure: Examples and intuitions - Neural Network intuition]

    Multiclass Classification

    Multiple output units: One-vs-all





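    1. Because the network's output is now a four-dimensional vector, the target also has to be expressed as a vector. The first element indicates whether the image shows a pedestrian, the second whether it shows a car, and so on. So for a pedestrian we want the output \(\left[ \begin{matrix} 1 \\ 0 \\ 0 \\ 0 \end{matrix} \right]\), for a car we want \(\left[ \begin{matrix} 0 \\ 1 \\ 0 \\ 0 \end{matrix} \right]\), and so on.
    2. Previously we wrote \(y\) as an integer: 1, 2, 3 or 4. In this example, where the training set contains images of four kinds of objects (pedestrians, cars, motorcycles and trucks), we instead represent \(y\) as \(\left[ \begin{matrix} 1 \\ 0 \\ 0 \\ 0 \end{matrix} \right]\), \(\left[ \begin{matrix} 0 \\ 1 \\ 0 \\ 0 \end{matrix} \right]\), and so on.
    3. Each training example is written as \((x^{(i)},y^{(i)})\), where \(x^{(i)}\) is an image of one of the four objects and \(y^{(i)}\) is one of these four vectors. The goal is to find parameters such that the network's output satisfies \(h_\Theta(x) \approx y\).

    [Figure: Multiclass Classification - One-vs-all]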
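    The label encoding and the matching prediction rule can be sketched as follows. The class order follows the text (pedestrian, car, motorcycle, truck). The helper names and the sample output vector are made up for illustration.

```python
def one_hot(label, num_classes):
    """Map an integer label in 1..num_classes to a 0/1 indicator vector."""
    if not 1 <= label <= num_classes:
        raise ValueError("label out of range")
    return [1 if k == label else 0 for k in range(1, num_classes + 1)]


def predict(h):
    """Return the 1-based index of the largest network output."""
    return max(range(len(h)), key=lambda k: h[k]) + 1


CLASSES = ["pedestrian", "car", "motorcycle", "truck"]

print(one_hot(2, 4))               # [0, 1, 0, 0], the target for a car
sample_h = [0.1, 0.8, 0.05, 0.2]   # hypothetical network output
print(CLASSES[predict(sample_h) - 1])  # car
```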



    1. Which of the following statements are true? Check all that apply.

      • [ ] A two layer (one input layer, one output layer; no hidden layer) neural network can represent the XOR function.

      • [ ] Suppose you have a multi-class classification problem with three classes, trained with a 3 layer network. Let \(a^{(3)}_1 = (h_\Theta(x))_1\) be the activation of the first output unit, and similarly \(a^{(3)}_2 = (h_\Theta(x))_2\) and \(a^{(3)}_3 = (h_\Theta(x))_3\). Then for any input \(x\), it must be the case that \(a^{(3)}_1 + a^{(3)}_2 + a^{(3)}_3 = 1\).

      • [x] The activation values of the hidden units in a neural network, with the sigmoid activation function applied at every layer, are always in the range (0, 1).

      • [x] Any logical function over binary-valued (0 or 1) inputs \(x_1\) and \(x_2\) can be (approximately) represented using some neural network.

    2. Consider the following neural network which takes two binary-valued inputs \(x_1, x_2 \in \{0, 1\}\) and outputs \(h_\Theta(x)\). Which of the following logical functions does it (approximately) compute?


      • [x] OR

      • [ ] AND

      • [ ] NAND (meaning "NOT AND")

      • [ ] XOR (exclusive OR)

    3. Consider the neural network given below. Which of the following equations correctly computes the activation \(a_1^{(3)}\)? Note: \(g(z)\) is the sigmoid activation function.


      • [x] \(a^{(3)}_1=g(\Theta^{(2)}_{1,0}a^{(2)}_0+\Theta^{(2)}_{1,1}a^{(2)}_1+\Theta^{(2)}_{1,2}a^{(2)}_2)\)
      • [ ] \(a^{(3)}_1=g(\Theta^{(2)}_{1,0}a^{(1)}_0+\Theta^{(2)}_{1,1}a^{(1)}_1+\Theta^{(2)}_{1,2}a^{(1)}_2)\)
      • [ ] \(a^{(3)}_1=g(\Theta^{(1)}_{1,0}a^{(2)}_0+\Theta^{(1)}_{1,1}a^{(2)}_1+\Theta^{(1)}_{1,2}a^{(2)}_2)\)
      • [ ] \(a^{(3)}_1=g(\Theta^{(2)}_{2,0}a^{(2)}_0+\Theta^{(2)}_{2,1}a^{(2)}_1+\Theta^{(2)}_{2,2}a^{(2)}_2)\)
    4. You have the following neural network:


      You'd like to compute the activations of the hidden layer \(a^{(2)} \in R^3\). One way to do so is the following Octave code:


      You want to have a vectorized implementation of this (i.e., one that does not use for loops). Which of the following implementations correctly compute \(a^{(2)}\)? Check all that apply.

      • [x] a2 = sigmoid (Theta1 * x);
      • [ ] a2 = sigmoid (x * Theta1);
      • [ ] a2 = sigmoid (Theta2 * x);
      • [ ] z = sigmoid(x); a2 = Theta1 * z;
    5. You are using the neural network pictured below and have learned the parameters \(\Theta^{(1)} = \left[ \begin{matrix} 1 & 2.1 & 1.3 \\ 1 & 0.6 & -1.2 \end{matrix} \right]\) (used to compute \(a^{(2)}\)) and \(\Theta^{(2)} = \left[ \begin{matrix} 1 & 4.5 & 3.1 \end{matrix} \right]\) (used to compute \(a^{(3)}\) as a function of \(a^{(2)}\)). Suppose you swap the parameters for the first hidden layer between its two units so \(\Theta^{(1)} = \left[ \begin{matrix} 1 & 0.6 & -1.2 \\ 1 & 2.1 & 1.3 \end{matrix} \right]\) and also swap the output layer so \(\Theta^{(2)} = \left[ \begin{matrix} 1 & 3.1 & 4.5 \end{matrix} \right]\). How will this change the value of the output \(h_\Theta(x)\)?


      • [x] It will stay the same. Swapping the two rows of \(\Theta^{(1)}\) (that is, the two hidden units) together with the last two columns of \(\Theta^{(2)}\) leaves the overall output unchanged.
      • [ ] It will increase.
      • [ ] It will decrease.
      • [ ] Insufficient information to tell: it may increase or decrease.
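    The answer to question 5 can be verified numerically. The sketch below evaluates the network with the original and the swapped parameters from the question; the test input is arbitrary and chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def h(theta1, theta2, x):
    """Output of a 2-2-1 sigmoid network; column 0 holds the bias weights."""
    a1 = [1.0] + list(x)
    a2 = [1.0] + [sigmoid(sum(w * a for w, a in zip(row, a1))) for row in theta1]
    return sigmoid(sum(w * a for w, a in zip(theta2, a2)))


theta1 = [[1, 2.1, 1.3], [1, 0.6, -1.2]]
theta2 = [1, 4.5, 3.1]

# Swap the two hidden units: rows of theta1, last two entries of theta2.
theta1_swapped = [theta1[1], theta1[0]]
theta2_swapped = [theta2[0], theta2[2], theta2[1]]

x = [0.5, -1.0]  # arbitrary test input
print(h(theta1, theta2, x), h(theta1_swapped, theta2_swapped, x))
```

    The two printed values agree up to floating-point rounding, because the swap only relabels the hidden units.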


    1. lrCostFunction.m

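      % Regularized cost; theta(1), the bias term, is excluded from the penalty.
      J = 1 / m * ( -y' * log(sigmoid( X * theta )) - (1 - y)' * log(1 - sigmoid( X * theta ))) + lambda / (2 * m) * (theta' * theta - theta(1)^2);
      % Gradient of the unregularized cost.
      grad = 1 / m * (X' * (sigmoid(X * theta) - y));
      % Add the regularization term, leaving the bias gradient unchanged.
      temp = theta;
      temp(1) = 0;
      grad = grad + lambda / m * temp;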
    2. oneVsAll.m

      initial_theta = zeros(n + 1, 1);
      options = optimset('GradObj', 'on', 'MaxIter', 50);
      % Train one regularized logistic regression classifier per class.
      for c = 1:num_labels
        all_theta(c, :) = fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), initial_theta, options);
      end
    3. predictOneVsAll.m

      % Score every example against every class, then pick the best-scoring class.
      A = sigmoid(X * all_theta');
      [value, index] = max(A, [], 2);
      p = index;
    4. predict.m

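      % Input layer with the bias column added.
      a1 = [ones(m, 1) X];
      % Hidden layer activations, then add the bias column.
      z2 = a1 * Theta1';
      a2 = sigmoid(z2);
      a2 = [ones(size(z2, 1), 1) a2];
      % Output layer; the predicted label is the index of the largest output.
      z3 = a2 * Theta2';
      a3 = sigmoid(z3);
      [value, index] = max(a3, [], 2);
      p = index;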
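    For readers without Octave, here is a rough plain-Python rendering of the regularized logistic regression cost and gradient computed in lrCostFunction.m. It is a sketch that mirrors the same formulas with lists instead of matrices; the name `lr_cost` and the tiny data set are assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lr_cost(theta, X, y, lam):
    """Regularized logistic regression cost and gradient.

    X: list of rows, each starting with the bias feature 1.
    y: list of 0/1 labels.
    The bias weight theta[0] is not regularized.
    """
    m = len(y)
    h = [sigmoid(sum(t * xi for t, xi in zip(theta, row))) for row in X]
    cost = -sum(yi * math.log(hi) + (1 - yi) * math.log(1 - hi)
                for yi, hi in zip(y, h)) / m
    cost += lam / (2 * m) * sum(t * t for t in theta[1:])
    grad = [sum((hi - yi) * row[j] for hi, yi, row in zip(h, y, X)) / m
            for j in range(len(theta))]
    grad = [g + (lam / m * t if j > 0 else 0.0)
            for j, (g, t) in enumerate(zip(grad, theta))]
    return cost, grad


# Tiny made-up data set: two examples, one feature plus the bias column.
X = [[1.0, 2.0], [1.0, -1.0]]
y = [1, 0]
cost, grad = lr_cost([0.0, 0.0], X, y, lam=1.0)
print(cost, grad)  # with theta = 0 every prediction is 0.5, so cost = ln 2
```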
  • Original post: https://www.cnblogs.com/songjy11611/p/12261781.html