Neural Networks
The ‘one learning algorithm’ hypothesis
- Neuron-rewiring experiments
Model Representation
Define
- Sigmoid (logistic) activation function
- bias unit
- input layer
- output layer
- hidden layer
- \(a_i^{(j)}\): ‘activation’ of unit \(i\) in layer \(j\)
- \(\Theta^{(j)}\): matrix of weights controlling the function mapping from layer \(j\) to layer \(j + 1\).
Calculate
\[a^{(j)} = g(z^{(j)})\]
\[g(x) = \frac{1}{1 + e^{-x}}\]
\[z^{(j + 1)} = \Theta^{(j)} a^{(j)}\]
\[h_\Theta(x) = a^{(j + 1)} = g(z^{(j + 1)})\]
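A minimal Octave sketch of these forward-propagation equations, assuming a 3-layer network whose weight matrices Theta1 and Theta2 are already defined and x is a single training example stored as a column vector:
% forward propagation for one example x (column vector), 3-layer network
g = @(z) 1 ./ (1 + exp(-z));   % sigmoid activation
a1 = [1; x];                   % input layer with bias unit
z2 = Theta1 * a1;
a2 = [1; g(z2)];               % hidden layer with bias unit
z3 = Theta2 * a2;
h  = g(z3);                    % h_Theta(x) = a3, the output layer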
Cost Function
\[
J(\Theta) = - \frac{1}{m} \sum_{i=1}^m \sum_{k=1}^K \left[ y^{(i)}_k \log\big((h_\Theta (x^{(i)}))_k\big) + (1 - y^{(i)}_k)\log\big(1 - (h_\Theta(x^{(i)}))_k\big) \right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} (\Theta_{j,i}^{(l)})^2
\]
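A hedged sketch of this cost in Octave for a 3-layer network, assuming h is the m x K matrix of hypothesis outputs and Y the m x K matrix of one-hot labels (both names are illustrative):
% unregularized part: sum over all m examples and K output units
J = -(1 / m) * sum(sum(Y .* log(h) + (1 - Y) .* log(1 - h)));
% regularization: squared weights of every layer, excluding the bias columns
reg = (lambda / (2 * m)) * (sum(sum(Theta1(:, 2:end) .^ 2)) + ...
                            sum(sum(Theta2(:, 2:end) .^ 2)));
J = J + reg;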
Back-propagation Algorithm
Algorithm
- Suppose we have already calculated all the \(a^{(l)}\) and \(z^{(l)}\) via forward propagation (see the sketch after these steps)
- set \(\Delta^{(l)}_{i, j} := 0\) for all \(l, i, j\)
- using \(y^{(t)}\), compute \(\delta^{(L)} = a^{(L)} - y^{(t)}\), where \(y^{(t)}_{k} \in \{0, 1\}\) indicates whether the current training example belongs to class \(k\) (\(y^{(t)}_{k} = 1\)) or to a different class (\(y^{(t)}_{k} = 0\));
- For the hidden layers \(l = L - 1\) down to \(l = 2\), set
\[
\delta^{(l)} = (\Theta^{(l)})^T\delta^{(l + 1)} .* g'(z^{(l)})
\]
- remember to remove \(\delta_0^{(l)}\), e.g. with delta(2:end)
\[
\Delta^{(l)} = \Delta^{(l)} + \delta^{(l + 1)}(a^{(l)})^T
\]
- gradient:
\[
\frac{\partial}{\partial\Theta^{(l)}_{i,j}}J(\Theta) = D^{(l)}_{i,j} = \frac{1}{m}\Delta^{(l)}_{i,j} +
\begin{cases} \frac{\lambda}{m}\Theta^{(l)}_{i, j}, & \text{if } j \geq 1 \\ 0, & \text{if } j = 0 \end{cases}
\]
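A minimal sketch of these steps in Octave for a 3-layer network, assuming the quantities from the forward-propagation sketch above (a1, a2, a3, z2) plus the accumulators Delta1, Delta2 (initialized to zero) and a one-hot label vector yvec for the current example:
% back-propagation for one training example
g = @(z) 1 ./ (1 + exp(-z));                       % sigmoid, as above
delta3 = a3 - yvec;                                % output-layer error
% hidden-layer error; using Theta2(:, 2:end) drops the bias column, which is
% equivalent to computing the full vector and then taking delta(2:end)
delta2 = (Theta2(:, 2:end)' * delta3) .* (g(z2) .* (1 - g(z2)));
Delta1 = Delta1 + delta2 * a1';                    % accumulate gradients
Delta2 = Delta2 + delta3 * a2';
% after the loop over all m examples, form D with regularization (bias column unregularized)
Theta1_grad = (1 / m) * Delta1 + (lambda / m) * [zeros(size(Theta1, 1), 1), Theta1(:, 2:end)];
Theta2_grad = (1 / m) * Delta2 + (lambda / m) * [zeros(size(Theta2, 1), 1), Theta2(:, 2:end)];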
Gradient Checking
- \[
\frac{d}{d\Theta}J(\Theta) \approx \frac{J(\Theta + \epsilon) - J(\Theta - \epsilon)}{2\epsilon}
\]
- A small value for \(\epsilon\) such as \(\epsilon = 10^{-4}\)
- check that gradApprox \(\approx\) deltaVector (the unrolled gradient from back-propagation); the two should agree to several decimal places
epsilon = 1e-4;
n = length(theta);             % theta is the unrolled parameter vector
gradApprox = zeros(n, 1);
for i = 1 : n
  thetaPlus = theta;
  thetaPlus(i) += epsilon;     % perturb the i-th parameter upwards
  thetaMinus = theta;
  thetaMinus(i) -= epsilon;    % perturb the i-th parameter downwards
  gradApprox(i) = (J(thetaPlus) - J(thetaMinus)) / (2 * epsilon);  % two-sided estimate
end;
Rolling and Unrolling
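Advanced optimizers expect a single parameter vector, so in practice the weight matrices are "unrolled" into one long vector and reshaped back when needed. A brief Octave sketch, using illustrative sizes of 25 x 401 for Theta1 and 10 x 26 for Theta2:
% unroll the weight matrices (and gradients) into single vectors
thetaVec = [Theta1(:); Theta2(:)];
DVec = [D1(:); D2(:)];
% roll back into matrices when a cost function needs them
Theta1 = reshape(thetaVec(1:10025), 25, 401);      % 25 * 401 = 10025 elements
Theta2 = reshape(thetaVec(10026:end), 10, 26);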
Random Initialization
Theta = rand(n, m) * (2 * INIT_EPSILON) - INIT_EPSILON;   % each entry in [-INIT_EPSILON, INIT_EPSILON]
- initialize \(\Theta^{(l)}_{ij} \in [-\epsilon, \epsilon]\)
- otherwise, if we initialize all theta weights to zero, all nodes will update to the same value repeatedly when we back-propagate.
- One effective strategy for choosing \(\epsilon_{init}\) is to base it on the number of units in the network. A good choice of \(\epsilon_{init}\) is \(\epsilon_{init} = \frac{\sqrt{6}}{\sqrt{L_{in} + L_{out}}}\)
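For example, with illustrative layer sizes \(L_{in} = 400\) and \(L_{out} = 25\), this gives \(\epsilon_{init} = \frac{\sqrt{6}}{\sqrt{400 + 25}} \approx 0.12\).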
Training a Neural Network
- Randomly initialize weights
Theta = rand(n, m) * (2 * epsilon) - epsilon;
- Implement forward propagation to get \(h_\Theta(x^{(i)})\) for any \(x^{(i)}\)
- Implement code to compute the cost function \(J(\Theta)\)
- Implement back-propagation to compute the partial derivatives \(\frac{\partial}{\partial \Theta_{jk}^{(l)}} J(\Theta)\)
- \(g'(z) = \frac{d}{dz}g(z) = g(z)(1 - g(z))\)
- \(\mathrm{sigmoid}(z) = g(z) = \frac{1}{1 + e^{-z}}\)
- Use gradient checking to compare \(\frac{\partial}{\partial \Theta_{jk}^{(l)}} J(\Theta)\) computed using back-propagation vs. a numerical estimate of the gradient of \(J(\Theta)\). Then disable the gradient checking code.
- Use gradient descent or an advanced optimization method with back-propagation to try to minimize \(J(\Theta)\) as a function of the parameters \(\Theta\)
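As a closing sketch, a hedged Octave example of that last step, assuming a user-supplied nnCostFunction that returns both \(J(\Theta)\) and the unrolled gradient for an unrolled parameter vector:
% minimize J(Theta) with an advanced optimizer instead of plain gradient descent
options = optimset('GradObj', 'on', 'MaxIter', 50);
initialThetaVec = [Theta1(:); Theta2(:)];                  % unrolled initial weights
costFunc = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                               num_labels, X, y, lambda);  % nnCostFunction is user-defined
[thetaVec, cost] = fminunc(costFunc, initialThetaVec, options);
% reshape thetaVec back into Theta1 and Theta2 before making predictions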