The methods covered so far (linear and logistic regression) cannot handle complex non-linear problems well.
Neural Networks
Sigmoid (logistic) activation function: "activation function" is another term for \(g(z) = \frac{1}{1+e^{-z}}\)
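As a sketch, this activation function is a one-liner in Python (assuming NumPy is available):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid (logistic) activation: g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

# g(0) = 0.5; large positive z gives ~1, large negative z gives ~0.
print(sigmoid(0.0), sigmoid(10.0), sigmoid(-10.0))
```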
activation: the value that is computed by, and output from, a specific unit (neuron)
weights = parameters = \(\theta\)
input units: \(x_1, x_2, x_3, \dots, x_n\)
bias unit / bias neuron: \(x_0\) and \(a_0^{(j)}\)
The layers between the input units and the hypothesis are made up of activations.
input wire / output wire: an input wire is an arrow pointing into a given neuron; an output wire is an arrow pointing out of it.
\(a_i^{(j)}\): "activation" of neuron \(i\), or of unit \(i\), in layer \(j\)
\(\Theta^{(j)}\): matrix of weights controlling the function mapping from layer \(j\) to layer \(j+1\)
(Note that \(\Theta\) is uppercase here because it is now a matrix.)
layer 1 == input layer
layer n == output layer (the last layer)
layer 2 ~ layer n-1 == hidden layers
For example, if layer 1 has 2 input units and layer 2 has 4 activation units, then \(\Theta^{(1)}\) has dimension \(4 \times 3\).
Generally, \(\Theta^{(j)}\) will be of dimension \(s_{j+1} \times (s_j+1)\) if the network has \(s_j\) units in layer \(j\) and \(s_{j+1}\) units in layer \(j+1\). (The \(+1\) in \(s_j+1\) comes from the addition in \(\Theta^{(j)}\) of the "bias node" \(x_0\) and its weights \(\Theta_0^{(j)}\). In other words, the output nodes do not include the bias node, while the inputs do.)
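A small sketch of this dimension rule in Python (NumPy assumed; the 3-5-1 layer sizes are only an illustrative choice, not from the notes):

```python
import numpy as np

# Illustrative architecture: 3 input units, one hidden layer of 5 units, 1 output unit.
layer_sizes = [3, 5, 1]   # s_1, s_2, s_3

# Theta^{(j)} maps layer j to layer j+1, so its shape is s_{j+1} x (s_j + 1);
# the extra column multiplies the bias unit a_0^{(j)} = 1.
Thetas = [np.zeros((s_next, s_prev + 1))
          for s_prev, s_next in zip(layer_sizes[:-1], layer_sizes[1:])]

for j, Theta in enumerate(Thetas, start=1):
    print(f"Theta^({j}) shape: {Theta.shape}")   # (5, 4) then (1, 6)
```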
Define \(a^{(1)} = x\)
\(z^{(j+1)} = \Theta^{(j)} a^{(j)}\)
\(z_k^{(j+1)} = \Theta_{k,0}^{(j)} a_0^{(j)} + \Theta_{k,1}^{(j)} a_1^{(j)} + \dots + \Theta_{k,n^{(j)}}^{(j)} a_{n^{(j)}}^{(j)} \quad (n^{(j)} \text{ is the number of activation units in layer } j)\)
\(a^{(j)} = g(z^{(j)}) = g(\Theta^{(j-1)} a^{(j-1)}) \quad (j \ge 2)\)
Suppose the network has \(n\) layers. Then the last weight matrix \(\Theta^{(n-1)}\) has only one row, which is multiplied by the single column \(a^{(n-1)}\), so the result is a single number:
\(h_\Theta(x) = a^{(n)} = g(z^{(n)}) = g(\Theta^{(n-1)} a^{(n-1)})\)
Add a bias unit \(a_0^{(j)} = 1\) to each layer before applying \(\Theta^{(j)}\).
This procedure of computing \(h_\Theta(x)\) layer by layer, from the input to the output, is called Forward Propagation.
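A minimal sketch of forward propagation in Python following the equations above, using an illustrative, untrained 3-5-1 network with random weights (the function and variable names are assumptions for illustration, not from the notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, Thetas):
    """Compute h_Theta(x) by repeatedly applying a^{(j+1)} = g(Theta^{(j)} [1; a^{(j)}])."""
    a = x                                # a^{(1)} = x
    for Theta in Thetas:
        a = np.insert(a, 0, 1.0)         # add the bias unit a_0^{(j)} = 1
        z = Theta @ a                    # z^{(j+1)} = Theta^{(j)} a^{(j)}
        a = sigmoid(z)                   # a^{(j+1)} = g(z^{(j+1)})
    return a                             # activation(s) of the output layer

# Illustrative 3-5-1 network with random (untrained) weights.
rng = np.random.default_rng(0)
Thetas = [rng.normal(size=(5, 4)), rng.normal(size=(1, 6))]
print(forward_propagate(np.array([1.0, 2.0, 3.0]), Thetas))
```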
A neural network effectively trains logistic regression on the activations of layer \(n-1\) as features, rather than on the input layer. By choosing different parameters in \(\Theta^{(1)}\) it can learn more complex features, and hence a better hypothesis, than training directly on \(x_1, x_2, \dots, x_n\).
architecture: the way that the units of a neural network are connected
The weights \(\Theta\) corresponding to some logical expressions:
- \({\rm AND} = (x_1 \bigwedge x_2)\):
  - \(\Theta = \begin{bmatrix}-30 & 20 & 20 \end{bmatrix}\)
- \({\rm NOR} = (\lnot x_1 \bigwedge \lnot x_2)\):
  - \(\Theta = \begin{bmatrix}10 & -20 & -20 \end{bmatrix}\)
- \({\rm OR} = (x_1 \bigvee x_2)\):
  - \(\Theta = \begin{bmatrix}-10 & 20 & 20 \end{bmatrix}\)
- \({\rm NOT} = (\lnot x)\):
  - \(\Theta = \begin{bmatrix}10 & -20 \end{bmatrix}\)
- \({\rm XNOR} = (\lnot x_1 \bigwedge \lnot x_2) \bigvee (x_1 \bigwedge x_2)\)
  - needs one hidden layer: \(a_1^{(2)} == (\lnot x_1 \bigwedge \lnot x_2), \quad a_2^{(2)} == (x_1 \bigwedge x_2)\)
  - output layer: \(a^{(3)} == (a_1^{(2)} \bigvee a_2^{(2)})\)
Implementing a logical expression:
Let \(x = \begin{bmatrix}1 \\ x_1 \\ x_2 \end{bmatrix}\); then \(a_i = g(\Theta_i x)\) is the result of applying the logical operator corresponding to \(\Theta_i\) to \(x_1, x_2\).
For example, if \(\Theta_i = \begin{bmatrix}-10 & 20 & 20 \end{bmatrix}\), then \(a_i == x_1 \bigvee x_2\).
A complex logical expression such as \({\rm XNOR}\) can only be computed with the help of a hidden layer.
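As a sketch, the \(\Theta\) rows above can be plugged into \(g(\Theta x)\) to verify the truth tables, and NOR and AND can be stacked in a hidden layer followed by OR to obtain XNOR (Python, NumPy assumed; the helper names are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weight rows from the notes; each acts on x = [1, x1, x2] (or [1, x] for NOT).
AND_T = np.array([-30.0, 20.0, 20.0])
NOR_T = np.array([10.0, -20.0, -20.0])
OR_T  = np.array([-10.0, 20.0, 20.0])
NOT_T = np.array([10.0, -20.0])

def gate(theta, *inputs):
    x = np.concatenate(([1.0], inputs))   # prepend the bias unit x_0 = 1
    return sigmoid(theta @ x) > 0.5       # sigmoid saturates near 0 or 1

# XNOR via one hidden layer: a1 = NOR(x1, x2), a2 = AND(x1, x2), output = OR(a1, a2).
def xnor(x1, x2):
    a1 = gate(NOR_T, x1, x2)
    a2 = gate(AND_T, x1, x2)
    return gate(OR_T, float(a1), float(a2))

print("NOT 0:", gate(NOT_T, 0.0), "NOT 1:", gate(NOT_T, 1.0))
for x1 in (0.0, 1.0):
    for x2 in (0.0, 1.0):
        print(int(x1), int(x2), "AND:", gate(AND_T, x1, x2),
              "OR:", gate(OR_T, x1, x2), "XNOR:", xnor(x1, x2))
```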
For multiclass classification:
Use \(y = \begin{bmatrix}1\\0\\0\\0 \end{bmatrix}, \begin{bmatrix}0\\1\\0\\0 \end{bmatrix}, \begin{bmatrix}0\\0\\1\\0 \end{bmatrix}, \begin{bmatrix}0\\0\\0\\1 \end{bmatrix}\) to represent the different classes (one-hot encoding): the output layer has one unit per class, and the unit with the largest activation gives the predicted class.
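A small illustrative sketch of this one-hot representation, and of reading the predicted class off the output layer (the output activations below are made-up numbers, not from a trained network):

```python
import numpy as np

# One-hot target for a 4-class problem: class k is encoded as the unit vector e_k.
def one_hot(k, num_classes=4):
    y = np.zeros(num_classes)
    y[k] = 1.0
    return y

# The output layer has one unit per class; the predicted class is the index
# of the largest activation (illustrative, made-up output vector below).
h = np.array([0.1, 0.7, 0.15, 0.05])
predicted_class = int(np.argmax(h))
print(one_hot(2), predicted_class)   # [0. 0. 1. 0.]  1
```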