An example of the backpropagation algorithm
Here is a concrete example. Suppose a mini-batch contains samples \(X\) and labels \(Y\), where \(X \in R^{m \times 2}\) and \(Y \in R^{m \times 1}\), and we have a two-layer network whose forward computation is:
\[\begin{split}
i_1 &= XW_1 + b_1\\
o_1 &= sigmoid(i_1)\\
i_2 &= o_1W_2 + b_2\\
o_2 &= sigmoid(i_2)
\end{split}
\]
Here \(W_1 \in R^{2 \times 3}\), \(b_1 \in R^{1 \times 3}\), \(W_2 \in R^{3 \times 1}\), and \(b_2 \in R^{1 \times 1}\) are the parameters, and we use the squared loss
\[cost = \dfrac{1}{2m}\sum_i^m(o_{2i} - Y_i)^2
\]
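To make the setup concrete, here is a minimal NumPy sketch of the forward pass and the loss above. The batch size, the random initialization, and the helper name `sigmoid` are illustrative assumptions, not part of the original text.

```python
import numpy as np

def sigmoid(z):
    # Element-wise logistic function
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative shapes matching the text: X in R^{m x 2}, Y in R^{m x 1}
m = 8
rng = np.random.default_rng(0)
X = rng.normal(size=(m, 2))
Y = rng.normal(size=(m, 1))

# Parameters: W1 in R^{2x3}, b1 in R^{1x3}, W2 in R^{3x1}, b2 in R^{1x1}
W1 = rng.normal(size=(2, 3))
b1 = np.zeros((1, 3))
W2 = rng.normal(size=(3, 1))
b2 = np.zeros((1, 1))

# Forward pass
i1 = X @ W1 + b1          # (m, 3)
o1 = sigmoid(i1)          # (m, 3)
i2 = o1 @ W2 + b2         # (m, 1)
o2 = sigmoid(i2)          # (m, 1)

# Squared loss: cost = 1/(2m) * sum_i (o2_i - Y_i)^2
cost = np.sum((o2 - Y) ** 2) / (2 * m)
```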
The backward-propagation steps are given below:
\[\begin{split}
\dfrac{\partial cost}{\partial o_2} &= \dfrac{1}{m}(o_2 - Y)\\
\dfrac{\partial o_2}{\partial i_2} &= sigmoid(i_2)\odot (1 - sigmoid(i_2)) = o_2 \odot (1 - o_2)\\
\dfrac{\partial i_2}{\partial W_2} &= o_1\\
\dfrac{\partial i_2}{\partial o_1} &= W_2\\
\dfrac{\partial i_2}{\partial b_2} &= 1\\
\dfrac{\partial o_1}{\partial i_1} &= sigmoid(i_1)\odot (1 - sigmoid(i_1)) = o_1\odot (1 - o_1)\\
\dfrac{\partial i_1}{\partial W_1} &= X\\
\dfrac{\partial i_1}{\partial b_1} &= 1
\end{split}
\]
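As a quick sanity check on the local derivative \(\partial o_2 / \partial i_2 = o_2 \odot (1 - o_2)\), one can compare it against a finite-difference estimate. This snippet reuses `sigmoid`, `i2`, and `o2` from the forward-pass sketch above; the step size `eps` is an arbitrary choice.

```python
# Finite-difference check of d sigmoid(z)/dz = sigmoid(z) * (1 - sigmoid(z)),
# reusing i2 and o2 from the forward-pass sketch above.
eps = 1e-6
numeric = (sigmoid(i2 + eps) - sigmoid(i2 - eps)) / (2 * eps)
analytic = o2 * (1 - o2)
print(np.max(np.abs(numeric - analytic)))  # should be very close to zero
```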
Therefore,
\[\begin{split}
\Delta W_2 &= \dfrac{\partial cost}{\partial o_2}\dfrac{\partial o_2}{\partial i_2}\dfrac{\partial i_2}{\partial W_2}\\
\Delta b_2 &= \dfrac{\partial cost}{\partial o_2}\dfrac{\partial o_2}{\partial i_2}\dfrac{\partial i_2}{\partial b_2}\\
\Delta W_1 &= \dfrac{\partial cost}{\partial o_2}\dfrac{\partial o_2}{\partial i_2}\dfrac{\partial i_2}{\partial o_1}\dfrac{\partial o_1}{\partial i_1}\dfrac{\partial i_1}{\partial W_1}\\
\Delta b_1 &= \dfrac{\partial cost}{\partial o_2}\dfrac{\partial o_2}{\partial i_2}\dfrac{\partial i_2}{\partial o_1}\dfrac{\partial o_1}{\partial i_1}\dfrac{\partial i_1}{\partial b_1}
\end{split}
\]
Substituting the derivatives above into the chain rule (with transposes arranged so the matrix shapes match), we get
\[\begin{split}
\Delta W_2 &= \left((\dfrac{1}{m}(o_2 - Y)\odot(o_2\odot (1 - o_2)))^T \times o_1\right)^T\\
\Delta W_1 &= \left((((\dfrac{1}{m}(o_2 - Y)\odot (o_2\odot (1 - o_2))) \times W_2^T)\odot o_1\odot (1 - o_1))^T \times X\right)^T
\end{split}
\]
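Here is a minimal NumPy sketch of these two expressions, continuing from the forward-pass snippet above. The derivation does not write out the bias gradients explicitly, so the column-sum forms `db2` and `db1` below are my addition (the biases are broadcast over the \(m\) rows, so their gradients sum over rows); the finite-difference check at the end is only an illustrative way to verify the result.

```python
# Backward pass, continuing from the forward-pass sketch above.
delta2 = (o2 - Y) / m * o2 * (1 - o2)        # dcost/di2, shape (m, 1)
dW2 = (delta2.T @ o1).T                      # = o1.T @ delta2, shape (3, 1)
db2 = delta2.sum(axis=0, keepdims=True)      # (1, 1); bias broadcast over rows (my addition)

delta1 = (delta2 @ W2.T) * o1 * (1 - o1)     # dcost/di1, shape (m, 3)
dW1 = (delta1.T @ X).T                       # = X.T @ delta1, shape (2, 3)
db1 = delta1.sum(axis=0, keepdims=True)      # (1, 3); my addition

# Finite-difference check on one entry of W1 (illustrative only)
def cost_fn(W1_):
    o1_ = sigmoid(X @ W1_ + b1)
    o2_ = sigmoid(o1_ @ W2 + b2)
    return np.sum((o2_ - Y) ** 2) / (2 * m)

eps = 1e-6
W1p, W1m = W1.copy(), W1.copy()
W1p[0, 0] += eps
W1m[0, 0] -= eps
numeric = (cost_fn(W1p) - cost_fn(W1m)) / (2 * eps)
print(numeric, dW1[0, 0])  # the two numbers should agree closely
```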