Deduction
The notation for the fully connected structure is defined in the figure below:
Forward Propagation
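Under the usual fully connected notation (assumed here, since the notation figure is referenced above: $a^{(l-1)}$ the input activations of layer $l$, $W^{(l)}$ its weight matrix, $b^{(l)}$ its bias, $f$ the activation function), the forward pass can be written as:

```latex
z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}, \qquad a^{(l)} = f\left(z^{(l)}\right)
```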
Backward Propagation
Following the chain rule with loss function $J$, define the error term of layer $l$ as $\delta^{(l)} = \partial J / \partial z^{(l)}$, so we have the layer-to-layer recursion:

$$\delta^{(l)} = \left( (W^{(l+1)})^T \delta^{(l+1)} \right) \odot f'\left(z^{(l)}\right)$$
Computing $\delta$
We first compute the output-layer error $\delta^{(L)}$. As an example, take cross entropy as the loss function and softmax as the output function:

$$J = -\sum_k y_k \log a_k^{(L)}, \qquad a_k^{(L)} = \frac{e^{z_k^{(L)}}}{\sum_j e^{z_j^{(L)}}}$$
Continuing from the softmax derivative $\partial a_k^{(L)} / \partial z_j^{(L)} = a_k^{(L)} \left( \mathbb{1}[k = j] - a_j^{(L)} \right)$, we have:

$$\frac{\partial J}{\partial z_j^{(L)}} = \sum_k \frac{\partial J}{\partial a_k^{(L)}} \frac{\partial a_k^{(L)}}{\partial z_j^{(L)}} = a_j^{(L)} - y_j$$

So $\delta^{(L)} = a^{(L)} - y$.
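As a quick sanity check of this result, here is a minimal NumPy sketch (illustrative, not from the original post) that compares the analytic gradient $a^{(L)} - y$ against a finite-difference gradient of the cross-entropy loss:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())            # shift by max for numerical stability
    return e / e.sum()

def cross_entropy(a, y):
    return -np.sum(y * np.log(a))

rng = np.random.default_rng(0)
z = rng.normal(size=5)                 # output-layer pre-activations
y = np.zeros(5)
y[2] = 1.0                             # one-hot target

analytic = softmax(z) - y              # delta^(L) = a^(L) - y from the derivation

# finite-difference gradient of J(z) = cross_entropy(softmax(z), y)
eps = 1e-6
numeric = np.zeros_like(z)
for i in range(z.size):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps
    zm[i] -= eps
    numeric[i] = (cross_entropy(softmax(zp), y) - cross_entropy(softmax(zm), y)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```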
Computing $\partial J / \partial W$ and $\partial J / \partial b$
The gradients with respect to the weights and biases are straightforward to obtain. Since:

$$z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}$$

the chain rule gives $\frac{\partial J}{\partial W^{(l)}} = \delta^{(l)} \left(a^{(l-1)}\right)^T$ and $\frac{\partial J}{\partial b^{(l)}} = \delta^{(l)}$.
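These two formulas can likewise be checked numerically. The sketch below (an assumed NumPy illustration; the loss $J = \delta^T z$ is constructed so that $\partial J / \partial z = \delta$ by design) verifies $\partial J / \partial W = \delta \, a^T$:

```python
import numpy as np

rng = np.random.default_rng(1)
a_prev = rng.normal(size=(3, 1))       # a^(l-1): activations feeding this layer
W = rng.normal(size=(4, 3))            # W^(l)
b = rng.normal(size=(4, 1))            # b^(l)
delta = rng.normal(size=(4, 1))        # delta^(l) = dJ/dz^(l), assumed given

# chain rule through z = W a_prev + b:
dW = delta @ a_prev.T                  # dJ/dW = delta (a_prev)^T
db = delta                             # dJ/db = delta

# numerical check: pick J = delta^T z, so dJ/dz = delta by construction
def J(Wmat):
    return (delta.T @ (Wmat @ a_prev + b)).item()

eps = 1e-6
num = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        num[i, j] = (J(Wp) - J(Wm)) / (2 * eps)

print(np.allclose(dW, num, atol=1e-6))  # True
```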
Caffe Practice
Forward Propagation
Let $K$ be the number of bottom nodes, $N$ the number of top nodes, and $M$ the batch size. The bottom matrix is then $M \times K$, the top matrix is $M \times N$, the weight matrix is $N \times K$, the bias is an $N$-vector, and the bias multiplier is an all-ones $M$-vector. The figure below shows how these key quantities are laid out in Caffe:
In mathematical form:

$$\text{top} = \text{bottom} \cdot W^T + \beta \cdot b^T$$

where $\beta$ is the $M \times 1$ all-ones bias multiplier.
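Assuming the standard InnerProduct blob shapes (bottom $M \times K$, weight $N \times K$, bias an $N$-vector), the forward computation can be sketched in NumPy; Caffe itself issues GEMM calls, so this is only an illustrative equivalent:

```python
import numpy as np

M, K, N = 2, 4, 3                      # batch size, bottom dim, top dim
rng = np.random.default_rng(2)
bottom = rng.normal(size=(M, K))       # input blob, one row per sample
weight = rng.normal(size=(N, K))       # weight blob: top dim x bottom dim
bias = rng.normal(size=(N,))
bias_mult = np.ones((M, 1))            # bias multiplier: all-ones column

# top = bottom * W^T + bias_mult * b^T
top = bottom @ weight.T + bias_mult @ bias.reshape(1, N)

print(top.shape)                       # (2, 3)
```

Row $i$ of top is exactly $W \cdot \text{bottom}_i + b$: every sample in the batch passes through the same affine map.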
Backward Propagation
The backward pass again splits into two parts. One part computes the parameter gradients, weight_diff $= \text{top\_diff}^T \cdot \text{bottom}$ and bias_diff $= \text{top\_diff}^T \cdot \beta$. The other computes bottom_diff $= \partial J / \partial\, \text{bottom}$, which serves as the top_diff of the next layer down; since $\text{top} = \text{bottom} \cdot W^T + \beta \cdot b^T$, this is simply bottom_diff $= \text{top\_diff} \cdot W$. The figure below shows the key quantities in Caffe's backward pass.
Computing weight_diff and bias_diff
Computing bottom_diff
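The three backward products can be sketched the same way (again an illustrative NumPy equivalent of Caffe's GEMM calls, not its actual C++ code):

```python
import numpy as np

M, K, N = 2, 4, 3                      # batch size, bottom dim, top dim
rng = np.random.default_rng(3)
bottom = rng.normal(size=(M, K))
weight = rng.normal(size=(N, K))
top_diff = rng.normal(size=(M, N))     # dJ/d(top), handed down by the layer above
bias_mult = np.ones((M, 1))

weight_diff = top_diff.T @ bottom      # dJ/dW: (N, K), accumulated over the batch
bias_diff = top_diff.T @ bias_mult     # dJ/db: (N, 1), sums top_diff over samples
bottom_diff = top_diff @ weight        # dJ/d(bottom): the next layer's top_diff

# spot-check one weight entry against a finite difference of
# J = sum(top_diff * top), whose gradient w.r.t. top is top_diff
eps = 1e-6
i, j = 1, 2
Wp, Wm = weight.copy(), weight.copy()
Wp[i, j] += eps
Wm[i, j] -= eps
num = (np.sum(top_diff * (bottom @ Wp.T)) - np.sum(top_diff * (bottom @ Wm.T))) / (2 * eps)
print(np.isclose(weight_diff[i, j], num, atol=1e-6))  # True
```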