  • The Backpropagation Algorithm

    https://page.mi.fu-berlin.de/rojas/neural/chapter/K7.pdf

    7.1 Learning as gradient descent

    We saw in the last chapter that multilayered networks are capable of computing a wider range of Boolean functions than networks with a single layer of computing units. However, the computational effort needed for finding the correct combination of weights increases substantially when more parameters and more complicated topologies are considered. In this chapter we discuss a popular learning method capable of handling such large learning problems — the backpropagation algorithm. This numerical method was used by different research communities in different contexts, was discovered and rediscovered, until in 1985 it found its way into connectionist AI mainly through the work of the PDP group [382]. It has been one of the most studied and used algorithms for neural networks learning ever since.

    In this chapter we present a proof of the backpropagation algorithm based on a graphical approach in which the algorithm reduces to a graph labeling problem. This method is not only more general than the usual analytical derivations, which handle only the case of special network topologies, but also much easier to follow. It also shows how the algorithm can be efficiently implemented in computing systems.
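    As a compact statement of the gradient-descent setup that the chapter goes on to formalize (the notation below is the standard one, summarized rather than quoted from the PDF): with network outputs o_i and targets t_i over the training set, learning minimizes a quadratic error E by moving each weight against its partial derivative,

    $$E = \frac{1}{2}\sum_{i=1}^{p} \lVert \mathbf{o}_i - \mathbf{t}_i \rVert^{2}, \qquad \Delta w_k = -\gamma \, \frac{\partial E}{\partial w_k},$$

    where $\gamma$ is a small positive learning constant (the learning rate).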

    The optimization algorithm repeats a two-phase cycle: propagation and weight update. When an input vector is presented to the network, it is propagated forward through the network, layer by layer, until it reaches the output layer. The output of the network is then compared to the desired output using a loss function, and an error value is calculated for each of the neurons in the output layer. The error values are then propagated from the output back through the network, until each neuron has an associated error value that reflects its contribution to the original output. Backpropagation uses these error values to calculate the gradient of the loss function. In the second phase, this gradient is fed to the optimization method, which in turn uses it to update the weights in an attempt to minimize the loss function.
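    The two phases map directly onto a few lines of NumPy. The sketch below is illustrative only, not code from the chapter: the 2-3-1 network shape, sigmoid units, XOR training data, squared-error loss, and learning rate are all assumptions made for the example.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Toy training set: the XOR problem (assumed for illustration).
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)

    # Parameters of a small 2-3-1 network (weights and biases).
    W1 = rng.normal(scale=0.5, size=(2, 3)); b1 = np.zeros(3)
    W2 = rng.normal(scale=0.5, size=(3, 1)); b2 = np.zeros(1)
    lr = 0.5  # learning rate (gamma)

    for step in range(10000):
        # Phase 1: forward propagation, layer by layer.
        H = sigmoid(X @ W1 + b1)      # hidden activations
        O = sigmoid(H @ W2 + b2)      # network output

        # Compare with the desired output using the squared-error loss.
        loss = 0.5 * np.sum((O - T) ** 2)

        # Phase 2: propagate error values back through the network.
        delta_out = (O - T) * O * (1 - O)             # error at output units
        delta_hid = (delta_out @ W2.T) * H * (1 - H)  # error at hidden units

        # Gradient of the loss with respect to each parameter.
        gW2, gb2 = H.T @ delta_out, delta_out.sum(axis=0)
        gW1, gb1 = X.T @ delta_hid, delta_hid.sum(axis=0)

        # Weight update: plain gradient descent on the loss.
        W2 -= lr * gW2; b2 -= lr * gb2
        W1 -= lr * gW1; b1 -= lr * gb1

    print("final loss:", round(float(loss), 4))
    ```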

  • Original post: https://www.cnblogs.com/rsapaper/p/6269463.html