1. Learning with gradient descent
Cost function
The goal of training a neural network is to find weights and biases that minimize the quadratic cost function C(w,b). Gradient descent:
When we move the ball a small amount Δv1 in the v1 direction and a small amount Δv2 in the v2 direction, calculus tells us that C changes as follows:
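ΔC ≈ (∂C/∂v1)Δv1 + (∂C/∂v2)Δv2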
We define the gradient of C to be the vector of partial derivatives:
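∇C ≡ (∂C/∂v1, ∂C/∂v2)^T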
The expression for ΔC can then be rewritten as:
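ΔC ≈ ∇C · Δv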
Suppose we choose Δv as:
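Δv = −η∇C, where η is a small, positive parameter (the learning rate).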
then:
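ΔC ≈ −η∇C · ∇C = −η‖∇C‖²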
This guarantees that ΔC ≤ 0: C will always decrease, never increase.
That is, we'll use the update rule Δv = −η∇C to compute a value for Δv, then move the ball's position v by that amount:
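v → v′ = v − η∇C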
Then we'll use this update rule again, to make another move. If we keep doing this, over and over, we'll keep decreasing C until - we hope - we reach a global minimum.
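To make the procedure concrete, here is a minimal Python sketch of this repeated update (not the book's code; grad_C, the starting point v, and the step count are illustrative assumptions):

```python
import numpy as np

def gradient_descent(grad_C, v, eta=0.1, steps=1000):
    """Repeatedly apply the update rule v -> v' = v - eta * grad_C(v)."""
    for _ in range(steps):
        v = v - eta * grad_C(v)  # move a small step in the direction of steepest descent
    return v

# Example: C(v) = v1^2 + v2^2 has gradient 2v and a global minimum at the origin.
v_final = gradient_descent(lambda v: 2 * v, v=np.array([3.0, -4.0]))
print(v_final)  # approaches [0., 0.]
```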
Stochastic gradient descent:
It works by picking out a randomly chosen mini-batch of training inputs, and training with those:
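w_k → w_k′ = w_k − (η/m) Σ_j ∂C_{X_j}/∂w_k
b_l → b_l′ = b_l − (η/m) Σ_j ∂C_{X_j}/∂b_l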
The two sums run over all the training examples X_j in the current mini-batch.
Then we pick out another randomly chosen mini-batch and train with those. We keep going until we've exhausted all the training inputs, which is said to complete an epoch of training; then we start a new epoch.
Note: when the amount of training data isn't known in advance, it is valid to drop the 1/n and 1/m scaling factors; this amounts to rescaling the learning rate η.
When the mini-batch size is set to 1, this is called online learning, or incremental learning.
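A minimal Python sketch of one mini-batch SGD epoch under the same caveat (grad_C_x is a hypothetical function returning the gradient of the per-example cost C_{X_j} with respect to the parameters; the 1/n factor is dropped as noted above):

```python
import numpy as np

def sgd_epoch(grad_C_x, w, data, eta=0.1, m=10):
    """One epoch: shuffle the training inputs, split them into
    mini-batches of size m, and for each batch apply
    w -> w' = w - (eta/m) * sum_j grad_C_x(w, X_j)."""
    np.random.shuffle(data)
    for k in range(0, len(data), m):
        batch = data[k:k + m]
        grad = sum(grad_C_x(w, x) for x in batch) / len(batch)
        w = w - eta * grad
    return w

# Example: per-example cost C_x(w) = (w - x)^2 with gradient 2*(w - x);
# SGD drives w toward the mean of the data. Setting m=1 gives online learning.
data = list(np.random.normal(5.0, 1.0, size=200))
w = 0.0
for epoch in range(30):
    w = sgd_epoch(lambda w, x: 2 * (w - x), w, data, eta=0.05, m=10)
print(w)  # close to 5.0
```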