  • What is the difference between iterations and epochs in convolutional neural networks?

    https://stats.stackexchange.com/questions/164876/tradeoff-batch-size-vs-number-of-iterations-to-train-a-neural-network

    It has been observed in practice that when using a larger batch there is a significant degradation in the quality of the model, as measured by its ability to generalize. 

     https://stackoverflow.com/questions/4752626/epoch-vs-iteration-when-training-neural-networks/31842945

    In the neural network terminology:

    • one epoch = one forward pass and one backward pass of all the training examples
    • batch size = the number of training examples in one forward/backward pass. The higher the batch size, the more memory space you'll need.
    • number of iterations = number of passes, each pass using [batch size] number of examples. To be clear, one pass = one forward pass + one backward pass (we do not count the forward pass and backward pass as two different passes).

    Example: if you have 1000 training examples, and your batch size is 500, then it will take 2 iterations to complete 1 epoch.
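
    The bookkeeping above can be made concrete with a short sketch. The toy linear-regression data, loss, and learning rate below are made-up illustrations (not from the linked answers); only the epoch/batch/iteration accounting matters here.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))        # 1000 training examples, 10 features
    true_w = rng.normal(size=10)
    y = X @ true_w + 0.1 * rng.normal(size=1000)

    w = np.zeros(10)                       # model parameters
    alpha = 0.01                           # learning rate (illustrative value)
    batch_size = 500
    n_epochs = 2

    iterations = 0
    for epoch in range(n_epochs):          # one epoch = one pass over all 1000 examples
        perm = rng.permutation(len(X))     # shuffle once per epoch
        for start in range(0, len(X), batch_size):
            idx = perm[start:start + batch_size]
            xb, yb = X[idx], y[idx]
            grad = 2 * xb.T @ (xb @ w - yb) / len(xb)   # mean-squared-error gradient
            w -= alpha * grad              # one forward + backward pass = one iteration
            iterations += 1

    # 1000 examples at batch size 500 gives 2 iterations per epoch,
    # so after 2 epochs iterations == 4.
    print(iterations)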

    http://ufldl.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/

    Stochastic Gradient Descent (SGD) simply does away with the expectation in the update and computes the gradient of the parameters using only a single or a few training examples. 

    Overview

    Batch methods, such as limited-memory BFGS, which use the full training set to compute the next parameter update at each iteration, tend to converge very well to local optima. They are also straightforward to get working given a good off-the-shelf implementation (e.g. minFunc) because they have very few hyper-parameters to tune. However, in practice computing the cost and gradient for the entire training set is often very slow, and sometimes intractable on a single machine if the dataset is too big to fit in main memory. Another issue with batch optimization methods is that they don't give an easy way to incorporate new data in an 'online' setting. Stochastic Gradient Descent (SGD) addresses both of these issues by following the negative gradient of the objective after seeing only a single or a few training examples. The use of SGD in the neural network setting is motivated by the high cost of running backpropagation over the full training set. SGD can overcome this cost and still lead to fast convergence.
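
    As a rough sketch of that contrast (not code from the tutorial): grad_J below is a hypothetical helper that returns the gradient of the objective averaged over whichever examples it is handed.

    import numpy as np

    def full_batch_step(theta, X, y, alpha, grad_J):
        # Batch method: every parameter update touches all N training examples.
        return theta - alpha * grad_J(theta, X, y)

    def sgd_step(theta, X, y, alpha, grad_J, rng, batch_size=1):
        # SGD: each update uses only one (or a few) randomly chosen examples,
        # so its per-update cost does not grow with the dataset, and newly
        # arriving data can simply be fed to the next step (online setting).
        idx = rng.integers(0, len(X), size=batch_size)
        return theta - alpha * grad_J(theta, X[idx], y[idx])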

    Stochastic Gradient Descent

    The standard gradient descent algorithm updates the parameters θ of the objective J(θ) as,

    θ = θ − α ∇θ E[J(θ)]

    where the expectation in the above equation is approximated by evaluating the cost and gradient over the full training set. Stochastic Gradient Descent (SGD) simply does away with the expectation in the update and computes the gradient of the parameters using only a single or a few training examples. The new update is given by,

    θ = θ − α ∇θ J(θ; x(i), y(i))

    with a pair (x(i), y(i)) from the training set.
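
    Written out as code, the single-example update looks like the sketch below. The squared-error objective J(θ; x(i), y(i)) = 0.5·(θ·x(i) − y(i))² is an illustrative choice; the tutorial leaves J generic.

    import numpy as np

    def sgd_update(theta, x_i, y_i, alpha):
        # gradient of J(theta; x_i, y_i) = 0.5 * (theta . x_i - y_i)^2 w.r.t. theta
        grad = (theta @ x_i - y_i) * x_i
        return theta - alpha * grad        # theta := theta - alpha * grad_theta J(theta; x_i, y_i)

    theta = np.zeros(3)
    x_i, y_i = np.array([1.0, 2.0, 3.0]), 1.5
    theta = sgd_update(theta, x_i, y_i, alpha=0.1)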

  • Original post: https://www.cnblogs.com/rsapaper/p/7600987.html