Mini-batch gradient descent
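Batch gradient descent computes every weight update from the whole training set, so each update is accurate but expensive on large datasets. Stochastic gradient descent computes every update from a single example, so updates are cheap but noisy.
To overcome the drawbacks of both methods, the common practice today is a compromise between them: mini-batch gradient descent, which computes each update from a small batch of examples.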
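The three variants can be read as one training loop that differs only in how many examples feed each weight update. The sketch below is illustrative, not taken from the source: it assumes a toy linear model y ≈ w*x + b with mean-squared-error loss, and the function name `gradient_descent` and its default learning rate are hypothetical choices. Setting `batch_size` to the dataset size gives batch gradient descent, `1` gives stochastic gradient descent, and anything in between gives mini-batch gradient descent.

```python
import random

def gradient_descent(xs, ys, batch_size, epochs=50, lr=0.05, seed=0):
    """Hypothetical illustration: fit y ~ w*x + b by minimising mean squared error.

    batch_size == len(xs)      -> batch gradient descent
    batch_size == 1            -> stochastic gradient descent
    1 < batch_size < len(xs)   -> mini-batch gradient descent
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):            # one epoch = one pass over every example
        rng.shuffle(indices)           # visit examples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this batch only.
            grad_w = grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += 2 * err * xs[i]
                grad_b += 2 * err
            grad_w /= len(batch)
            grad_b /= len(batch)
            # Exactly one weight update per batch.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Illustrative, noise-free data generated from y = 3x + 1.
xs = [i / 10 for i in range(20)]
ys = [3 * x + 1 for x in xs]
for bs in (len(xs), 1, 5):   # batch, stochastic, mini-batch
    w, b = gradient_descent(xs, ys, batch_size=bs, epochs=500)
    print(bs, round(w, 3), round(b, 3))
```

With the same number of epochs, the smaller the batch, the more weight updates the loop performs, which is the trade-off between the cost per update and the number of updates that mini-batch gradient descent balances.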
Epochs vs. iterations vs. batch size
epoch
One pass of the entire dataset (all training examples) forward and backward through the neural network.
batch size
The total number of training examples in a single batch. Note: batch size (how many examples are in one batch) and number of batches (how many batches there are) are two different things.
iterations
The number of batches needed to complete one epoch. Note: for one epoch, the number of iterations equals the number of batches.
In other words, completing one epoch takes as many iterations as there are batches.
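The relationship between dataset size, batch size, and iterations is plain arithmetic, sketched below. The helper names are hypothetical. One assumption to flag: when the dataset size is not a multiple of the batch size, frameworks differ in whether they keep the smaller final batch or discard it; the `drop_last` flag here mimics the option of the same name on PyTorch's `DataLoader`, but these functions are not part of any library.

```python
import math

def iterations_per_epoch(num_examples, batch_size, drop_last=False):
    """Hypothetical helper: number of batches (= iterations) in one epoch."""
    if drop_last:
        # Discard a final batch that would be smaller than batch_size.
        return num_examples // batch_size
    # Keep the smaller final batch, so round up.
    return math.ceil(num_examples / batch_size)

def total_iterations(num_examples, batch_size, epochs, drop_last=False):
    """Hypothetical helper: total weight updates over all epochs."""
    return epochs * iterations_per_epoch(num_examples, batch_size, drop_last)

print(iterations_per_epoch(2000, 500))                  # 4
print(iterations_per_epoch(2000, 300))                  # 7 (six batches of 300, one of 200)
print(iterations_per_epoch(2000, 300, drop_last=True))  # 6
print(total_iterations(2000, 500, epochs=10))           # 40
```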
Why do we use more than one epoch?
At first it may seem odd that passing the entire dataset through a neural network once is not enough, and that the same network has to see the full dataset multiple times. Keep in mind, though, that we are working with a limited dataset, and that gradient descent, which we use to optimise learning, is an iterative process: each update moves the weights only a small step. Updating the weights over a single pass (one epoch) is therefore usually not enough.
With only one epoch, the model typically underfits, as the curve in the graph below shows.
As the number of epochs increases, the weights in the neural network are updated more times, and the fitted curve moves from underfitting, to an optimal fit, to overfitting.
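A small experiment can make the "one epoch is not enough" point concrete. The sketch below is an assumption-laden toy, not the source's setup. It uses mini-batch gradient descent on a linear model y ≈ w*x + b over noise-free data, and `train_with_history` is a hypothetical name. It records the training loss after every epoch, so you can compare the loss after epoch 1 with the loss after many epochs. It only shows the underfitting-to-fit part of the curve. Overfitting is judged on held-out validation data, and this linear model on noise-free data cannot overfit, so that stage is not modelled here.

```python
import random

def train_with_history(xs, ys, epochs, batch_size=4, lr=0.02, seed=0):
    """Hypothetical illustration: mini-batch gradient descent on y ~ w*x + b,
    returning the training mean squared error measured after each epoch."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    history = []
    for _ in range(epochs):
        rng.shuffle(idx)
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            # Batch-averaged gradients of the squared error.
            gw = sum(2 * ((w * xs[i] + b) - ys[i]) * xs[i] for i in batch) / len(batch)
            gb = sum(2 * ((w * xs[i] + b) - ys[i]) for i in batch) / len(batch)
            w -= lr * gw
            b -= lr * gb
        # Training loss on the full dataset at the end of this epoch.
        mse = sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        history.append(mse)
    return history

xs = [i / 10 for i in range(20)]
ys = [3 * x + 1 for x in xs]
history = train_with_history(xs, ys, epochs=100)
print(f"after 1 epoch: {history[0]:.4f}, after 100 epochs: {history[-1]:.6f}")
```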