Caffe solver parameter settings

    http://caffe.berkeleyvision.org/tutorial/solver.html
The solver coordinates parameter optimization by driving the network's forward-backward passes and applying the resulting parameter updates. Learning a model is the joint work of the Solver, which supervises optimization and updates the parameters, and the Net, which produces the loss and gradients.
Caffe provides the following optimization methods:

    • Stochastic Gradient Descent (type: “SGD”),
    • AdaDelta (type: “AdaDelta”),
    • Adaptive Gradient (type: “AdaGrad”),
    • Adam (type: “Adam”),
    • Nesterov’s Accelerated Gradient (type: “Nesterov”),
    • RMSprop (type: “RMSProp”)
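As an illustration, a solver definition is written in prototxt. The sketch below shows how the solver type and the parameters discussed later in this post fit together; the file names and values are placeholders for illustration, not recommendations:

```protobuf
net: "train_val.prototxt"      # hypothetical net definition file
type: "SGD"                    # one of the solver types listed above
base_lr: 0.01
momentum: 0.9
lr_policy: "step"
stepsize: 500
gamma: 0.1
test_iter: 100
test_interval: 1000
max_iter: 450000
snapshot: 5000
snapshot_prefix: "snapshots/mymodel"
solver_mode: GPU
```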

    The solver

    • scaffolds the optimization bookkeeping and creates the training network for learning and test network(s) for evaluation.
    • iteratively optimizes by calling forward / backward and updating parameters
    • (periodically) evaluates the test networks
    • snapshots the model and solver state throughout the optimization

    where each iteration

    • calls network forward to compute the output and loss
    • calls network backward to compute the gradients
    • incorporates the gradients into parameter updates according to the solver method
    • updates the solver state according to learning rate, history, and method

    to take the weights all the way from initialization to learned model.

    Like Caffe models, Caffe solvers run in CPU / GPU modes.
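The per-iteration loop above can be sketched in plain Python. A one-parameter toy model with loss L(w) = (w − 3)² stands in for a real Net here; the loss and gradient functions are assumptions for illustration, not part of Caffe:

```python
# Minimal sketch of the solver loop: forward -> backward -> update.

def forward(w):
    """Forward pass: compute the loss for the current weights."""
    return (w - 3.0) ** 2

def backward(w):
    """Backward pass: compute the gradient dL/dw."""
    return 2.0 * (w - 3.0)

def train(w0, lr=0.1, max_iter=100):
    """Take the weights from initialization to learned model."""
    w = w0
    for _ in range(max_iter):
        loss = forward(w)       # network forward: output and loss
        grad = backward(w)      # network backward: gradients
        w = w - lr * grad       # incorporate gradients into the update
        # (a real solver would also update its state here:
        #  learning-rate schedule, momentum history, snapshots)
    return w

w = train(w0=0.0)
print(round(w, 4))  # converges toward the minimum at w = 3.0
```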

    SGD

    Stochastic gradient descent (type: “SGD”) updates the weights W by a linear combination of the negative gradient ∇L(W) and the previous weight update V_t. The learning rate α is the weight of the negative gradient. The momentum μ is the weight of the previous update.

    Formally, we have the following formulas to compute the update value V_{t+1} and the updated weights W_{t+1} at iteration t+1, given the previous weight update V_t and current weights W_t:

    V_{t+1} = μ V_t − α ∇L(W_t)
    W_{t+1} = W_t + V_{t+1}
    The learning “hyperparameters” (α and μ) might require a bit of tuning for best results. If you’re not sure where to start, take a look at the “Rules of thumb” below, and for further information you might refer to Leon Bottou’s Stochastic Gradient Descent Tricks [1].

    [1] L. Bottou. Stochastic Gradient Descent Tricks. Neural Networks: Tricks of the Trade: Springer, 2012.
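The two update formulas translate directly into code. The sketch below applies one step of them to a toy loss L(W) = W² (so ∇L(W) = 2W); the loss function is an assumption for illustration:

```python
# SGD with momentum, following
#   V_{t+1} = mu * V_t - alpha * grad(L(W_t))
#   W_{t+1} = W_t + V_{t+1}

def sgd_momentum_step(w, v, grad, alpha=0.1, mu=0.9):
    """One SGD update: returns (W_{t+1}, V_{t+1})."""
    v_next = mu * v - alpha * grad   # V_{t+1} = mu*V_t - alpha*∇L(W_t)
    w_next = w + v_next              # W_{t+1} = W_t + V_{t+1}
    return w_next, v_next

# One step from W_0 = 1, V_0 = 0 with ∇L(W_0) = 2*1 = 2:
# V_1 = 0.9*0 - 0.1*2 = -0.2, so W_1 = 1 - 0.2 = 0.8.
w, v = sgd_momentum_step(w=1.0, v=0.0, grad=2.0)
print(w, v)  # 0.8 -0.2
```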

    Summary of the solver file parameters

    • iteration: one forward-backward training pass over a single batch
    • batch_size: the number of images processed per iteration
    • epoch: one epoch passes every training image through the network once

    For example: with 1,280,000 training images and batch_size = 256, one epoch takes 1,280,000 / 256 = 5,000 iterations; with max_iter = 450,000, training runs for 450,000 / 5,000 = 90 epochs.

    When the learning rate decays is controlled by stepsize, and by how much is controlled by gamma. For instance, with stepsize = 500, base_lr = 0.01, and gamma = 0.1, the learning rate decays for the first time at iteration 500, becoming lr = lr × gamma = 0.01 × 0.1 = 0.001, and this repeats every stepsize iterations. In short, stepsize is the learning rate's decay interval and gamma is its decay factor.

    During training the network is also tested periodically, with a period set by test_interval: if test_interval = 1000, the network is tested once every 1,000 training iterations. The test batch size, test_iter, and the total number of test images together determine how testing runs: the test batch size is the number of images fed per test iteration, and test_iter is the number of test iterations needed to cover all test images. For example, with 500 test images and test_iter = 100, the test batch size must be 5. In the solver file you only need to set test_iter according to the total number of test images (the test batch size lives in the net definition), plus test_interval as desired.
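The arithmetic above (epochs from iterations, and the "step" learning-rate policy) can be checked with a few lines of Python:

```python
# Epoch/iteration arithmetic and the "step" lr policy from the examples above.

def iters_per_epoch(num_images, batch_size):
    """Iterations needed to pass every training image through once."""
    return num_images // batch_size

def step_lr(base_lr, gamma, stepsize, iteration):
    """Learning rate under the 'step' policy:
    lr = base_lr * gamma ^ floor(iteration / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)

ipe = iters_per_epoch(1_280_000, 256)
print(ipe)                           # 5000 iterations per epoch
print(450_000 // ipe)                # 90 epochs at max_iter = 450000
print(step_lr(0.01, 0.1, 500, 499))  # base_lr, before the first decay
print(step_lr(0.01, 0.1, 500, 500))  # decayed by gamma at iteration 500
```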

Original post: https://www.cnblogs.com/CarryPotMan/p/5343692.html