  • Tuning the loss in Caffe

    what should I do if...
    ...my loss diverges? (increases by an order of magnitude, goes to inf or NaN)

    • lower the learning rate

    • raise momentum (with a corresponding drop in the learning rate)

    • raise weight decay

    • raise the batch size

    • use gradient clipping (limit the L2 norm of the gradient to a particular value at each iteration; shrink it to that norm if greater)

    • try another solver: momentum SGD, Adam, RMSProp, ...

    • try a smaller initialization (e.g., for a Gaussian init, lower the standard deviation)
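
The gradient clipping item above can be sketched in a few lines. This is a minimal plain-Python illustration, not Caffe's implementation: the function name `clip_by_global_norm` and the nested-list gradient layout are assumptions made for the example. In Caffe itself the threshold is, as far as I know, set through the `clip_gradients` field of the solver definition.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients so that their joint L2 norm is at most max_norm.

    grads is a list of per-layer gradient lists (a hypothetical layout).
    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for layer in grads for g in layer))
    if total_norm <= max_norm:
        # Small enough already: return an unchanged copy.
        return [list(layer) for layer in grads], total_norm
    scale = max_norm / total_norm
    return [[g * scale for g in layer] for layer in grads], total_norm

grads = [[3.0, 4.0], [12.0]]  # joint L2 norm is sqrt(9 + 16 + 144) = 13
clipped, norm_before = clip_by_global_norm(grads, max_norm=6.5)
```

Note that the direction of the gradient is preserved; only its length shrinks, which is why clipping tames occasional huge updates without changing where the step points.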

    what should I do if...
    ...my loss doesn’t improve / gets stuck / drops slowly?

    • raise the learning rate

    • (maybe) lower momentum, weight decay, and/or batch size

    • try another solver: momentum SGD, ADAM, RMSProp, ...

    • transfer a pre-trained (e.g. on ImageNet) initialization, if possible

    • use a larger initialization (in particular, make sure you didn’t zero-initialize any multiplicative weights in intermediate layers)

    • use a “smarter” initialization (e.g., for linear layers followed by ReLUs, try the msra initialization in Caffe)

    • remove some layers to make the network shallower
      at least to start!
      a strategy for model design: begin with a simple, trainable network; “deepen” it by adding new layers one-by-one

    • modify the architecture to improve gradient flow:
      batch normalization
      residual learning [ResNet]
      intermediate losses [GoogLeNet]
      other tricks
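
For the “smarter” initialization item in the list above: the msra scheme (He et al.) draws weights from a zero-mean Gaussian whose standard deviation is sqrt(2 / fan_in), so that activations keep roughly constant variance through ReLU layers. The sketch below assumes a fully connected layer stored as a `fan_out` × `fan_in` list of lists; the helper names `msra_std` and `msra_init` are made up for illustration, and Caffe's own `msra` filler may differ in details such as its variance normalization option.

```python
import math
import random

def msra_std(fan_in):
    """Standard deviation used by msra (He) initialization."""
    return math.sqrt(2.0 / fan_in)

def msra_init(fan_out, fan_in, seed=0):
    """Draw a fan_out x fan_in weight matrix from N(0, 2 / fan_in)."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    std = msra_std(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)]
            for _ in range(fan_out)]

weights = msra_init(fan_out=64, fan_in=512)  # std is sqrt(2 / 512) = 0.0625
```

A wider layer (larger `fan_in`) gets proportionally smaller weights, which is the point: the sum over inputs stays at a comparable scale regardless of layer width.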

    be patient! (go outside?)

    • deep learning can take a long time
      training AlexNet in 2012: 12 days (although this is down to 1 day in 2015!)
      the loss hovers around the chance value of ln(1000) ≈ 6.908 for the first 1000+ iterations (~1 hour on a 2012 GPU)
      training ResNet-152 in 2015: 1-2 months (on 8 GPUs!)

    • the best configurations (net architectures, solvers) at convergence are often not the ones that train fastest early on
      some tricks to speed up learning can be “greedy” rather than ultimately beneficial
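
The chance value quoted above for AlexNet is simply the softmax cross-entropy of a network that assigns equal probability to every class. A short check, with `chance_loss` as a hypothetical helper name, makes it easy to compute the same baseline for any number of classes and to tell "stuck at chance" apart from "learning slowly":

```python
import math

def chance_loss(num_classes):
    """Cross-entropy loss of a uniform prediction over num_classes classes."""
    # -log(1 / num_classes) simplifies to log(num_classes).
    return math.log(num_classes)

imagenet_baseline = chance_loss(1000)  # about 6.908 for the 1000 ImageNet classes
```

If the training loss sits near this value, the network is still guessing uniformly; a loss well above it usually points to divergence rather than slow learning.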

    One addition: if GPU memory is insufficient, consider setting iter_size to increase the effective batch size
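
The idea behind `iter_size` is gradient accumulation: run several small forward/backward passes, average their gradients, and only then update the weights, so the effective batch size is batch_size × iter_size. The sketch below illustrates this on a one-parameter least-squares problem; the functions `mean_grad` and `accumulated_grad` and the toy data are assumptions for the example, not Caffe code.

```python
def mean_grad(w, batch):
    """Mean gradient of 0.5 * (w * x - y)**2 with respect to w over a batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def accumulated_grad(w, data, batch_size, iter_size):
    """Average the gradients of iter_size consecutive mini-batches."""
    total = 0.0
    for i in range(iter_size):
        chunk = data[i * batch_size:(i + 1) * batch_size]
        total += mean_grad(w, chunk)
    return total / iter_size

data = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0), (4.0, 9.0)]
small = accumulated_grad(w=0.5, data=data, batch_size=2, iter_size=2)
large = mean_grad(0.5, data)  # one pass over all four samples
```

Here `small` and `large` agree, which is why accumulation can stand in for a batch that does not fit in memory. One caveat: layers that compute per-batch statistics, such as batch normalization, still see only the small batch on each pass.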

    reference

    https://docs.google.com/presentation/d/1HxGdeq8MPktHaPb-rlmYYQ723iWzq9ur6Gjo71YiG0Y/edit#slide=id.g8629ab2c8_0_60

  • Original post: https://www.cnblogs.com/zjutzz/p/6858776.html