How to tune the loss in Caffe


what should I do if...
...my loss diverges? (increases by an order of magnitude, goes to inf or NaN)
- lower the learning rate
- raise momentum (with a corresponding learning rate drop)
- raise weight decay
- raise batch size
- use gradient clipping (limit the L2 norm of the gradient to a particular value at each iteration; shrink it to that norm if greater)
- try another solver: momentum SGD, ADAM, RMSProp, ...
- try a smaller initialization (e.g., for a Gaussian init, lower the standard deviation)
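
The gradient clipping item can be made concrete with a small sketch. It is a minimal pure-Python illustration, not Caffe's own code: the function name `clip_by_l2_norm` and the flat list of gradient values are assumptions made for this example. In Caffe itself, the solver setting `clip_gradients` is, as far as I know, the switch that enables this behaviour.

```python
import math

def clip_by_l2_norm(grads, max_norm):
    """Rescale a flat list of gradient values so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)  # already within the limit: leave unchanged
    scale = max_norm / norm  # shrink factor that brings the norm down to max_norm
    return [g * scale for g in grads]

# Hypothetical gradient with L2 norm 5, clipped down to norm 1.
clipped = clip_by_l2_norm([3.0, 4.0], max_norm=1.0)
print(clipped)
```

The direction of the gradient is preserved; only its length changes, which is why clipping tends to tame occasional huge updates without altering ordinary ones.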

what should I do if...
...my loss doesn’t improve / gets stuck / drops slowly?
- raise the learning rate
- (maybe) lower momentum, weight decay, and/or batch size
- try another solver: momentum SGD, ADAM, RMSProp, ...
- transfer a pre-trained (e.g. on ImageNet) initialization, if possible
- use a larger initialization (in particular, make sure you didn’t zero-initialize any multiplicative weights in intermediate layers)
- use a “smarter” initialization (e.g., for linear layers followed by ReLUs, try the msra initialization in Caffe)
- remove some layers to make the network shallower
  at least to start!
  a strategy for model design: begin with a simple, trainable network; “deepen” it by adding new layers one-by-one
- modify the architecture to improve gradient flow:
  batch normalization
  residual learning [ResNet]
  intermediate losses [GoogLeNet]
  other tricks
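
To illustrate the "smarter" initialization item, here is a rough sketch of the msra scheme: weights are drawn from a zero-mean Gaussian with standard deviation sqrt(2 / fan_in). The helper names `msra_std` and `msra_init` are made up for this example, and the fan-in variant is an assumption (I believe it matches the default of Caffe's `msra` filler, but check your version).

```python
import math
import random

def msra_std(fan_in):
    """Standard deviation used by the msra scheme, fan-in variant (assumed)."""
    return math.sqrt(2.0 / fan_in)

def msra_init(fan_in, fan_out, seed=0):
    """Return a fan_out x fan_in weight matrix sampled from N(0, msra_std(fan_in)^2)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    std = msra_std(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]

# Hypothetical linear layer with 512 inputs and 4 outputs.
weights = msra_init(fan_in=512, fan_out=4)
print(msra_std(512))  # 0.0625
```

The factor 2 compensates for a ReLU zeroing roughly half of its inputs, which keeps the activation variance about constant from layer to layer.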

be patient! (go outside?)
- deep learning can take a long time
  training AlexNet in 2012: 12 days
  although this is down to 1 day in 2015!
  loss hovers around the chance value of ln(1000) ≈ 6.908 for the first 1000+ iterations (~1 hour on a 2012 GPU)
  training ResNet-152 in 2015: 1-2 months (on 8 GPUs!)
- the best configurations (net architectures, solvers) at convergence are often not the ones that train fastest early on
  some tricks to speed up learning can be “greedy” rather than ultimately beneficial
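
The chance value quoted above can be checked directly: a classifier that spreads its probability uniformly over K classes has a cross-entropy loss of ln(K). A tiny sketch, with `chance_loss` as a name invented for this example:

```python
import math

def chance_loss(num_classes):
    """Cross-entropy loss of a classifier that gives probability 1/num_classes to every class."""
    return -math.log(1.0 / num_classes)

print(round(chance_loss(1000), 3))  # 6.908
```

If your loss sits at this value, the network has not yet learned anything beyond guessing, which is expected early in training and a warning sign later on.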

One addition: if you run out of GPU memory, consider setting iter_size to increase the effective batch_size
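
The reasoning behind iter_size can be sketched as follows. As I understand it, Caffe accumulates gradients over iter_size forward/backward passes before each weight update, so the effective batch size is batch_size * iter_size. The toy model below (a scalar weight with a squared-error loss) and the helper names are assumptions for illustration only; it shows that averaging the gradients of equally sized small batches gives the same result as one large batch.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def accumulated_grad(w, xs, ys, batch_size):
    """Average the gradients of consecutive mini-batches of size batch_size."""
    starts = range(0, len(xs), batch_size)
    grads = [grad_mse(w, xs[i:i + batch_size], ys[i:i + batch_size]) for i in starts]
    return sum(grads) / len(grads)

# Made-up data: 8 samples in total.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 16.2]
w = 0.5

full = grad_mse(w, xs, ys)                         # one batch of 8 samples
accum = accumulated_grad(w, xs, ys, batch_size=2)  # like batch_size=2 with iter_size=4
print(full, accum)
```

The two values agree up to floating-point rounding, at the cost of four passes instead of one, so memory use drops while each update takes longer.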

reference

https://docs.google.com/presentation/d/1HxGdeq8MPktHaPb-rlmYYQ723iWzq9ur6Gjo71YiG0Y/edit#slide=id.g8629ab2c8_0_60

Original post: https://www.cnblogs.com/zjutzz/p/6858776.html
