1. Slowing Down the Weight Norm Increase in Momentum-based Optimizers
Paper: https://arxiv.org/pdf/2006.08217.pdf
GitHub: https://github.com/clovaai/AdamP
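The core idea behind this paper (AdamP) is that momentum-based updates tend to inflate the norm of scale-invariant weights, and this can be countered by removing the radial (norm-growing) component of each update. A minimal NumPy sketch of that projection step, not the authors' implementation (function name and `eps` are illustrative):

```python
import numpy as np

def project_out_radial(w, update, eps=1e-8):
    """Remove the component of `update` parallel to `w`.

    The remaining tangential step changes the direction of `w`
    but (to first order) not its norm, slowing weight-norm growth.
    """
    w_flat = w.ravel()
    u_flat = update.ravel()
    # Radial component: projection of the update onto w.
    radial = (u_flat @ w_flat) / (w_flat @ w_flat + eps) * w_flat
    return (u_flat - radial).reshape(w.shape)

w = np.array([1.0, 0.0])
u = np.array([0.5, 0.5])
p = project_out_radial(w, u)  # tangential part only
```

After the projection, the step is orthogonal to the weight vector, so `w + lr * p` has (to first order) the same norm as `w`.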
2. OD-SGD: One-Step Delay Stochastic Gradient Descent for Distributed Training
Paper: https://arxiv.org/pdf/2005.06728.pdf
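The title describes an asynchronous scheme in which the gradient applied at step t was computed from the parameters of step t-1, letting computation and communication overlap in distributed training. A single-process toy simulation of that one-step delay, assuming a generic `grad_fn` (this is an illustrative sketch, not the paper's distributed implementation):

```python
def od_sgd(grad_fn, w0, lr=0.1, steps=20):
    """Simulate SGD with a one-step-delayed gradient.

    At each step the parameters are updated with the gradient
    evaluated on the *previous* step's parameters, mimicking a
    worker that computes gradients while the server updates.
    """
    w = w0
    delayed_grad = grad_fn(w)  # warm-up gradient from initial params
    for _ in range(steps):
        w_prev = w
        w = w - lr * delayed_grad       # apply the one-step-old gradient
        delayed_grad = grad_fn(w_prev)  # computed "concurrently" on old params
    return w

# Minimizing f(w) = w^2 (gradient 2w) still converges despite the delay.
w_final = od_sgd(lambda w: 2.0 * w, w0=1.0, lr=0.1, steps=20)
```

With a small learning rate the one-step staleness behaves much like ordinary SGD; the paper's contribution is analyzing and exploiting this in the distributed setting.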