  • The vanishing/exploding gradient problem: this instability is a fundamental problem for gradient-based learning in deep neural networks.

    The unstable gradient problem: The fundamental problem here isn't so much the vanishing gradient problem or the exploding gradient problem. It's that the gradient in early layers is the product of terms from all the later layers. When there are many layers, that's an intrinsically unstable situation. The only way all layers can learn at close to the same speed is if all those products of terms come close to balancing out. Without some mechanism or underlying reason for that balancing to occur, it's highly unlikely to happen simply by chance. In short, the real problem here is that neural networks suffer from an unstable gradient problem. As a result, if we use standard gradient-based learning techniques, different layers in the network will tend to learn at wildly different speeds.
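    To make the product-of-terms picture concrete, here is a minimal sketch (not code from the book) using a hypothetical chain of sigmoid neurons, one neuron per layer. Up to the cost term, the gradient reaching the first layer is the product of w_j * σ'(z_j) over all the later layers, and simply rescaling the weights is enough to push that product from vanishing to exploding:

    ```python
    import numpy as np

    def sigmoid_prime(z):
        """Derivative of the sigmoid activation."""
        s = 1.0 / (1.0 + np.exp(-z))
        return s * (1.0 - s)

    # Hypothetical chain of 10 layers with one sigmoid neuron per layer.
    np.random.seed(0)
    num_layers = 10
    weights = np.random.randn(num_layers)    # one weight per layer
    z_values = np.random.randn(num_layers)   # weighted input at each layer

    # The gradient reaching the first layer is (up to the final cost term)
    # the product of w_j * sigma'(z_j) over the later layers.
    factor = np.prod(weights * sigmoid_prime(z_values))
    print("early-layer gradient factor:", factor)

    # Rescaling the weights flips the same product between vanishing and exploding.
    for scale in (0.5, 1.0, 5.0, 20.0):
        f = np.prod(scale * weights * sigmoid_prime(z_values))
        print(f"weight scale {scale:5.1f} -> factor {f:.3e}")
    ```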

    Again, early hidden layers learn much more slowly than later hidden layers. In this case, the first hidden layer is learning roughly 100 times slower than the final hidden layer. No wonder we were having trouble training these networks earlier!

    We have here an important observation: in at least some deep neural networks, the gradient tends to get smaller as we move backward through the hidden layers. This means that neurons in the earlier layers learn much more slowly than neurons in later layers. And while we've seen this in just a single network, there are fundamental reasons why this happens in many neural networks. The phenomenon is known as the vanishing gradient problem.*

    *See Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, by Sepp Hochreiter, Yoshua Bengio, Paolo Frasconi, and Jürgen Schmidhuber (2001). This paper studied recurrent neural nets, but the essential phenomenon is the same as in the feedforward networks we are studying. See also Sepp Hochreiter's earlier Diploma Thesis, Untersuchungen zu dynamischen neuronalen Netzen (1991, in German).
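    As a rough illustration of this shrinking gradient (a sketch under an assumed standard-normal initialization, not the book's code), we can track the backpropagated error delta layer by layer in a small all-sigmoid network; with most random seeds the norm of delta gets smaller as we move toward the earlier layers:

    ```python
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Hypothetical network: an input layer plus 5 sigmoid layers of 30 neurons,
    # with standard normal weight and bias initialization.
    np.random.seed(1)
    sizes = [30] * 6
    weights = [np.random.randn(n_out, n_in)
               for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    biases = [np.random.randn(n, 1) for n in sizes[1:]]

    # Forward pass on a random input, keeping the weighted inputs z.
    a = np.random.randn(sizes[0], 1)
    zs = []
    for w, b in zip(weights, biases):
        z = w @ a + b
        zs.append(z)
        a = sigmoid(z)

    # Backward pass with a dummy error at the output layer; what matters here
    # is how the norm of delta changes as we move back through the layers.
    delta = np.ones((sizes[-1], 1))
    norms = [np.linalg.norm(delta)]
    for w, z in zip(reversed(weights[1:]), reversed(zs[:-1])):
        delta = (w.T @ delta) * sigmoid(z) * (1.0 - sigmoid(z))
        norms.append(np.linalg.norm(delta))

    # Print from the first hidden layer (layer 2) up to the output layer.
    for layer, n in zip(range(2, len(sizes) + 1), reversed(norms)):
        print(f"layer {layer}: |delta| = {n:.4f}")
    ```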

    Why does the vanishing gradient problem occur? Are there ways we can avoid it? And how should we deal with it in training deep neural networks? In fact, we'll learn shortly that it's not inevitable, although the alternative is not very attractive, either: sometimes the gradient gets much larger in earlier layers! This is the exploding gradient problem, and it's not much better news than the vanishing gradient problem. More generally, it turns out that the gradient in deep neural networks is unstable, tending to either explode or vanish in earlier layers. This instability is a fundamental problem for gradient-based learning in deep neural networks. It's something we need to understand, and, if possible, take steps to address.

    One response to vanishing (or unstable) gradients is to wonder if they're really such a problem. Momentarily stepping away from neural nets, imagine we were trying to numerically minimize a function f(x) of a single variable. Wouldn't it be good news if the derivative f'(x) was small? Wouldn't that mean we were already near an extremum? In a similar way, might the small gradient in early layers of a deep network mean that we don't need to do much adjustment of the weights and biases?
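    In the single-variable case that intuition does hold. A throwaway sketch (a made-up example, not from the book): running plain gradient descent on f(x) = (x - 3)^2, the derivative shrinks precisely because x is approaching the minimum:

    ```python
    # Gradient descent on the toy one-variable function f(x) = (x - 3)**2.
    # Here a small derivative genuinely signals that we are near the minimum,
    # which is the intuition the paragraph above asks about for deep networks.
    def f_prime(x):
        return 2.0 * (x - 3.0)

    x, eta = 0.0, 0.1
    for step in range(1, 31):
        x -= eta * f_prime(x)
        if step % 10 == 0:
            print(f"step {step:2d}: x = {x:.4f}, f'(x) = {f_prime(x):.6f}")
    ```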

    http://neuralnetworksanddeeplearning.com/chap5.html


  • Original post: https://www.cnblogs.com/rsapaper/p/7527566.html