Deep Learning 5: Gradient Descent

    Gradients add up at forks. If a variable branches out to different parts of the circuit, then the gradients that flow back to it will add.
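
    In formulas this is just the multivariable chain rule: writing q1, ..., qk for the intermediate nodes that x feeds (names introduced here, not in the original), every branch contributes one term and the terms are summed:

    df/dx = df/dq1 * dq1/dx + df/dq2 * dq2/dx + ... + df/dqk * dqk/dx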

    Example:

    There are 3 paths from x to f, and 2 paths from y to f.

    At the fork of x, the variable branches out into 3 paths that converge at f; when computing df/dx, the 3 gradients flowing back must be summed.

    At the fork of y, the variable branches out into 2 paths that converge at f; when computing df/dy, the 2 gradients flowing back must be summed.

    forward pass:
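
    The circuit from the original figure is not shown here, so the sketches below assume a hypothetical circuit with the same branching pattern, f(x, y) = (x + y) * (x * y) + x: x reaches f through 3 paths and y through 2. A minimal Python version of the forward pass:

        # hypothetical circuit: f(x, y) = (x + y) * (x * y) + x
        x, y = 3.0, -2.0
        a = x + y      # x branches out: path 1   -> a = 1.0
        b = x * y      # x branches out: path 2   -> b = -6.0
        c = a * b      #                           -> c = -6.0
        f = c + x      # x branches out: path 3   -> f = -3.0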

    backward pass:
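
    Continuing the same hypothetical circuit, the backward pass walks the graph in reverse. The key point from above is that dx and dy are accumulated with +=, never overwritten, because each variable forked into several paths:

        df = 1.0
        # f = c + x
        dc = 1.0 * df
        dx = 1.0 * df     # path 3 of x (the direct add)
        # c = a * b
        da = b * dc
        db = a * dc
        # a = x + y
        dx += 1.0 * da    # path 1 of x: add to the running total
        dy  = 1.0 * da    # path 1 of y
        # b = x * y
        dx += y * db      # path 2 of x
        dy += x * db      # path 2 of y
        # dx = -7.0 and dy = -3.0, matching df/dx = 2xy + y^2 + 1 and df/dy = x^2 + 2xy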

    The size of the mini-batch is a hyperparameter but it is not very common to cross-validate it. It is usually based on memory constraints (if any), or set to some value, e.g. 32, 64 or 128. We use powers of 2 in practice because many vectorized operation implementations work faster when their inputs are sized in powers of 2.
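
    As a rough sketch of what this looks like in code (the toy data, batch_size and learning_rate below are made up for illustration, not taken from the post), a mini-batch gradient descent step samples batch_size examples and updates the parameters with the gradient estimated on that batch:

        import numpy as np

        # toy data: 1024 examples, 10 features, linear target with a little noise
        rng = np.random.default_rng(0)
        X = rng.normal(size=(1024, 10))
        true_w = rng.normal(size=10)
        y = X @ true_w + 0.01 * rng.normal(size=1024)

        w = np.zeros(10)
        batch_size = 64            # a power of 2, per the note above
        learning_rate = 0.1

        for step in range(200):
            idx = rng.integers(0, len(X), size=batch_size)   # sample a mini-batch
            Xb, yb = X[idx], y[idx]
            grad = 2.0 / batch_size * Xb.T @ (Xb @ w - yb)   # gradient of the mean squared error
            w -= learning_rate * grad                        # gradient descent step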

    Reference:

    1. http://cs231n.github.io/optimization-2/

    2. https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b
