Deep Learning 5: Gradient Descent

    Gradients add up at forks. If a variable branches out to different parts of the circuit, then the gradients that flow back to it will add.

    Example:

    There are 3 paths from x to f, and 2 paths from y to f.

    At the fork of x, the variable branches out into 3 paths that converge at f, so when computing df/dx, the 3 gradients flowing back along those paths must be summed up.

    At the fork of y, the variable branches out into 2 paths that converge at f, so when computing df/dy, the 2 gradients flowing back along those paths must be summed up.

    forward pass and backward pass:
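
    Below is a minimal sketch of both passes. Since the circuit itself is not reproduced here, it assumes a hypothetical function f(x, y) = x*y + sin(x) + x + y (not necessarily the one in the original figure), chosen so that x forks into 3 paths and y into 2. Each `+=` in the backward pass is one gradient flowing back into a fork, and the accumulations show the gradients adding up.

```python
import math

# Hypothetical circuit: f(x, y) = x*y + sin(x) + x + y
# x forks into 3 branches (x*y, sin(x), +x); y forks into 2 (x*y, +y).

def forward(x, y):
    a = x * y               # first branch of x and of y
    b = math.sin(x)         # second branch of x
    f = a + b + x + y       # third branch of x, second branch of y
    return f

def backward(x, y):
    # The final sum node passes gradient 1.0 to each of its inputs.
    da, db = 1.0, 1.0
    dx, dy = 0.0, 0.0       # accumulators: gradients ADD UP at the forks

    dx += 1.0               # from the "+ x" path
    dy += 1.0               # from the "+ y" path
    dx += da * y            # from a = x*y (local gradient y)
    dy += da * x            # from a = x*y (local gradient x)
    dx += db * math.cos(x)  # from b = sin(x) (local gradient cos(x))

    return dx, dy           # dx = y + cos(x) + 1, dy = x + 1

x, y = 3.0, -2.0
print(forward(x, y))        # -6 + sin(3) + 3 - 2 ≈ -4.859
print(backward(x, y))       # (y + cos(3) + 1, x + 1) ≈ (-1.990, 4.0)
```

    If the accumulators were overwritten (`dx = ...`) instead of added to (`dx += ...`), only the last path would contribute and df/dx would come out wrong; the summation is exactly the "gradients add up at forks" rule.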

    The size of the mini-batch is a hyperparameter but it is not very common to cross-validate it. It is usually based on memory constraints (if any), or set to some value, e.g. 32, 64 or 128. We use powers of 2 in practice because many vectorized operation implementations work faster when their inputs are sized in powers of 2.
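
    As a rough illustration of where this hyperparameter appears, here is a plain mini-batch SGD loop on a toy linear least-squares problem; the data, model, step count and learning rate are invented for the example, and only the `batch_size` line corresponds to the choice discussed above.

```python
import numpy as np

# Toy setup: mini-batch SGD on a linear model with mean squared error.
rng = np.random.default_rng(0)
X = rng.standard_normal((1024, 10))          # 1024 examples, 10 features
true_w = rng.standard_normal(10)
y = X @ true_w + 0.1 * rng.standard_normal(1024)

w = np.zeros(10)
batch_size = 64                              # typically 32, 64 or 128 (a power of 2)
learning_rate = 0.1

for step in range(200):
    idx = rng.integers(0, X.shape[0], batch_size)    # sample a mini-batch
    Xb, yb = X[idx], y[idx]
    grad = 2.0 / batch_size * Xb.T @ (Xb @ w - yb)   # gradient of the batch MSE
    w -= learning_rate * grad                        # parameter update

print(np.abs(w - true_w).max())              # small after training on this toy data
```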

    Reference:

    1. http://cs231n.github.io/optimization-2/

    2. https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b
