Deep Learning 5: Gradient Descent

    Gradients add up at forks. If a variable branches out to different parts of the circuit, then the gradients that flow back to it will add.
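
    In formulas this is just the multivariable chain rule: writing q1, ..., qk for the intermediate nodes that x feeds (names introduced here, not in the original), every branch contributes one term and the terms are summed:

    df/dx = df/dq1 * dq1/dx + df/dq2 * dq2/dx + ... + df/dqk * dqk/dx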

    Example:

    There are 3 paths from x to f, and 2 paths from y to f.

    At the fork of x, the variable branches out into 3 paths that converge at f; when computing df/dx, the 3 gradients flowing back must be summed.

    At the fork of y, the variable branches out into 2 paths that converge at f; when computing df/dy, the 2 gradients flowing back must be summed.

    forward pass:
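
    The circuit from the original figure is not shown here, so the sketches below assume a hypothetical circuit with the same branching pattern, f(x, y) = (x + y) * (x * y) + x: x reaches f through 3 paths and y through 2. A minimal Python version of the forward pass:

        # hypothetical circuit: f(x, y) = (x + y) * (x * y) + x
        x, y = 3.0, -2.0
        a = x + y      # x branches out: path 1   -> a = 1.0
        b = x * y      # x branches out: path 2   -> b = -6.0
        c = a * b      #                           -> c = -6.0
        f = c + x      # x branches out: path 3   -> f = -3.0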

    backward pass:
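
    Continuing the same hypothetical circuit, the backward pass walks the graph in reverse. The key point from above is that dx and dy are accumulated with +=, never overwritten, because each variable forked into several paths:

        df = 1.0
        # f = c + x
        dc = 1.0 * df
        dx = 1.0 * df     # path 3 of x (the direct add)
        # c = a * b
        da = b * dc
        db = a * dc
        # a = x + y
        dx += 1.0 * da    # path 1 of x: add to the running total
        dy  = 1.0 * da    # path 1 of y
        # b = x * y
        dx += y * db      # path 2 of x
        dy += x * db      # path 2 of y
        # dx = -7.0 and dy = -3.0, matching df/dx = 2xy + y^2 + 1 and df/dy = x^2 + 2xy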

    The size of the mini-batch is a hyperparameter but it is not very common to cross-validate it. It is usually based on memory constraints (if any), or set to some value, e.g. 32, 64 or 128. We use powers of 2 in practice because many vectorized operation implementations work faster when their inputs are sized in powers of 2.
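
    As a rough sketch of what this looks like in code (the toy data, batch_size and learning_rate below are made up for illustration, not taken from the post), a mini-batch gradient descent step samples batch_size examples and updates the parameters with the gradient estimated on that batch:

        import numpy as np

        # toy data: 1024 examples, 10 features, linear target with a little noise
        rng = np.random.default_rng(0)
        X = rng.normal(size=(1024, 10))
        true_w = rng.normal(size=10)
        y = X @ true_w + 0.01 * rng.normal(size=1024)

        w = np.zeros(10)
        batch_size = 64            # a power of 2, per the note above
        learning_rate = 0.1

        for step in range(200):
            idx = rng.integers(0, len(X), size=batch_size)   # sample a mini-batch
            Xb, yb = X[idx], y[idx]
            grad = 2.0 / batch_size * Xb.T @ (Xb @ w - yb)   # gradient of the mean squared error
            w -= learning_rate * grad                        # gradient descent step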

    Reference:

    1. http://cs231n.github.io/optimization-2/

    2. https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b
