Suppose we want to optimize a parameterized function \(J(\theta)\), where \(\theta \in \mathbb{R}^d\); for example, \(\theta\) could be the parameters of a neural network.
More specifically, we want to minimize \(J(\theta; \mathcal{D})\) on a dataset \(\mathcal{D}\), where each point in \(\mathcal{D}\) is a pair \((x_i, y_i)\).
There are different ways to apply gradient descent.
Let \(\eta\) be the learning rate.
- Vanilla batch update
\(\theta \gets \theta - \eta \nabla J(\theta; \mathcal{D})\)
Note that \(\nabla J(\theta; \mathcal{D})\) computes the gradient over the whole dataset \(\mathcal{D}\).
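Assuming, as is typical (though not stated above), that the loss decomposes as an average over examples, the full-batch gradient is just the mean of the per-example gradients:

\[ \nabla J(\theta; \mathcal{D}) = \frac{1}{|\mathcal{D}|} \sum_{(x_i, y_i) \in \mathcal{D}} \nabla J(\theta; x_i, y_i). \]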
```python
for epoch in range(n_epochs):
    # One update per epoch, using the gradient over the entire dataset D
    gradient = compute_gradient(J, theta, D)
    theta = theta - eta * gradient
    eta = eta * 0.95  # decay the learning rate after each epoch
```
When \(\mathcal{D}\) is large, computing the gradient over the entire dataset for every single update becomes infeasible.
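For concreteness, here is a minimal sketch of one way `compute_gradient` could be implemented. The least-squares loss, the linear model, and the NumPy implementation are all assumptions of this sketch, not part of the notes:

```python
import numpy as np

def compute_gradient(J, theta, D):
    # Sketch only: ignores the symbolic J and hard-codes the gradient of a
    # mean-squared-error loss (1/n) * sum_i (theta @ x_i - y_i)^2 for a
    # linear model. D is any iterable of (x_i, y_i) pairs of NumPy arrays.
    D = list(D)
    grad = np.zeros_like(theta)
    for x_i, y_i in D:
        grad += 2.0 * (theta @ x_i - y_i) * x_i  # per-example gradient
    return grad / len(D)  # average over the examples seen
```

The same function also covers the mini-batch case below, since a mini-batch is just a smaller iterable of pairs.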
- Stochastic Gradient Descent
Stochastic gradient descent, on the other hand, updates the parameters one example at a time.
\(\theta \gets \theta - \eta \nabla J(\theta; x_i, y_i)\), where \((x_i, y_i) \in \mathcal{D}\).
```python
for epoch in range(n_epochs):
    for x_i, y_i in D:
        # One update per training example
        gradient = compute_gradient(J, theta, x_i, y_i)
        theta = theta - eta * gradient
    eta = eta * 0.95  # decay the learning rate after each epoch
```
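One detail the loop above glosses over: in practice the examples are usually visited in a fresh random order each epoch, since sweeping \(\mathcal{D}\) in a fixed order can bias the updates. A minimal sketch of the same loop with shuffling added (the shuffle step is my addition; it assumes `D` is a mutable list of pairs):

```python
import random

for epoch in range(n_epochs):
    random.shuffle(D)  # assumption: D is a list of (x_i, y_i) pairs
    for x_i, y_i in D:
        gradient = compute_gradient(J, theta, x_i, y_i)
        theta = theta - eta * gradient
    eta = eta * 0.95  # decay the learning rate after each epoch
```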
- Mini-batch Stochastic Gradient Descent
Updating \(\theta\) example by example can lead to high-variance updates. An alternative is to update \(\theta\) on mini-batches \(M\) with \(|M| \ll |\mathcal{D}|\), so each step becomes \(\theta \gets \theta - \eta \nabla J(\theta; M)\).
```python
batch_size = 32  # assumption: a typical small batch size
for epoch in range(n_epochs):
    # iterate_minibatches is sketched below
    for M in iterate_minibatches(D, batch_size):
        gradient = compute_gradient(J, theta, M)
        theta = theta - eta * gradient
    eta = eta * 0.95  # decay the learning rate after each epoch
```
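The loop above assumes a helper that cuts \(\mathcal{D}\) into batches. A minimal sketch of such a helper (the name `iterate_minibatches` and the shuffling step are my assumptions, not part of the notes):

```python
import random

def iterate_minibatches(D, batch_size):
    # Yield successive mini-batches of (x_i, y_i) pairs in a fresh
    # random order each call, so consecutive epochs see different batches.
    indices = list(range(len(D)))
    random.shuffle(indices)
    for start in range(0, len(D), batch_size):
        yield [D[i] for i in indices[start:start + batch_size]]
```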
Question: why does decaying the learning rate lead to convergence?
Why is \(\sum_{i=1}^{\infty} \eta_i = \infty\) together with \(\sum_{i=1}^{\infty} \eta_i^2 < \infty\) the condition for convergence? Under what assumptions on \(J(\theta)\)?
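These are the classical Robbins–Monro step-size conditions. A rough sketch of the standard intuition, under assumptions the notes do not state (for instance, \(J\) convex and smooth, with the second moment of each stochastic gradient \(g_i\) bounded by some \(\sigma^2\)): \(\sum_i \eta_i = \infty\) lets the iterates travel an unbounded total distance, so they can reach the optimum from any starting point, while \(\sum_i \eta_i^2 < \infty\) keeps the total injected noise finite,

\[ \sum_{i=1}^{\infty} \mathbb{E}\big[\|\eta_i g_i\|^2\big] \le \sigma^2 \sum_{i=1}^{\infty} \eta_i^2 < \infty, \]

so the step sizes eventually shrink fast enough for the iterates to settle rather than oscillate around the optimum forever.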