  • A review of gradient descent optimization methods

    Suppose we want to optimize a parameterized function \(J(\theta)\), where \(\theta \in \mathbb{R}^d\); for example, \(\theta\) could be the weights of a neural net.

    More specifically, we want to \(\mbox{minimize } J(\theta; \mathcal{D})\) on a dataset \(\mathcal{D}\), where each point in \(\mathcal{D}\) is a pair \((x_i, y_i)\).

    There are different ways to apply gradient descent.

    Let \(\eta\) be the learning rate.

    1. Vanilla batch update
      \(\theta \gets \theta - \eta \nabla J(\theta; \mathcal{D})\)
      Note that \(\nabla J(\theta; \mathcal{D})\) computes the gradient over the whole dataset \(\mathcal{D}\).
        for i in range(n_epochs): 
            gradient = compute_gradient(J, theta, D)
            theta = theta - eta * gradient
            eta = eta * 0.95
    

    Obviously, when \(\mathcal{D}\) is too large, this approach is infeasible.
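As a concrete illustration (my own toy problem, not from the original post), here is a minimal runnable sketch of the vanilla batch update on 1-D least squares, \(J(\theta; \mathcal{D}) = \frac{1}{2|\mathcal{D}|}\sum_i (\theta x_i - y_i)^2\), using the same decay schedule as the pseudocode:

```python
def batch_gd(data, eta=0.1, n_epochs=100, decay=0.95):
    """Vanilla batch gradient descent for 1-D least squares:
    J(theta; D) = sum_i (theta*x_i - y_i)^2 / (2*len(D))."""
    theta = 0.0
    for _ in range(n_epochs):
        # gradient over the whole dataset D (one full pass per update)
        gradient = sum((theta * x - y) * x for x, y in data) / len(data)
        theta -= eta * gradient
        eta *= decay  # decay the learning rate each epoch, as in the pseudocode
    return theta

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # y = 2*x, optimum theta = 2
theta = batch_gd(data)
```

Note that every update touches all of `data`, which is exactly why this becomes infeasible when the dataset is large.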

    2. Stochastic Gradient Descent
      Stochastic gradient descent, on the other hand, updates the parameters example by example:
      \(\theta \gets \theta - \eta \nabla J(\theta; x_i, y_i)\), where \((x_i, y_i) \in \mathcal{D}\).
        for n in range(n_epochs):
            for x_i, y_i in D: 
                gradient = compute_gradient(J, theta, x_i, y_i)
                theta = theta - eta * gradient 
            eta = eta * 0.95 
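The per-example update can be sketched as runnable code on the same hypothetical toy problem (shuffling each epoch, a common practice not shown in the pseudocode above):

```python
import random

def sgd(data, eta=0.05, n_epochs=50, decay=0.95):
    """Per-example SGD for 1-D least squares: J(theta; x, y) = (theta*x - y)^2 / 2."""
    theta = 0.0
    for _ in range(n_epochs):
        random.shuffle(data)                      # visit examples in random order
        for x_i, y_i in data:
            gradient = (theta * x_i - y_i) * x_i  # gradient of a single example
            theta -= eta * gradient               # one update per example
        eta *= decay                              # decay once per epoch
    return theta

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # y = 2*x, optimum theta = 2
theta = sgd(data)
```

Because each step uses a single example's gradient, the updates are cheap but noisy; on noisy data the iterates fluctuate around the optimum.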
    
    3. Mini-batch Stochastic Gradient Descent
      Updating \(\theta\) example by example can lead to high variance; the alternative approach is to update \(\theta\) on mini-batches \(M\), where \(|M| \ll |\mathcal{D}|\).
        for n in range(n_epochs):
            for M in D: 
                gradient = compute_gradient(J, theta, M)
                theta = theta - eta * gradient 
            eta = eta * 0.95
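A runnable sketch of the mini-batch variant on the same toy problem (batch size 2 is an arbitrary choice for illustration), averaging the per-example gradients inside each batch:

```python
import random

def minibatch_sgd(data, batch_size=2, eta=0.05, n_epochs=50, decay=0.95):
    """Mini-batch SGD for 1-D least squares, one update per mini-batch."""
    theta = 0.0
    for _ in range(n_epochs):
        random.shuffle(data)                      # re-partition into batches each epoch
        for start in range(0, len(data), batch_size):
            M = data[start:start + batch_size]    # one mini-batch
            # average the per-example gradients over the batch
            gradient = sum((theta * x - y) * x for x, y in M) / len(M)
            theta -= eta * gradient
        eta *= decay                              # decay once per epoch
    return theta

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # y = 2*x, optimum theta = 2
theta = minibatch_sgd(data)
```

Averaging over \(|M|\) examples reduces the variance of each update relative to per-example SGD, while each step still costs far less than a full pass over \(\mathcal{D}\).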
    

    Question: why does decaying the learning rate lead to convergence?
    Why is \(\sum_{i=1}^{\infty} \eta_i = \infty\) together with \(\sum_{i=1}^{\infty} \eta_i^2 < \infty\) the condition for convergence? Under what assumptions on \(J(\theta)\)?
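    As background (a standard partial answer from the stochastic-approximation literature, not part of the original post): these are the Robbins–Monro conditions. Intuitively, \(\sum_i \eta_i = \infty\) lets the iterates travel arbitrarily far from any initialization, while \(\sum_i \eta_i^2 < \infty\) keeps the accumulated noise injected by the stochastic gradients finite; the usual proofs assume \(J\) is convex (or at least smooth) with bounded gradient variance. A schedule such as \(\eta_t = c/t\) satisfies both:

```latex
\sum_{t=1}^{\infty} \eta_t = \sum_{t=1}^{\infty} \frac{c}{t} = \infty,
\qquad
\sum_{t=1}^{\infty} \eta_t^2 = \sum_{t=1}^{\infty} \frac{c^2}{t^2} = \frac{c^2 \pi^2}{6} < \infty.
```

    Note that the geometric decay \(\eta \gets 0.95\,\eta\) used in the pseudocode gives \(\sum_i \eta_i < \infty\), so it does not satisfy the first condition.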

  • Original post: https://www.cnblogs.com/gaoqichao/p/9153675.html