Taylor Series and Gradient Descent

Taylor Approximation

According to the Taylor series,

$$f(x + \Delta x) = f(x) + f'(x)\,\Delta x + \frac{f''(x)}{2!}\,\Delta x^2 + \frac{f'''(x)}{3!}\,\Delta x^3 + \cdots$$

This defines a way to estimate the value of a function f when the variable x undergoes a small change Δx. If we keep only the first two terms on the right side of the equation, we get the first-order approximation:

$$f(x + \Delta x) \approx f(x) + f'(x)\,\Delta x$$

Second-order approximation:

$$f(x + \Delta x) \approx f(x) + f'(x)\,\Delta x + \frac{f''(x)}{2}\,\Delta x^2$$
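To see how much accuracy each extra term buys, here is a minimal Python sketch; the choice of f(x) = eˣ, the expansion point, and the step sizes are illustrative assumptions, not from the original post:

```python
import numpy as np

# f(x) = e^x: every derivative of e^x is e^x again, so the
# Taylor terms around any point are easy to write down exactly.
f = np.exp

x = 0.0
for dx in [0.5, 0.1, 0.01]:
    exact = f(x + dx)
    first_order = f(x) + f(x) * dx                     # f(x) + f'(x)·Δx
    second_order = first_order + 0.5 * f(x) * dx ** 2  # ... + f''(x)·Δx²/2
    print(f"dx={dx:<5}: exact={exact:.6f}  "
          f"1st-order error={abs(exact - first_order):.2e}  "
          f"2nd-order error={abs(exact - second_order):.2e}")
```

Shrinking Δx by 10× should shrink the first-order error by roughly 100× (it scales as Δx²) and the second-order error by roughly 1000× (it scales as Δx³).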

Gradient Descent

When we optimize a machine learning problem, the target function is actually the loss function L, and at each step of optimization we decrease it by changing the variable θ a little bit. Plugging L into the first-order approximation in the multivariable case:

$$L(\theta + \Delta\theta) \approx L(\theta) + \nabla L(\theta)^{\top} \Delta\theta$$
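As a quick numerical check of this approximation, here is a sketch using an arbitrarily chosen quadratic loss (the loss, point, and step are assumptions for illustration, not from the original post):

```python
import numpy as np

# Arbitrary smooth loss for illustration: L(θ) = θ₀² + 3θ₁²
def L(theta):
    return theta[0] ** 2 + 3.0 * theta[1] ** 2

def grad_L(theta):
    """Analytic gradient: ∇L(θ) = (2θ₀, 6θ₁)."""
    return np.array([2.0 * theta[0], 6.0 * theta[1]])

theta = np.array([1.0, -1.0])
dtheta = np.array([1e-3, 2e-3])             # a small change Δθ

exact = L(theta + dtheta)
approx = L(theta) + grad_L(theta) @ dtheta  # L(θ) + ∇L(θ)ᵀΔθ
print(exact, approx)  # agree to ~1e-5, the size of the dropped 2nd-order term
```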

Our expectation is to decrease the loss with each update step, so the problem becomes choosing a proper Δθ that guarantees the loss gets smaller. Gradient descent does this by taking advantage of the norm: because a vector's squared norm is non-negative, we set Δθ to be the gradient scaled by a small negative number, i.e. Δθ = −α∇L(θ) with α > 0. Then

$$L(\theta + \Delta\theta) - L(\theta) \approx -\alpha\,\lVert \nabla L(\theta) \rVert^2 \le 0,$$

which gives the update rule

$$\theta \leftarrow \theta - \alpha\,\nabla L(\theta)$$
This is the familiar equation for updating θ in machine learning courses, in which α is the learning rate.
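Putting it all together, here is a minimal sketch of the update loop; the quadratic loss, starting point, learning rate, and step count are all illustrative assumptions:

```python
import numpy as np

TARGET = np.array([3.0, -2.0])

def loss(theta):
    """A simple convex loss: L(θ) = ||θ - TARGET||², minimized at TARGET."""
    return np.sum((theta - TARGET) ** 2)

def grad(theta):
    """Analytic gradient of the loss above: ∇L(θ) = 2(θ - TARGET)."""
    return 2.0 * (theta - TARGET)

theta = np.zeros(2)   # initial parameters
alpha = 0.1           # learning rate

for step in range(50):
    theta -= alpha * grad(theta)   # θ ← θ − α·∇L(θ)

print(theta)        # ≈ [ 3. -2.], the minimizer
print(loss(theta))  # ≈ 0
```

For this particular loss, θ − TARGET shrinks by a factor of (1 − 2α) per step, so any 0 < α < 1 converges; in practice the learning rate is tuned per problem.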

Original article: https://www.cnblogs.com/rhyswang/p/12199445.html