  • LMS algorithm

    Finding the parameters via gradient descent

    LMS stands for “least mean squares”

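    In the usual formulation these notes follow, the hypothesis is linear in the parameters,
    h_θ(x) = θ^T x = Σ_j θ_j x_j, and the quantity being minimized is the least-squares cost

        J(θ) = (1/2) Σ_{i=1..m} (h_θ(x^(i)) − y^(i))^2

    Gradient descent repeatedly steps in the direction of steepest decrease of J(θ). Worked out
    for a single training example (x^(i), y^(i)), this gives the LMS (Widrow–Hoff) update rule

        θ_j := θ_j + α (y^(i) − h_θ(x^(i))) x_j^(i)

    where α is the learning rate.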

    batch gradient descent

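    Batch gradient descent uses every training example on every step:

        repeat until convergence {
            θ_j := θ_j + α Σ_{i=1..m} (y^(i) − h_θ(x^(i))) x_j^(i)    (simultaneously for all j)
        }

    so the whole training set is scanned before θ moves at all.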

    stochastic gradient descent

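    Stochastic (incremental) gradient descent instead updates θ after each single example:

        repeat {
            for i = 1 to m {
                θ_j := θ_j + α (y^(i) − h_θ(x^(i))) x_j^(i)    (simultaneously for all j)
            }
        }

    A minimal NumPy sketch of both variants; the array shapes, the fixed learning rate alpha,
    and the iteration counts are illustrative assumptions rather than part of the original notes:

        import numpy as np

        def batch_gradient_descent(X, y, alpha=0.01, iters=1000):
            # X: (m, n) design matrix, y: (m,) targets
            m, n = X.shape
            theta = np.zeros(n)
            for _ in range(iters):
                residual = y - X @ theta           # y^(i) - h_theta(x^(i)) for every i
                theta += alpha * (X.T @ residual)  # one step uses the whole training set
            return theta

        def stochastic_gradient_descent(X, y, alpha=0.01, epochs=10):
            m, n = X.shape
            theta = np.zeros(n)
            for _ in range(epochs):
                for i in range(m):                 # parameters move after every single example
                    residual = y[i] - X[i] @ theta
                    theta += alpha * residual * X[i]
            return theta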

    Solving for the parameters directly:

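    Collecting the training inputs as the rows of a design matrix X and the targets into a vector y,
    the cost can be written J(θ) = (1/2)(Xθ − y)^T (Xθ − y). Setting its gradient
    ∇_θ J(θ) = X^T Xθ − X^T y to zero yields the normal equations X^T Xθ = X^T y, whose solution is

        θ = (X^T X)^{-1} X^T y

    In NumPy (reusing X and y as above, and solving the linear system rather than forming the inverse):

        theta = np.linalg.solve(X.T @ X, X.T @ y)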

    Probabilistic interpretation

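    The model assumed here is

        y^(i) = θ^T x^(i) + ε^(i)

    where ε^(i) is an error term that captures unmodeled effects and random noise.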

    The error terms ε^(i) are distributed IID (independently and identically distributed) according to a Gaussian distribution (also called a Normal distribution).

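    With ε^(i) ~ N(0, σ^2), the density of each error term is
    p(ε^(i)) = (1/(√(2π) σ)) exp(−(ε^(i))^2 / (2σ^2)), which implies

        p(y^(i) | x^(i); θ) = (1/(√(2π) σ)) exp(−(y^(i) − θ^T x^(i))^2 / (2σ^2))

    i.e. given x^(i) and θ, y^(i) is distributed as N(θ^T x^(i), σ^2).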

    likelihood function

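    Since the ε^(i) are independent, the likelihood of θ is the product of the individual densities,

        L(θ) = Π_{i=1..m} p(y^(i) | x^(i); θ)

    and it is more convenient to maximize the log likelihood

        ℓ(θ) = log L(θ) = m log(1/(√(2π) σ)) − (1/σ^2) · (1/2) Σ_{i=1..m} (y^(i) − θ^T x^(i))^2

    Maximizing ℓ(θ) is therefore the same as minimizing (1/2) Σ_{i=1..m} (y^(i) − θ^T x^(i))^2,
    which is exactly the least-squares cost J(θ).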

    To summarize: Under the previous probabilistic assumptions on the data, least-squares regression corresponds to finding the maximum likelihood estimate of θ. This is thus one set of assumptions under which least-squares regression can be justified as a very natural method that's just doing maximum likelihood estimation. (Note however that the probabilistic assumptions are by no means necessary for least-squares to be a perfectly good and rational procedure, and there may—and indeed there are—other natural assumptions that can also be used to justify it.)
    Note also that, in our previous discussion, our final choice of θ did not depend on the value of σ^2, and indeed we'd have arrived at the same result even if σ^2 were unknown. We will use this fact again later, when we talk about the exponential family and generalized linear models.

    Locally weighted linear regression

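    In locally weighted linear regression, to predict at a query point x we fit θ to minimize the
    weighted cost

        Σ_{i=1..m} w^(i) (y^(i) − θ^T x^(i))^2

    where a fairly standard choice of weights is

        w^(i) = exp(−(x^(i) − x)^2 / (2τ^2))

    so examples close to the query point get weight near 1 and distant ones get weight near 0;
    the bandwidth parameter τ controls how quickly the weight falls off with distance.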

    Locally weighted linear regression is the first example we’re seeing of a non-parametric algorithm. The (unweighted) linear regression algorithm that we saw earlier is known as a parametric learning algorithm, because it has a fixed, finite number of parameters, which are fit to the data. Once we’ve fit the θ and stored them away, we no longer need to keep the training data around to make future predictions. In contrast, to make predictions using locally weighted linear regression, we need to keep the entire training set around. The term “non-parametric” (roughly) refers to the fact that the amount of stuff we need to keep in order to represent the hypothesis h grows linearly with the size of the training set.
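    A minimal NumPy sketch of one locally weighted prediction, which makes the non-parametric point
    concrete: the full training set (X, y) is needed at every query. The Gaussian weighting and the
    bandwidth tau follow the weight formula sketched above; the shapes are illustrative assumptions:

        import numpy as np

        def lwr_predict(X, y, x_query, tau=1.0):
            # X: (m, n) training inputs, y: (m,) targets, x_query: (n,) query point
            d2 = np.sum((X - x_query) ** 2, axis=1)    # squared distance to the query point
            w = np.exp(-d2 / (2.0 * tau ** 2))         # Gaussian weights: nearby points dominate
            W = np.diag(w)
            # weighted normal equations: X^T W X θ = X^T W y
            theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
            return x_query @ theta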
