Symbols:
- m = Number of training examples
- x’s = “input” variables / features
- y’s = “output” variable / “target” variable
- (x, y) = one training example
- \((x^{(i)}, y^{(i)})\) = \(i^{\text{th}}\) training example
- h(x) = hypothesis function
- \(h_\theta(x) = \theta_0 + \theta_1 x\); shorthand: h(x)
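
As a quick illustration of the hypothesis above, here is a minimal Python sketch. The function name `hypothesis` and the parameter values are assumptions chosen for the example, not part of the notes.

```python
def hypothesis(theta0: float, theta1: float, x: float) -> float:
    """Return h_theta(x) = theta0 + theta1 * x for a single input x."""
    return theta0 + theta1 * x


# Hypothetical parameters, chosen only for illustration.
prediction = hypothesis(1.0, 2.0, 3.0)
print(prediction)  # 1.0 + 2.0 * 3.0 = 7.0
```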
Cost Function
- squared error cost function
\[J(\theta_0, \theta_1) = \dfrac{1}{2m} \displaystyle \sum_{i=1}^m \left(\hat{y}_{i} - y_{i}\right)^2 = \dfrac{1}{2m} \displaystyle \sum_{i=1}^m \left(h_\theta(x_{i}) - y_{i}\right)^2\]
- Goal: \(\underset{\theta_0, \theta_1}{\text{minimize}}\ J(\theta_0, \theta_1)\)
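
The cost above can be computed directly from its definition. The sketch below uses plain Python lists; the function name `cost` and the toy data set are assumptions made for illustration.

```python
def cost(theta0, theta1, xs, ys):
    """Squared error cost J(theta0, theta1) = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        prediction = theta0 + theta1 * x  # h_theta(x_i)
        total += (prediction - y) ** 2
    return total / (2 * m)


# Hypothetical training set lying exactly on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

print(cost(0.0, 1.0, xs, ys))  # perfect fit, so the cost is 0.0
print(cost(0.0, 0.0, xs, ys))  # (1 + 4 + 9) / (2 * 3)
```

A perfect fit gives a cost of zero, and any other choice of parameters on this data gives a strictly larger value, which is why minimizing J selects the best-fitting line.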
Gradient descent
repeat until convergence {
\(\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)\) (for j = 0 and j = 1)
}
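All \(\theta_j\) must be updated simultaneously; otherwise, updating one parameter first would affect the updates of the parameters that follow it.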
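
To make the simultaneous-update requirement concrete, the sketch below compares one simultaneous step with one sequential step on a toy data set. It assumes the linear hypothesis and squared error cost from these notes; the helper names, the data, and the learning rate are all hypothetical.

```python
def gradients(theta0, theta1, xs, ys):
    """Partial derivatives of the squared error cost w.r.t. theta0 and theta1."""
    m = len(xs)
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    grad0 = sum(errors) / m
    grad1 = sum(e * x for e, x in zip(errors, xs)) / m
    return grad0, grad1


def simultaneous_step(theta0, theta1, xs, ys, alpha):
    """Correct: both gradients are evaluated at the old parameters."""
    grad0, grad1 = gradients(theta0, theta1, xs, ys)
    return theta0 - alpha * grad0, theta1 - alpha * grad1


def sequential_step(theta0, theta1, xs, ys, alpha):
    """Incorrect: theta1's gradient sees the already-updated theta0."""
    grad0, _ = gradients(theta0, theta1, xs, ys)
    theta0 = theta0 - alpha * grad0
    _, grad1 = gradients(theta0, theta1, xs, ys)
    theta1 = theta1 - alpha * grad1
    return theta0, theta1


# Hypothetical data and learning rate.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

sim = simultaneous_step(0.0, 0.0, xs, ys, alpha=0.1)
seq = sequential_step(0.0, 0.0, xs, ys, alpha=0.1)
print(sim)
print(seq)  # same theta0, but a different theta1
```

Both variants produce the same \(\theta_0\) after one step, but the sequential variant computes the \(\theta_1\) gradient at a point that mixes old and new parameters, so its \(\theta_1\) differs.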
\(\begin{align*} \text{repeat until convergence: } \lbrace & \newline \theta_0 := & \theta_0 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}(h_\theta(x_{i}) - y_{i}) \newline \theta_1 := & \theta_1 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}\left((h_\theta(x_{i}) - y_{i}) x_{i}\right) \newline \rbrace& \end{align*}\)
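
The update rules above can be sketched as a short Python loop. This is a minimal illustration, not a definitive implementation: it assumes a fixed number of iterations in place of a convergence test, and the function name `gradient_descent`, the learning rate, and the toy data are hypothetical.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=1000):
    """Fit h(x) = theta0 + theta1 * x with batch gradient descent."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Errors h_theta(x_i) - y_i, computed with the current parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients were computed before either change.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Hypothetical data generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

theta0, theta1 = gradient_descent(xs, ys, alpha=0.1, iterations=5000)
print(theta0, theta1)  # approaches theta0 = 1, theta1 = 2
```

A learning rate that is too large makes the iterates diverge on this kind of problem, so in practice \(\alpha\) is tuned and the loop is usually stopped once the change in the cost falls below a small tolerance.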