Regularization
for Linear Regression and Logistic Regression
Define
- Under-fitting (high bias)
- Over-fitting (high variance): the hypothesis has too many features, so it fits the training set well but fails to generalize to new examples.
Addressing over-fitting
- Reduce number of features.
  - Manually select which features to keep.
  - Model selection algorithm.
- Regularization
  - Keep all the features, but reduce the magnitude/values of the parameters \(\theta_j\).
  - Works well when we have a lot of features, each of which contributes a bit to predicting \(y\).
Regularized Cost Function
- \[\min_\theta\ \dfrac{1}{2m}\left[ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \sum_{j=1}^n \theta_j^2 \right]\]
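A minimal NumPy sketch of this cost, assuming `X` already includes the column of ones for \(x_0\); the names `regularized_cost` and `lam` are my own, not from the notes, and \(\theta_0\) is excluded from the penalty as in the sum from \(j = 1\):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) = 1/(2m) * [ sum((h - y)^2) + lam * sum(theta_j^2 for j >= 1) ]."""
    m = len(y)
    residuals = X @ theta - y                  # h_theta(x^(i)) - y^(i) for all i
    fit_term = np.sum(residuals ** 2)          # squared-error term
    reg_term = lam * np.sum(theta[1:] ** 2)    # skip theta_0 (the intercept)
    return (fit_term + reg_term) / (2 * m)
```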
Regularized Linear Regression
- Gradient Descent
\[
\begin{align*}
& \text{Repeat}\ \lbrace \newline
& \quad \theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_0^{(i)} \newline
& \quad \theta_j := \theta_j - \alpha \left[ \left( \frac{1}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} \right) + \frac{\lambda}{m}\theta_j \right] & j \in \lbrace 1,2,\ldots,n \rbrace \newline
& \rbrace
\end{align*}
\]
- Equivalently,
\[
\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\frac{1}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}
\]
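A sketch of one update step, using the shrinkage form just above; the helper name `gradient_descent_step` is my own, and `X` is again assumed to carry the bias column so that `theta[0]` is \(\theta_0\):

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha, lam):
    m = len(y)
    grad = (X.T @ (X @ theta - y)) / m               # 1/m * sum((h - y) * x_j) for every j
    new_theta = np.empty_like(theta)
    new_theta[0] = theta[0] - alpha * grad[0]        # theta_0: no regularization
    new_theta[1:] = theta[1:] * (1 - alpha * lam / m) - alpha * grad[1:]  # shrink, then step
    return new_theta
```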
- Normal Equation
\[
\begin{align*}
& \theta = \left( X^TX + \lambda \cdot L \right)^{-1} X^Ty \newline
& \text{where}\quad L = \begin{bmatrix} 0 & & & & \newline & 1 & & & \newline & & 1 & & \newline & & & \ddots & \newline & & & & 1 \newline \end{bmatrix}
\end{align*}
\]
- If \(X^TX\) is not invertible, \(X^TX + \lambda \cdot L\) will be invertible.
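A sketch of the regularized normal equation; `L` is built as the \((n+1)\times(n+1)\) identity with its top-left entry zeroed so that \(\theta_0\) is not penalized (the function name is my own):

```python
import numpy as np

def normal_equation(X, y, lam):
    n_plus_1 = X.shape[1]            # number of parameters, including the intercept
    L = np.eye(n_plus_1)
    L[0, 0] = 0                      # do not regularize theta_0
    # Solve (X^T X + lambda * L) theta = X^T y instead of forming the inverse explicitly
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```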
Regularized Logistic Regression
- Cost Function
\[
J(\theta) = -\frac{1}{m} \sum_{i=1}^m \left[ y^{(i)}\log(h_\theta(x^{(i)})) + (1 - y^{(i)})\log(1 - h_\theta(x^{(i)})) \right] + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2
\]
- Gradient descent
\[
\begin{align*}
& \text{Repeat}\ \lbrace \newline
& \quad \theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_0^{(i)} \newline
& \quad \theta_j := \theta_j - \alpha \left[ \left( \frac{1}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} \right) + \frac{\lambda}{m}\theta_j \right] & j \in \lbrace 1,2,\ldots,n \rbrace \newline
& \rbrace
\end{align*}
\]
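The update has the same form as for linear regression, except that \(h_\theta(x)\) is now the sigmoid \(\frac{1}{1 + e^{-\theta^T x}}\). A sketch of the regularized logistic cost and gradient together (function and variable names are my own assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost_and_grad(theta, X, y, lam):
    m = len(y)
    h = sigmoid(X @ theta)                               # h_theta(x^(i)) for all i
    cost = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h)) \
           + (lam / (2 * m)) * np.sum(theta[1:] ** 2)    # penalty skips theta_0
    grad = (X.T @ (h - y)) / m
    grad[1:] += (lam / m) * theta[1:]                    # regularize all but theta_0
    return cost, grad
```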