Regularization (Solving the Problem of Overfitting)
Underfitting (High Bias) vs. Overfitting (High Variance)
Underfitting, or high bias, is when the form of our hypothesis function h maps poorly to the trend of the data.
It is usually caused by a function that is too simple or uses too few features.
Underfitting (high bias): the model fails to fit even the training set well.
At the other extreme, overfitting, or high variance, is caused by a hypothesis function that fits the available data but does not generalize well to predict new data.
It is usually caused by a complicated function that creates a lot of unnecessary curves and angles unrelated to the data.
Overfitting (high variance): the model fits the training set very well, but the function is overly complex, has too many parameters, and lacks enough data to constrain them (m < n), so it cannot generalize to new samples.
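To make the contrast concrete, here is a minimal Octave sketch (the data, noise level, and polynomial degrees are all made up for illustration): a degree-1 fit underfits a clearly non-linear trend, while a degree-9 fit chases the noise and overfits.

% Hypothetical demo: underfitting vs. overfitting on made-up data
x = linspace(0, 1, 15)';
y = sin(2*pi*x) + 0.1*randn(size(x));    % noisy samples of a smooth trend
p1 = polyfit(x, y, 1);                   % degree 1: too simple, underfits (high bias)
p9 = polyfit(x, y, 9);                   % degree 9: chases the noise, overfits (high variance)
xs = linspace(0, 1, 100)';
plot(x, y, 'o', xs, polyval(p1, xs), '-', xs, polyval(p9, xs), '--');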
This terminology is applied to both linear and logistic regression. There are two main options to address the issue of overfitting:
- Reduce the number of features
  - Manually select which features to keep
  - Use a model selection algorithm (studied later in the course)
- Regularization (a worked example follows this list)
  - Keep all the features, but reduce the magnitude of parameters \(\theta_j\)
  - Regularization works well when we have a lot of slightly useful features
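A classic illustration of "reducing the magnitude of parameters" (the penalty coefficients 1000 here are purely illustrative): adding large penalty terms for \(\theta_3\) and \(\theta_4\) to the cost drives them toward zero, so a quartic hypothesis behaves almost like a quadratic one:

\(\min\limits_\theta \frac{1}{2m} \sum\limits_{i=1}^m \Big( h_\theta(x^{(i)}) - y^{(i)} \Big)^2 + 1000 \cdot \theta_3^2 + 1000 \cdot \theta_4^2\)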
Regularization: Linear Regression Cost Function
Regularization never includes the \(\theta_0\) term.
\(J(\theta)=\frac{1}{2m} \Bigg[ \sum\limits_{i=1}^m \Big( h_\theta(x^{(i)}) - y^{(i)} \Big)^2 + \lambda \sum\limits_{j=1}^n \theta_j^2 \Bigg]\)
A vectorized implementation is:
\(\overrightarrow{h}=X \overrightarrow{\theta}\)
\(J(\theta)=\frac{1}{2m} \cdot \Bigg[ (\overrightarrow{h}-\overrightarrow{y})^T \cdot (\overrightarrow{h}-\overrightarrow{y}) + \lambda \cdot (\overrightarrow{l} \cdot \overrightarrow{\theta}^{.2}) \Bigg]\)
\(\overrightarrow{l} = [0, 1, 1, \dots, 1]\)
Code implementation:
m = length(y);
l = ones(1, length(theta)); l(:,1) = 0;    % row mask that excludes theta_0 from the penalty
J = 1/(2*m) * ((X * theta - y)' * (X * theta - y) + lambda * (l * (theta.^2)));
or
J = 1/(2*m) * ((X * theta - y)' * (X * theta - y) + lambda * (theta'*theta - theta(1,:).^2));
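For reuse, the same computation can be wrapped in a function; this is a minimal sketch (the name regCostLinear is made up, not part of the course code):

function J = regCostLinear(X, y, theta, lambda)
  % Regularized linear regression cost; theta(1) (i.e. theta_0) is excluded from the penalty
  m = length(y);
  reg = lambda * (theta' * theta - theta(1)^2);
  J = 1/(2*m) * ((X * theta - y)' * (X * theta - y) + reg);
end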
Regularization: Logistic Regression Cost Function
Regularization never includes the \(\theta_0\) term.
\(J(\theta)=-\frac{1}{m} \sum\limits_{i=1}^m \Bigg[ y^{(i)} \cdot \log \bigg(h_\theta(x^{(i)}) \bigg) + (1-y^{(i)}) \cdot \log \bigg(1-h_\theta(x^{(i)}) \bigg) \Bigg] + \frac{\lambda}{2m} \sum\limits_{j=1}^n \theta_j^2\)
A vectorized implementation is:
\(\overrightarrow{h}=g(X \overrightarrow{\theta})\)
\(J(\theta)=\frac{1}{m} \cdot \Big( -\overrightarrow{y}^T \cdot \log(\overrightarrow{h}) - (1- \overrightarrow{y})^T \cdot \log(1- \overrightarrow{h}) \Big) + \frac{\lambda}{2m} (\overrightarrow{l} \cdot \overrightarrow{\theta}^{.2})\)
\(\overrightarrow{l} = [0, 1, 1, \dots, 1]\)
Code implementation:
m = length(y);
l = ones(1, length(theta)); l(:,1) = 0;    % row mask that excludes theta_0 from the penalty
J = (1/m)*(-y'*log(sigmoid(X*theta))-(1 - y)'* log(1-sigmoid(X*theta))) + ...
(lambda/(2*m))*(l*(theta.^2));
or
J = (1/m)*(-y'*log(sigmoid(X*theta))-(1 - y)'* log(1-sigmoid(X*theta))) + ...
(lambda/(2*m))*(theta'*theta - theta(1,:).^2);
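This cost function can be handed to an advanced optimizer instead of hand-rolled gradient descent. A minimal sketch using Octave's fminunc, assuming a costFunction(theta, X, y, lambda) that returns both J and its gradient (the function name and MaxIter value are placeholders):

initial_theta = zeros(size(X, 2), 1);
options = optimset('GradObj', 'on', 'MaxIter', 400);    % tell fminunc we supply the gradient
[theta, J] = fminunc(@(t) costFunction(t, X, y, lambda), initial_theta, options);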
Gradient Descent for Regularized Linear and Logistic Regression
Regularization never includes the \(\theta_0\) term.
\(\begin{cases} \theta_0:=\theta_0 - \alpha \frac{1}{m} \sum\limits_{i=1}^m \Big( h_\theta(x^{(i)}) - y^{(i)} \Big) \cdot x_0^{(i)} \\ \\ \theta_j:=\theta_j - \alpha \Bigg[ \frac{1}{m} \sum\limits_{i=1}^m \Big( h_\theta(x^{(i)}) - y^{(i)} \Big) \cdot x_j^{(i)} + \frac{\lambda}{m} \cdot \theta_j \Bigg] \end{cases}\)
A vectorized implementation is:
\(\frac{1}{m} \cdot \Big( X^T \cdot (\overrightarrow{h} - \overrightarrow{y}) \Big) + \frac{\lambda}{m} \cdot \theta^{'}\)
\(\theta^{'} = \begin{bmatrix} 0 \\[0.3em] \theta_1 \\[0.3em] \theta_2 \\[0.3em] \vdots \\[0.3em] \theta_n \end{bmatrix}\)
Code implementation:
reg_theta = theta; reg_theta(1, :) = 0;    % copy of theta with theta_0 zeroed out
grad = (1/m)*(X'*(sigmoid(X*theta) - y)) + (lambda/m)*reg_theta;
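The snippet above uses the logistic hypothesis (sigmoid); for linear regression the same regularized gradient applies with X*theta as the hypothesis. A minimal batch gradient descent loop, under assumed values of alpha and num_iters:

alpha = 0.01; num_iters = 400;               % made-up learning rate and iteration count
for iter = 1:num_iters
  reg_theta = theta; reg_theta(1, :) = 0;    % exclude theta_0 from the penalty
  grad = (1/m)*(X'*(X*theta - y)) + (lambda/m)*reg_theta;
  theta = theta - alpha * grad;
end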
Final form: with some manipulation, the gradient descent update rule for \(\theta_j\) can also be represented as:
\(\begin{cases} \theta_0:=\theta_0 - \alpha \frac{1}{m} \sum\limits_{i=1}^m \Big( h_\theta(x^{(i)}) - y^{(i)} \Big) \cdot x_0^{(i)} \\ \\ \theta_j:=\theta_j \Big(1- \alpha \frac{\lambda}{m}\Big) - \alpha \frac{1}{m} \sum\limits_{i=1}^m \Big( h_\theta(x^{(i)}) - y^{(i)} \Big) \cdot x_j^{(i)} \end{cases}\)
\((1 - \alpha\frac{\lambda}{m})\) will always be less than 1. Intuitively you can see it as reducing the value of \(\theta_j\) by some amount on every update. Notice that the second term is now exactly the same as it was before.

Regularizing the Normal Equation for Linear Regression

Regularization never includes the \(\theta_0\) term.
Now let's approach regularization using the alternate method of the non-iterative normal equation.
To add in regularization, the equation is the same as our original, except that we add another term inside the parentheses:
Original form: \(\overrightarrow{\theta} = (X^TX)^{-1}X^T \overrightarrow{y}\)
Regularized form: \(\overrightarrow{\theta} = (X^TX + \lambda L)^{-1}X^T \overrightarrow{y}\)
\(L = \begin{bmatrix} 0&&&& \\[0.3em] &1&&& \\[0.3em] &&1&& \\[0.3em] &&&\ddots& \\[0.3em] &&&&1 \end{bmatrix}\)
L is a matrix with 0 at the top left and 1's down the diagonal, with 0's everywhere else. It should have dimension (n+1)×(n+1).
Intuitively, this is the identity matrix (though we are not including \(x_0\)) multiplied by a single real number \(\lambda\).
Recall that if m < n, then \(X^TX\) is non-invertible. However, when we add the term \(\lambda \cdot L\), then \(X^TX + \lambda \cdot L\) becomes invertible.
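A direct Octave translation of the regularized normal equation (using the backslash operator rather than an explicit inverse, for numerical stability):

L = eye(size(X, 2)); L(1, 1) = 0;     % (n+1)x(n+1) identity with the theta_0 entry zeroed
theta = (X' * X + lambda * L) \ (X' * y);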
Source Code
Regularization has already been added to the code for all the other exercises, such as linear regression, logistic regression, and neural networks; see those exercises for details. To run them without regularization, simply set lambda = 0.
To get the source code and other files, click Fork me on GitHub in the top-right corner and clone the repository yourself.