Logistic Regression


Motivation

If \(y\) takes only a few discrete values, such as \(\{0,1\}\), then Linear Regression is a poor fit: it can predict \(\hat y>1\) or \(\hat y<0\), which makes no sense as a class label or a probability. Fortunately, we can modify Linear Regression so that its output always lies between 0 and 1.

Details

We choose the sigmoid (logistic) function to map \(\theta^Tx\) into that range:

\[h_\theta(x)=g(\theta^Tx),\quad g(z)=\frac{1}{1+e^{-z}} \]
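A minimal sketch of this mapping in plain Python (standard library only). The names `sigmoid` and `hypothesis` are illustrative choices, not from the original post, and the branch on the sign of \(z\) is an added precaution against overflow in `math.exp`:

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function g(z) = 1 / (1 + e^{-z})."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z),
    # so that exp() is never called on a large positive number.
    ez = math.exp(z)
    return ez / (1.0 + ez)

def hypothesis(theta: list, x: list) -> float:
    """h_theta(x) = g(theta^T x); theta and x are equal-length lists."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))
```

Whatever the value of \(\theta^Tx\), the result stays between 0 and 1, and \(g(0)=0.5\) marks the decision boundary.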

We can assume that:

\[\begin{aligned}h_\theta(x)&=P(y=1|x;\theta)\\ 1-h_\theta(x)&=P(y=0|x;\theta)\end{aligned}\]
Or more compactly:

\[p(y|x;\theta)=[h_\theta(x)]^y[1-h_\theta(x)]^{1-y} \]
Now we fit the parameters \(\theta\) by maximum likelihood. Assuming the \(n\) training examples are independent, the likelihood of the parameters is:

\[L(\theta)=p(\vec y|X;\theta)=\prod_{i=1}^{n}p(y^{(i)}|x^{(i)};\theta)=\prod_{i=1}^{n}[h_\theta(x^{(i)})]^{y^{(i)}}[1-h_\theta(x^{(i)})]^{1-y^{(i)}} \]
To make life easier, we use the log likelihood:

\[l(\theta)=\log L(\theta)=\sum_{i=1}^{n}\left[y^{(i)}\log h_\theta(x^{(i)})+(1-y^{(i)})\log (1-h_\theta(x^{(i)}))\right] \]
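The log likelihood can be evaluated directly from this sum. The sketch below is only an illustration: the function name `log_likelihood` and the three-point toy data set are assumptions made for the example, and each \(x^{(i)}\) is assumed to carry a leading 1 for the intercept.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(theta: list, X: list, y: list) -> float:
    """l(theta) = sum_i [ y_i log h(x_i) + (1 - y_i) log(1 - h(x_i)) ]."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * xj for t, xj in zip(theta, x_i)))
        total += y_i * math.log(h) + (1 - y_i) * math.log(1.0 - h)
    return total

# Hypothetical toy data: first column is the constant 1 (intercept term).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1]
# With theta = 0 every h is 0.5, so l(theta) = 3 * log(0.5).
print(log_likelihood([0.0, 0.0], X, y))
```

Working with the sum of logs rather than the product \(L(\theta)\) also avoids multiplying many numbers smaller than 1, which would underflow for large \(n\).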
Let's first take a single example \((x,y)\) to derive the stochastic gradient ascent rule. The derivation uses the identity \(g'(z)=g(z)(1-g(z))\):

\[\begin{aligned}\frac{\partial}{\partial\theta_j}l(\theta)&=\left[y\frac{1}{g(\theta^Tx)}-(1-y)\frac{1}{1-g(\theta^Tx)}\right]\frac{\partial}{\partial\theta_j}g(\theta^Tx) \\&=\left[y\frac{1}{g(\theta^Tx)}-(1-y)\frac{1}{1-g(\theta^Tx)}\right]g(\theta^Tx)(1-g(\theta^Tx))\frac{\partial}{\partial\theta_j}\theta^Tx \\&=[y(1-g(\theta^Tx))-(1-y)g(\theta^Tx)]x_j=(y-h_\theta(x))x_j \end{aligned}\]
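One way to gain confidence in the result \((y-h_\theta(x))x_j\) is to compare it with a finite-difference approximation of the derivative. This is a sanity check added for illustration, not part of the original derivation; the function names and the sample values are made up.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def log_lik_single(theta: list, x: list, y: int) -> float:
    """Log likelihood of one example (x, y)."""
    h = sigmoid(sum(t * xj for t, xj in zip(theta, x)))
    return y * math.log(h) + (1 - y) * math.log(1.0 - h)

def analytic_grad(theta: list, x: list, y: int) -> list:
    """The derived gradient: (y - h_theta(x)) * x_j for every j."""
    h = sigmoid(sum(t * xj for t, xj in zip(theta, x)))
    return [(y - h) * xj for xj in x]

def numeric_grad(theta: list, x: list, y: int, eps: float = 1e-6) -> list:
    """Central finite differences of log_lik_single in each theta_j."""
    grad = []
    for j in range(len(theta)):
        up, down = list(theta), list(theta)
        up[j] += eps
        down[j] -= eps
        grad.append((log_lik_single(up, x, y) - log_lik_single(down, x, y)) / (2 * eps))
    return grad

# Hypothetical values chosen only for the check.
theta, x, y = [0.5, -0.3], [1.0, 2.0], 1
print(analytic_grad(theta, x, y))
print(numeric_grad(theta, x, y))
```

The two printed vectors should agree to several decimal places.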
Then we can update the parameters:

\[\theta_j=\theta_j+\alpha(y^{(i)}-h_{\theta}(x^{(i)}))x_j^{(i)} \]
Here we used maximum likelihood to derive the update rule. More commonly we minimize an objective function, so we negate the log likelihood; the result is called the logistic loss. This gives another way to understand the same model.
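The per-example update can be sketched as a short training loop. Everything beyond the update rule itself is an assumption made for this example: the function name `sgd_ascent`, the learning rate, the number of passes, the fixed visiting order, and the four-point toy data set.

```python
import math

def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def hypothesis(theta: list, x: list) -> float:
    return sigmoid(sum(t * xj for t, xj in zip(theta, x)))

def sgd_ascent(X: list, y: list, alpha: float = 0.1, epochs: int = 200) -> list:
    """Stochastic gradient ascent on the log likelihood."""
    theta = [0.0] * len(X[0])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            err = y_i - hypothesis(theta, x_i)
            # theta_j = theta_j + alpha * (y - h(x)) * x_j, for all j at once
            theta = [t + alpha * err * xj for t, xj in zip(theta, x_i)]
    return theta

# Hypothetical toy data: a constant 1 for the intercept plus one feature.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
theta = sgd_ascent(X, y)
print([round(hypothesis(theta, x_i), 3) for x_i in X])
```

In practice the examples are usually shuffled on every pass; the fixed order here just keeps the sketch deterministic.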

The loss on a single sample can be formulated as follows:

\[cost(h_{\theta}(x),y)=\left\{ \begin{aligned} -\log(h_{\theta}(x)) \quad &\text{if } y=1\\ -\log(1-h_{\theta}(x)) \quad &\text{if } y=0 \end{aligned} \right. \]
If \(y=1\) and the predicted probability \(h_\theta(x)\) approaches 1, the loss approaches 0; if \(y=1\) and \(h_\theta(x)\) approaches 0, the loss grows toward \(+\infty\), a huge penalty for a completely wrong prediction. The case \(y=0\) is symmetric.
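A few numbers make the shape of this penalty concrete. The helper name `sample_cost` and the probabilities fed to it are illustrative only:

```python
import math

def sample_cost(h: float, y: int) -> float:
    """Logistic loss of one example, given predicted probability h in (0, 1)."""
    if y == 1:
        return -math.log(h)
    return -math.log(1.0 - h)

# For a positive example (y = 1): the more confident and correct, the smaller the loss.
for h in (0.999, 0.9, 0.5, 0.1, 0.001):
    print(h, round(sample_cost(h, 1), 4))
```

The loss is close to 0 for \(h=0.999\), equals \(\log 2\) for \(h=0.5\), and keeps growing without bound as \(h\) approaches 0.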

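We can unify the two cases into a single expression, and then average it over the \(m\) training examples (the count written as \(n\) above) to get the loss on the whole training set: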

\[\begin{aligned}cost(h_{\theta}(x),y)&=-y\log(h_{\theta}(x))-(1-y)\log(1-h_{\theta}(x))\\J(\theta)&=-\frac{1}{m}\sum_{i=1}^{m}[y^{(i)}\log(h_{\theta}(x^{(i)}))+(1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))] \end{aligned}\]
We don't reuse the MSE loss from Linear Regression here because, with the sigmoid inside \(h_\theta\), the resulting \(J(\theta)\) is non-convex in \(\theta\), which makes the global optimum hard to find. The logistic loss, by contrast, is convex in \(\theta\).
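The non-convexity of the squared error can be seen on a one-parameter example. The sketch below is an illustration under stated assumptions: a single made-up sample \(x=1, y=0\), a scalar \(\theta\), and a midpoint test (a convex function never lies above the average of its endpoint values). All names are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def mse_loss(theta: float, x: float = 1.0, y: float = 0.0) -> float:
    """Squared error with a sigmoid hypothesis, for one sample."""
    return (sigmoid(theta * x) - y) ** 2

def logistic_loss(theta: float, x: float = 1.0, y: float = 0.0) -> float:
    """Logistic loss for the same sample."""
    h = sigmoid(theta * x)
    return -y * math.log(h) - (1.0 - y) * math.log(1.0 - h)

def violates_midpoint_convexity(loss, a: float, b: float) -> bool:
    """True if loss((a+b)/2) > (loss(a) + loss(b)) / 2."""
    return loss((a + b) / 2.0) > (loss(a) + loss(b)) / 2.0

print(violates_midpoint_convexity(mse_loss, 2.0, 6.0))       # squared error
print(violates_midpoint_convexity(logistic_loss, 2.0, 6.0))  # logistic loss
```

On the interval from 2 to 6 the squared error bends the wrong way (its midpoint value is above the chord), while the logistic loss passes the same test.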

To make life easier again, we can write the formula in vectorized form:

\[h = g(X\theta),\quad J(\theta) = \frac{1}{m} \cdot \left(-y^{T}\log(h)-(1-y)^{T}\log(1-h) \right) \]
Our goal is then to minimize \(J(\theta)\) to obtain suitable parameters \(\theta\), and to use \(h_\theta(x)=\frac{1}{1+e^{-\theta^Tx}}\) to make predictions.

Since the minimizer has no closed-form solution, we again use Gradient Descent to minimize the loss numerically. The update rule has the same form as the one above, now averaged over all \(m\) examples:

\[\theta_j=\theta_j+\alpha\frac{1}{m}\sum_{i=1}^{m}(y^{(i)}-h_{\theta}(x^{(i)}))x_j^{(i)} \]
Note that all \(\theta_j\) must be updated simultaneously when you implement this. Again, the vectorized version:

\[\theta=\theta-\frac{\alpha}{m}X^T[g(X\theta)-y] \]
This has the same form as the Linear Regression update, except that \(h_\theta(x)\) is different.
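A minimal batch gradient descent sketch follows, written with plain lists so that it needs only the standard library; with an array library the step would be a single matrix expression. The function names, learning rate, iteration count, and toy data are assumptions for illustration. The point to notice is that all errors are computed from the old \(\theta\) before any component changes, which is what a simultaneous update means.

```python
import math

def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(theta: list, x: list) -> float:
    return sigmoid(sum(t * xj for t, xj in zip(theta, x)))

def cost(theta: list, X: list, y: list) -> float:
    """J(theta): average logistic loss over the m examples."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = predict(theta, x_i)
        total += -y_i * math.log(h) - (1 - y_i) * math.log(1.0 - h)
    return total / len(y)

def gradient_step(theta: list, X: list, y: list, alpha: float) -> list:
    """One update theta - (alpha / m) * X^T (g(X theta) - y)."""
    m = len(y)
    # Errors g(X theta) - y, all computed with the OLD theta.
    errors = [predict(theta, x_i) - y_i for x_i, y_i in zip(X, y)]
    # Build the whole new vector at once (simultaneous update).
    return [t - alpha / m * sum(e * x_i[j] for e, x_i in zip(errors, X))
            for j, t in enumerate(theta)]

def gradient_descent(X: list, y: list, alpha: float = 0.1, iterations: int = 1000) -> list:
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        theta = gradient_step(theta, X, y, alpha)
    return theta

# Hypothetical toy data (not linearly separable); first column is the intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 0, 1]
theta = gradient_descent(X, y)
print(cost([0.0, 0.0], X, y), cost(theta, X, y))
```

The second printed cost should be lower than the first, which is \(\log 2\) because every prediction starts at 0.5.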

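Newton's Method

Besides gradient ascent, Newton's method can also be used to maximize \(l(\theta)\).

Most people first meet Newton's method as a way to find a zero of a function, \(f(\theta)=0\), with the update rule:

\[\theta=\theta-\frac{f(\theta)}{f'(\theta)} \]
This rule can be understood as follows: at every step we approximate \(f\) by a linear function (its tangent at the current \(\theta\)) and take the zero of that linear function as the next iterate \(\theta\).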
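The scalar rule is short enough to try directly. In this sketch the function name `newton_root`, the stopping tolerance, and the example \(f(\theta)=\theta^2-2\) are illustrative choices, not from the original post:

```python
import math

def newton_root(f, f_prime, theta: float, tol: float = 1e-12, max_iter: int = 50) -> float:
    """Find a zero of f by repeating theta = theta - f(theta) / f'(theta).

    Assumes f_prime(theta) is never zero along the way.
    """
    for _ in range(max_iter):
        step = f(theta) / f_prime(theta)
        theta -= step
        if abs(step) < tol:  # the tangent's zero barely moved: stop
            break
    return theta

# Example: the positive zero of f(theta) = theta^2 - 2 is sqrt(2).
root = newton_root(lambda t: t * t - 2.0, lambda t: 2.0 * t, theta=1.0)
print(root, math.sqrt(2.0))
```

Starting from 1.0, the iterates are 1.5, then about 1.4167, then about 1.41422, with the number of correct digits roughly doubling at each step.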

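Add a bit of high-school calculus: the first derivative of \(l(\theta)\) is 0 at its maximum, so applying the same rule to the equation \(l'(\theta)=0\) yields the corresponding \(\theta\):

\[\theta=\theta-\frac{l'(\theta)}{l''(\theta)} \]
In logistic regression \(\theta\) is a vector rather than a scalar, so the update rule needs a slight change:

\[\theta=\theta-H^{-1}\nabla_{\theta}l(\theta) \]
Here the entries of the Hessian matrix are \(H_{ij}=\frac{\partial^2l(\theta)}{\partial\theta_i\partial\theta_j}\).

Newton's method usually converges much faster than gradient ascent because it uses second-order information about \(l(\theta)\), but storing and computing \(H^{-1}\) can be expensive.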
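For logistic regression, differentiating the gradient \(\sum_i (y^{(i)}-h_\theta(x^{(i)}))x^{(i)}\) once more gives \(H=-\sum_i h_\theta(x^{(i)})(1-h_\theta(x^{(i)}))\,x^{(i)}x^{(i)T}\); this closed form is a standard result that the text above does not spell out. The sketch below applies the vector update to a model with two parameters, where the linear system can be solved by Cramer's rule instead of forming \(H^{-1}\). Function names, the iteration count, and the toy data are assumptions for illustration.

```python
import math

def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def gradient_and_hessian(theta: list, X: list, y: list):
    """Gradient and Hessian of the log likelihood l(theta)."""
    n = len(theta)
    grad = [0.0] * n
    hess = [[0.0] * n for _ in range(n)]
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * xj for t, xj in zip(theta, x_i)))
        for j in range(n):
            grad[j] += (y_i - h) * x_i[j]
            for k in range(n):
                hess[j][k] -= h * (1.0 - h) * x_i[j] * x_i[k]
    return grad, hess

def solve_2x2(A: list, b: list) -> list:
    """Solve A s = b for a 2x2 matrix A by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

def newton_logistic(X: list, y: list, iterations: int = 10) -> list:
    """Newton's method for a two-parameter logistic regression."""
    theta = [0.0, 0.0]
    for _ in range(iterations):
        grad, hess = gradient_and_hessian(theta, X, y)
        step = solve_2x2(hess, grad)  # step = H^{-1} * gradient
        theta = [t - s for t, s in zip(theta, step)]
    return theta

# Hypothetical toy data (not linearly separable); first column is the intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 0, 1]
theta = newton_logistic(X, y)
print(theta, gradient_and_hessian(theta, X, y)[0])
```

On this data the first Newton step already moves \(\theta\) from \((0, 0)\) to \((-2, 0.8)\), and after a handful of steps the gradient is essentially zero. The data is deliberately not linearly separable: on separable data the maximum of \(l(\theta)\) is not attained at any finite \(\theta\), so the iteration would not settle.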

Original post: https://www.cnblogs.com/EIMadrigal/p/12130859.html
