Logistic regression is a classification method for binary (two-class) problems. Its basic idea is:
- find a suitable hypothesis function, i.e. the classification function, to predict the outcome for a given input;
- construct a loss function that measures the deviation between the predicted output and the actual labels in the training data;
- minimize the loss function to obtain the optimal model parameters.
First, let's look at the sigmoid function:
\(g(x)=\frac{1}{1+e^{-x}}\)
Its graph is an S-shaped curve that maps any real number into the interval \((0,1)\), passing through \(0.5\) at \(x=0\). (Figure omitted.)
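For reference, the sigmoid is a one-liner in NumPy. A minimal sketch (the function name is my own):

```python
import numpy as np

def sigmoid(x):
    # g(x) = 1 / (1 + e^(-x)): maps any real number into (0, 1), with g(0) = 0.5.
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # ≈ [0.0000454, 0.5, 0.9999546]
```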
The hypothesis function (classification function) in logistic regression is:
\(h_{\theta}(x)=g(\theta^{T}x)=\frac{1}{1+e^{-\theta^{T}x}}\)
Explanation:
\(\theta\): the parameter vector we will solve for later;
\(T\): the vector transpose; by default, all vectors are column vectors;
\(\theta^{T}x\): the column vector \(\theta\) is transposed and then multiplied (an inner product) with \(x\), for example:
\(\begin{bmatrix}1\\ -1\\ 3\end{bmatrix}^{T}\begin{bmatrix}1\\ 1\\ -1\end{bmatrix} = \begin{bmatrix}1 & -1 & 3\end{bmatrix}\begin{bmatrix}1\\ 1\\ -1\end{bmatrix}=1\times 1+(-1)\times 1+3\times(-1) = -3\)
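Putting the pieces together, the hypothesis function is just the sigmoid applied to this inner product. A minimal NumPy sketch reproducing the worked example (names are my own):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hypothesis(theta, x):
    # h_theta(x) = g(theta^T x); for 1-D arrays, theta @ x is the inner product.
    return sigmoid(theta @ x)

# The worked example above: theta^T x = -3, so h_theta(x) is close to 0.
theta = np.array([1.0, -1.0, 3.0])
x = np.array([1.0, 1.0, -1.0])
print(theta @ x)             # -3.0
print(hypothesis(theta, x))  # sigmoid(-3) ≈ 0.047
```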
The decision boundary in logistic regression can be either linear or nonlinear:
A linear boundary has the form: \(\theta_{0}+\theta_{1}x_{1}+\cdots+\theta_{n}x_{n}=\sum_{i=0}^{n}\theta_{i}x_{i}=\theta^{T}x\) (with \(x_{0}=1\)).
A nonlinear boundary can take forms such as: \(\theta_{0}+\theta_{1}x_{1}+\theta_{2}x_{2}+\theta_{3}x_{1}^{2}+\theta_{4}x_{2}^{2}\)
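A nonlinear boundary still uses the same linear form \(\theta^{T}x\), only over transformed features. A sketch of the feature vector for the quadratic boundary above (the helper name is hypothetical):

```python
import numpy as np

def quadratic_features(x1, x2):
    # [1, x1, x2, x1^2, x2^2] lines up with theta_0 .. theta_4 above;
    # the dot product theta @ quadratic_features(x1, x2) gives the boundary value.
    return np.array([1.0, x1, x2, x1**2, x2**2])
```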
In probabilistic terms, the probabilities that input \(x\) yields result 1 or 0 are, respectively:
\(P(y=1|x;\theta)=h_{\theta}(x)\)
\(P(y=0|x;\theta)=1-h_{\theta}(x)\)
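Since \(h_{\theta}(x)=P(y=1|x;\theta)\), classification reduces to thresholding at 0.5, which is equivalent to checking the sign of \(\theta^{T}x\). A minimal sketch (the function name is my own):

```python
import numpy as np

def predict(theta, x):
    # Predict 1 iff P(y=1|x;theta) = g(theta^T x) >= 0.5, i.e. iff theta^T x >= 0.
    p = 1.0 / (1.0 + np.exp(-(theta @ x)))
    return 1 if p >= 0.5 else 0
```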
The loss function is defined as: \(J(\theta)=\frac{1}{m}\sum_{i=1}^{m}cost(h_{\theta}(x^{(i)}), y^{(i)})\)
where:
\(m\) is the total number of training samples;
\(cost(h_{\theta}(x), y)=\begin{cases} -\log(h_{\theta}(x)) & \text{if } y=1\\ -\log(1-h_{\theta}(x)) & \text{if } y=0 \end{cases}\)
An equivalent single-expression form of \(cost\) is: \(cost(h_{\theta}(x), y)=-y\times\log(h_{\theta}(x))-(1-y)\times\log(1-h_{\theta}(x))\)
Substituting \(cost\) into \(J(\theta)\) yields the loss function:
\(J(\theta)=-\frac{1}{m}\left[\sum_{i=1}^{m}y^{(i)}\log h_{\theta}(x^{(i)})+(1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))\right]\)
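In NumPy the loss reads almost exactly like the formula. A sketch assuming `X` is the \(m\times(n+1)\) design matrix with a leading column of ones and `y` is a NumPy vector of 0/1 labels (names are my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(theta, X, y):
    # J(theta) = -(1/m) * sum_i [ y_i*log(h_i) + (1-y_i)*log(1-h_i) ]
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
```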
Minimizing \(J(\theta)\) by gradient descent
The update rule for \(\theta\) is:
\(\theta_{j}:=\theta_{j}-\alpha\frac{\partial}{\partial\theta_{j}}J(\theta),\quad (j=0\cdots n)\)
where \(\alpha\) is the learning rate (step size).
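The derivation below repeatedly uses the derivative of the sigmoid, so it is worth stating explicitly:
\(g'(z)=\frac{e^{-z}}{(1+e^{-z})^{2}}=\frac{1}{1+e^{-z}}\cdot\frac{e^{-z}}{1+e^{-z}}=g(z)\left(1-g(z)\right)\)
This identity is what turns \(\frac{\partial}{\partial\theta_{j}}g(\theta^{T}x^{(i)})\) into \(g(\theta^{T}x^{(i)})\left(1-g(\theta^{T}x^{(i)})\right)x_{j}^{(i)}\) in the steps below.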
\(\begin{align*} \frac{\partial}{\partial\theta_{j}}J(\theta) &= -\frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)}\frac{1}{h_{\theta}(x^{(i)})}\frac{\partial}{\partial\theta_{j}}h_{\theta}(x^{(i)})-(1-y^{(i)})\frac{1}{1-h_{\theta}(x^{(i)})}\frac{\partial}{\partial\theta_{j}}h_{\theta}(x^{(i)}) \right) \\ &= -\frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)}\frac{1}{g(\theta^{T}x^{(i)})}-(1-y^{(i)})\frac{1}{1-g(\theta^{T}x^{(i)})} \right)\frac{\partial}{\partial\theta_{j}}g(\theta^{T}x^{(i)}) \\ &= -\frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)}\frac{1}{g(\theta^{T}x^{(i)})}-(1-y^{(i)})\frac{1}{1-g(\theta^{T}x^{(i)})} \right)g(\theta^{T}x^{(i)})\left(1-g(\theta^{T}x^{(i)})\right)\frac{\partial}{\partial\theta_{j}}\theta^{T}x^{(i)} \\ &= -\frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)}\left(1-g(\theta^{T}x^{(i)})\right)-(1-y^{(i)})g(\theta^{T}x^{(i)}) \right)x_{j}^{(i)} \\ &= -\frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)}-g(\theta^{T}x^{(i)}) \right)x_{j}^{(i)} \\ &= -\frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)}-h_{\theta}(x^{(i)}) \right)x_{j}^{(i)} \\ &= \frac{1}{m}\sum_{i=1}^{m}\left( h_{\theta}(x^{(i)})-y^{(i)} \right)x_{j}^{(i)} \end{align*}\)
Substituting this partial derivative into the update rule gives:
\(\theta_{j}:=\theta_{j}-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_{j}^{(i)}\)
Since the learning rate \(\alpha\) is a constant, the constant factor \(\frac{1}{m}\) can be absorbed into it, which gives the final update rule:
\(\theta_{j}:=\theta_{j}-\alpha\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_{j}^{(i)},\quad (j=0\cdots n)\)
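A direct, element-wise NumPy sketch of this update, deliberately left unvectorized to mirror the formula (names and the per-step copy are my own choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def update_step(theta, X, y, alpha):
    # One gradient step: theta_j := theta_j - alpha * sum_i (h(x_i) - y_i) * x_ij.
    # X is (m, n+1) with a leading column of ones; y holds 0/1 labels.
    m, n1 = X.shape
    new_theta = theta.copy()
    for j in range(n1):
        grad_j = sum((sigmoid(X[i] @ theta) - y[i]) * X[i, j] for i in range(m))
        new_theta[j] = theta[j] - alpha * grad_j
    return new_theta
```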
Vectorizing the gradient
Written in matrix form, the training samples are:
\(X=\begin{bmatrix} x^{(1)}\\ x^{(2)}\\ \cdots \\ x^{(m)}\end{bmatrix}=\begin{bmatrix} x_{0}^{(1)} & x_{1}^{(1)} & \cdots & x_{n}^{(1)}\\ x_{0}^{(2)} & x_{1}^{(2)} & \cdots & x_{n}^{(2)}\\ \cdots & \cdots & \cdots & \cdots \\ x_{0}^{(m)} & x_{1}^{(m)} & \cdots & x_{n}^{(m)} \end{bmatrix},\quad Y=\begin{bmatrix} y^{(1)}\\ y^{(2)}\\ \cdots \\ y^{(m)}\end{bmatrix}\)
The parameter vector \(\theta\) in matrix form is (note it has \(n+1\) components, one per feature, not one per sample):
\(\Theta=\begin{bmatrix} \theta_{0}\\ \theta_{1}\\ \cdots \\ \theta_{n}\end{bmatrix}\)
First compute \(X\cdot\Theta\) and denote the result \(A\):
\(A=X\cdot\Theta\), which is simply a matrix product; its \(i\)-th component is \(A^{(i)}=\theta^{T}x^{(i)}\).
Next compute the vectorized error \(E\):
\(E=h_{\Theta}(X)-Y=\begin{bmatrix} g(A^{(1)})-y^{(1)}\\ g(A^{(2)})-y^{(2)}\\ \cdots \\ g(A^{(m)})-y^{(m)}\end{bmatrix} = \begin{bmatrix} e^{(1)}\\ e^{(2)}\\ \cdots \\ e^{(m)}\end{bmatrix}\)
For \(j=0\) the update becomes:
\(\begin{align*} \theta_{0}&:=\theta_{0}-\alpha\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_{0}^{(i)} \\ &=\theta_{0}-\alpha\sum_{i=1}^{m}e^{(i)}x_{0}^{(i)} \\ &=\theta_{0}-\alpha\begin{bmatrix} x_{0}^{(1)} & x_{0}^{(2)} & \cdots & x_{0}^{(m)} \end{bmatrix}\cdot E \end{align*}\)
The same reasoning applies to every \(\theta_{j}\):
\(\theta_{j} := \theta_{j}-\alpha\begin{bmatrix} x_{j}^{(1)} & x_{j}^{(2)} & \cdots & x_{j}^{(m)} \end{bmatrix}\cdot E\)
Stacking all components, the matrix form is:
\(\begin{align*}\begin{bmatrix} \theta_{0}\\ \theta_{1}\\ \cdots \\ \theta_{n}\end{bmatrix} &:= \begin{bmatrix} \theta_{0}\\ \theta_{1}\\ \cdots \\ \theta_{n}\end{bmatrix} - \alpha\cdot\begin{bmatrix} x_{0}^{(1)} & x_{0}^{(2)} & \cdots & x_{0}^{(m)}\\ x_{1}^{(1)} & x_{1}^{(2)} & \cdots & x_{1}^{(m)}\\ \cdots & \cdots & \cdots & \cdots\\ x_{n}^{(1)} & x_{n}^{(2)} & \cdots & x_{n}^{(m)} \end{bmatrix}\cdot E \\ &= \Theta - \alpha\cdot X^{T}\cdot E \end{align*}\)
In summary, each iteration consists of three steps:
1. Compute the model output: \(A=X\cdot\Theta\)
2. Apply the sigmoid and compute the error: \(E=g(A)-Y\)
3. Update \(\Theta\) using the derived rule, \(\Theta:=\Theta-\alpha\cdot X^{T}\cdot E\), then return to step 1 and repeat.
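The three steps translate line for line into NumPy. A minimal sketch, where the fixed iteration count, zero initialization, and all names are my own choices rather than anything prescribed above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, alpha=0.1, iterations=1000):
    # X: (m, n+1) design matrix with a leading column of ones; Y: (m,) 0/1 labels.
    theta = np.zeros(X.shape[1])
    for _ in range(iterations):
        A = X @ theta                 # step 1: model output A = X . Theta
        E = sigmoid(A) - Y            # step 2: error E = g(A) - Y
        theta -= alpha * (X.T @ E)    # step 3: Theta := Theta - alpha * X^T . E
    return theta
```

Each iteration costs only two matrix-vector products, which is why this vectorized form is preferred over the per-component loop shown earlier.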