There are \(m\) samples \((x_{i}, y_{i})\):
\[\{x_{1},x_{2},x_{3},\dots,x_{m}\}\]
\[\{y_{1},y_{2},y_{3},\dots,y_{m}\}\]
where each \(x_{i}\) is an \((n-1)\)-dimensional feature vector with a 1 appended at the end (so that its dimension matches that of \(w\) for the inner product):
\[x_{i}=\{x_{i1},x_{i2},x_{i3},\dots,x_{i(n-1)},1\}\]
\[y_{i}\in\{0,1\}\]
and \(w\) is an \(n\)-dimensional weight vector:
\[w=\{w_{1},w_{2},w_{3},\dots,w_{n}\}\]
Regression (hypothesis) function — note the minus sign in the exponent, which is required for the derivative \(h_{w}(1-h_{w})\) used below:
\[h_{w}(x_{i})=\frac{1}{1+e^{-wx_{i}}}\]
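The hypothesis function can be sketched directly in NumPy (a minimal sketch; the function name `h` mirrors the notation above):

```python
import numpy as np

def h(w, x):
    """Sigmoid hypothesis: h_w(x) = 1 / (1 + e^{-w.x})."""
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))
```

With \(w=0\) the inner product is 0 and the output is exactly 0.5, i.e. maximal uncertainty.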
Probability distribution:
\[P(y=1|x;w)=h_{w}(x)\]
\[P(y=0|x;w)=1-h_{w}(x)\]
The two cases combine into a single expression:
\[P(y|x;w)=h_{w}(x)^{y}\cdot(1-h_{w}(x))^{1-y}\]
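The combined Bernoulli expression can be checked numerically (a sketch; `p` is an illustrative name, not part of the derivation):

```python
import numpy as np

def p(y, x, w):
    """P(y|x;w) = h^y * (1-h)^(1-y): one formula covering both y=0 and y=1."""
    hx = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
    return hx**y * (1 - hx)**(1 - y)
```

For any \(x\) and \(w\), the two cases sum to 1, confirming it is a valid probability distribution.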
Maximum-likelihood function:
\[L(w)=\prod_{i=1}^{m}P(y_{i}|x_{i};w)=\prod_{i=1}^{m}h_{w}(x_{i})^{y_{i}}\cdot(1-h_{w}(x_{i}))^{1-y_{i}}\]
Taking the logarithm of both sides:
\[\ln L(w)=\sum_{i=1}^{m}\left[y_{i}\ln h_{w}(x_{i})+(1-y_{i})\ln(1-h_{w}(x_{i}))\right]\]
Find the \(w\) that maximizes \(\ln L(w)\):
\[w^{*}=\arg\max_{w}\ln L(w)\]
Loss function (the negative log-likelihood averaged over the \(m\) samples):
\[J(w)=-\frac{1}{m}\sum_{i=1}^{m}\left[y_{i}\ln h_{w}(x_{i})+(1-y_{i})\ln(1-h_{w}(x_{i}))\right]\]
Find the \(w\) that minimizes \(J(w)\):
\[w^{*}=\arg\min_{w}J(w)\]
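The loss \(J(w)\) vectorizes naturally over a sample matrix (a sketch; `X` stacks the \(x_i\) as rows, each already carrying the trailing 1):

```python
import numpy as np

def J(w, X, y):
    """Cross-entropy loss: -(1/m) * sum_i [y_i*ln(h_i) + (1-y_i)*ln(1-h_i)].
    X: (m, n) matrix of samples; y: (m,) labels in {0, 1}."""
    h = 1.0 / (1.0 + np.exp(-X @ w))  # h_w(x_i) for every sample at once
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
```

With \(w=0\) every \(h_{w}(x_i)=0.5\), so \(J(w)=\ln 2\) regardless of the labels — a handy sanity check.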
Take the partial derivative of the loss with respect to each component \(w_{j}\) of \(w\) (to minimize via gradient descent):
\[\frac{\partial J(w)}{\partial w_{j}}=\frac{\partial}{\partial w_{j}}\left(-\frac{1}{m}\sum_{i=1}^{m}\left[y_{i}\ln h_{w}(x_{i})+(1-y_{i})\ln(1-h_{w}(x_{i}))\right]\right)\]
\[=-\frac{1}{m}\sum_{i=1}^{m}\left[\frac{y_{i}}{h_{w}(x_{i})}\cdot\frac{\partial h_{w}(x_{i})}{\partial w_{j}}+\frac{1-y_{i}}{1-h_{w}(x_{i})}\cdot\frac{\partial(1-h_{w}(x_{i}))}{\partial w_{j}}\right]\]
\[=-\frac{1}{m}\sum_{i=1}^{m}\left(\frac{y_{i}}{h_{w}(x_{i})}-\frac{1-y_{i}}{1-h_{w}(x_{i})}\right)\cdot\frac{\partial h_{w}(x_{i})}{\partial w_{j}}\]
\[=-\frac{1}{m}\sum_{i=1}^{m}\left(\frac{y_{i}}{h_{w}(x_{i})}-\frac{1-y_{i}}{1-h_{w}(x_{i})}\right)\cdot\frac{\partial h_{w}(x_{i})}{\partial(wx_{i})}\cdot\frac{\partial(wx_{i})}{\partial w_{j}}\]
\[=-\frac{1}{m}\sum_{i=1}^{m}\left(\frac{y_{i}}{h_{w}(x_{i})}-\frac{1-y_{i}}{1-h_{w}(x_{i})}\right)\cdot h_{w}(x_{i})\cdot(1-h_{w}(x_{i}))\cdot\frac{\partial(wx_{i})}{\partial w_{j}}\]
\[=\frac{1}{m}\sum_{i=1}^{m}(h_{w}(x_{i})-y_{i})\cdot x_{ij}\]
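The final expression \(\frac{1}{m}\sum_i (h_{w}(x_i)-y_i)\,x_{ij}\) computes all components \(j\) at once as a matrix product (a sketch under the same `X`, `y` layout assumed above):

```python
import numpy as np

def grad(w, X, y):
    """Gradient of J: (1/m) * sum_i (h_w(x_i) - y_i) * x_i, all j at once.
    X.T @ (h - y) stacks the per-component sums into one n-vector."""
    m = X.shape[0]
    h = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (h - y) / m
```

Working a tiny case by hand: with \(w=0\), \(h=(0.5,0.5)\); for \(X=\begin{pmatrix}1&1\\2&1\end{pmatrix}\), \(y=(0,1)\) the residuals are \((0.5,-0.5)\), giving the gradient \((-0.25,\,0)\).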
Update each component \(w_{j}\) of \(w\), where \(\alpha\) is the learning rate:
\[w_{j}:=w_{j}-\alpha\cdot\frac{\partial J(w)}{\partial w_{j}}\]
Batch gradient descent: use all \(m\) samples for each update of \(w_{j}\):
\[w_{j}:=w_{j}-\alpha\cdot\frac{1}{m}\sum_{i=1}^{m}(h_{w}(x_{i})-y_{i})\cdot x_{ij}\]
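Putting the pieces together, the batch update rule becomes a short training loop (a minimal sketch; `alpha` and `epochs` are illustrative hyperparameters, not values from the derivation):

```python
import numpy as np

def fit(X, y, alpha=0.5, epochs=5000):
    """Batch gradient descent: every epoch uses all m samples to update w.
    Rows of X already carry the trailing 1, so the last weight is the bias."""
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(epochs):
        h = 1.0 / (1.0 + np.exp(-X @ w))   # h_w(x_i) for all samples
        w -= alpha * (X.T @ (h - y)) / m   # w_j := w_j - alpha * dJ/dw_j
    return w
```

On a toy 1-D dataset where small inputs are labeled 0 and large inputs 1, the learned \(w\) drives \(h_{w}\) below 0.5 on one side of the boundary and above it on the other.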