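Basic idea
Find the unknown parameters that minimize the total error (distance) between the sample points and the fitted line.
The figure below gives the most intuitive picture (figure taken from a Zhihu author).
This error (distance) could be measured by simple subtraction, but the raw differences are both positive and negative and cancel each other out, so the squared difference is used instead.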
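The cancellation problem can be seen in a small sketch. The sample points and the candidate line below are made up for illustration only; they are not from the original post.

```python
# Hypothetical sample points and a deliberately poor candidate line y = 1 + 0*x.
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 2.0]
a, b = 1.0, 0.0

# Signed difference between each observed y and the value predicted by the line.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

signed_total = sum(residuals)                   # positive and negative errors cancel
squared_total = sum(r * r for r in residuals)   # squaring keeps every error visible

print(residuals)      # [-1.0, 0.0, 1.0]
print(signed_total)   # 0.0
print(squared_total)  # 2.0
```

The signed total is zero even though the line misses two of the three points, while the squared total still reports the misfit.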
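Derivation
1 Write down the fitting equation
\(y = a+bx\)
2 Take the samples \((x_1, y_1),(x_2, y_2),\dots,(x_n, y_n)\)
3 Let \(d_i\) be the vertical deviation of sample point \(i\) from the fitted line, i.e. the error
\(d_i=y_i-(a+bx_i)\)
4 Let \(D\) be the sum of squared errors (the square is taken, as explained above, so that positive and negative errors do not cancel)
\(D=\sum\limits_{i=1}^{n}d_i^2=\sum\limits_{i=1}^{n}(y_i-a-bx_i)^2\)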
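As a minimal sketch of step 4, the sum of squared errors can be evaluated directly for any candidate pair of parameters. The function name `sum_squared_error` and the sample data are assumptions made for this illustration.

```python
def sum_squared_error(xs, ys, a, b):
    """Return D = sum_i (y_i - a - b*x_i)^2 for the line y = a + b*x."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))

# Hypothetical samples lying close to y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 8.0]

d_close = sum_squared_error(xs, ys, a=1.0, b=2.0)  # line near the data
d_far = sum_squared_error(xs, ys, a=0.0, b=1.0)    # line far from the data

print(d_close)  # 1.0
print(d_far)    # 39.0
```

A smaller value of D means the candidate line sits closer to the samples, which is the quantity the rest of the derivation minimizes.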
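5 At a minimum the first-order partial derivatives equal 0 and the second-order derivative is greater than or equal to 0 (proof omitted); use this to solve for the unknown parameters
Take the first-order partial derivative with respect to \(a\)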
\(
\begin{aligned}
\frac{\partial D}{\partial a}
&=\sum\limits_{i=1}^{n}2(y_i-a-bx_i)(-1)\\
&=-2\sum\limits_{i=1}^{n}(y_i-a-bx_i)\\
&=-2(\sum\limits_{i=1}^{n}y_i-\sum\limits_{i=1}^{n}a-b\sum\limits_{i=1}^{n}x_i)\\
&=-2(n\bar{y}-na-nb\bar{x})
\end{aligned}
\)
Take the first-order partial derivative with respect to \(b\)
\(
\begin{aligned}
\frac{\partial D}{\partial b}
&=\sum\limits_{i=1}^{n}2(y_i-a-bx_i)(-x_i)\\
&=-2\sum\limits_{i=1}^{n}(x_iy_i-ax_i-bx_i^2)\\
&=-2(\sum\limits_{i=1}^{n}x_iy_i-a\sum\limits_{i=1}^{n}x_i-b\sum\limits_{i=1}^{n}x_i^2)\\
&=-2(\sum\limits_{i=1}^{n}x_iy_i-na\bar{x}-b\sum\limits_{i=1}^{n}x_i^2)
\end{aligned}
\)
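One way to sanity-check the two partial derivatives is to compare them with central finite differences of D. The sketch below uses made-up data, an arbitrary evaluation point, and hypothetical helper names `D` and `grad_D`.

```python
def D(xs, ys, a, b):
    """Sum of squared errors for the line y = a + b*x."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))

def grad_D(xs, ys, a, b):
    """Partial derivatives of D, written in the final forms derived above."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    dD_da = -2 * (n * y_bar - n * a - n * b * x_bar)
    dD_db = -2 * (sum_xy - n * a * x_bar - b * sum_xx)
    return dD_da, dD_db

# Hypothetical samples and an arbitrary point (a, b) at which to compare.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 8.0]
a, b, h = 0.5, 1.5, 1e-6

# Central differences approximate each partial derivative numerically.
num_da = (D(xs, ys, a + h, b) - D(xs, ys, a - h, b)) / (2 * h)
num_db = (D(xs, ys, a, b + h) - D(xs, ys, a, b - h)) / (2 * h)
ana_da, ana_db = grad_D(xs, ys, a, b)

print(ana_da, num_da)  # both close to -12
print(ana_db, num_db)  # both close to -26
```

Because D is quadratic in \(a\) and \(b\), the central differences agree with the formulas up to floating-point rounding.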
Setting the partial derivatives equal to 0 gives
\(-2(n\bar{y}-na-nb\bar{x})=0\)
\(\Rightarrow \color{red}{a=\bar{y}-b\bar{x}}\)
\(-2(\sum\limits_{i=1}^{n}x_iy_i-na\bar{x}-b\sum\limits_{i=1}^{n}x_i^2)=0\); substituting \(a=\bar{y}-b\bar{x}\) and simplifying gives
\(\Rightarrow \sum\limits_{i=1}^{n}x_iy_i-n\bar{x}\bar{y}+nb\bar{x}^2-b\sum\limits_{i=1}^{n}x_i^2=0\)
\(\Rightarrow \sum\limits_{i=1}^{n}x_iy_i-n\bar{x}\bar{y}=b(\sum\limits_{i=1}^{n}x_i^2-n\bar{x}^2)\)
\(\Rightarrow b=\frac{\sum\limits_{i=1}^{n}x_iy_i-n\bar{x}\bar{y}}{\sum\limits_{i=1}^{n}x_i^2-n\bar{x}^2}\)
Because
\(\require{cancel}\sum\limits_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})=\sum\limits_{i=1}^{n}(x_iy_i-\bar{x}y_i-x_i\bar{y}+\bar{x}\bar{y})=\sum\limits_{i=1}^{n}x_iy_i-n\bar{x}\bar{y}-\cancel{n\bar{x}\bar{y}}+\cancel{n\bar{x}\bar{y}}=\sum\limits_{i=1}^{n}x_iy_i-n\bar{x}\bar{y}\)
\(\sum\limits_{i=1}^{n}(x_i-\bar{x})^2=\sum\limits_{i=1}^{n}(x_i^2-2\bar{x}x_i+\bar{x}^2)=\sum\limits_{i=1}^{n}x_i^2-2n\bar{x}^2+n\bar{x}^2=\sum\limits_{i=1}^{n}x_i^2-n\bar{x}^2\)
substituting these into the expression for \(b\) above gives \(\color{red}{b=\frac{\sum\limits_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum\limits_{i=1}^{n}(x_i-\bar{x})^2}}\)
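The two closed-form results can be turned into a short sketch in plain Python. The function name `fit_line` and the sample data are assumptions for illustration; the guard for identical x values is an addition, since the denominator of \(b\) is zero in that case.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for y = a + b*x using the centered sums."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        # All x values coincide, so the slope formula would divide by zero.
        raise ValueError("all x values are identical; slope is undefined")
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical samples.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 8.0]
a, b = fit_line(xs, ys)

# The uncentered form of b derived earlier should give the same slope.
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_xx = sum(x * x for x in xs)
b_alt = (sum_xy - n * x_bar * y_bar) / (sum_xx - n * x_bar ** 2)

print(a, b)   # roughly 0.8 and 2.3
print(b_alt)  # roughly 2.3
```

The centered form is often preferred in floating-point code because subtracting the means first avoids differencing two large, nearly equal sums.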