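These notes summarize and organize the ideas from Zhou Zhihua's Machine Learning (the "watermelon book") and the Pumpkin Book, its companion guide to the book's derivations.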
Linear model: a linear combination of the attributes
\[
f(\boldsymbol{x})=w_{1} x_{1}+w_{2} x_{2}+\ldots+w_{d} x_{d}+b=\boldsymbol{w}^{T} \boldsymbol{x}+b
\]
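A minimal NumPy sketch of evaluating \(f(\boldsymbol{x})=\boldsymbol{w}^T\boldsymbol{x}+b\) for a single sample. The weights, bias, and sample are made-up numbers, and the function name `linear_model` is hypothetical.

```python
import numpy as np


def linear_model(x: np.ndarray, w: np.ndarray, b: float) -> float:
    """Evaluate f(x) = w^T x + b for one sample x with d attributes."""
    return float(w @ x + b)


# Made-up example with d = 3 attributes
w = np.array([0.5, 0.25, 2.0])
b = 1.0
x = np.array([2.0, 4.0, 1.0])

# Same value as the expanded form w1*x1 + w2*x2 + w3*x3 + b = 1 + 1 + 2 + 1
print(linear_model(x, w, b))  # 5.0
```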
Underlying basic ideas:
Representing the dataset:
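- The dataset is \(D=\{(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(m)},y^{(m)})\}\). The superscript \((i)\) marks the \(i\)-th sample.
- Handling discrete attributes:
  - Ordinal attributes: map the ordered levels to evenly spaced numeric values. For example, high, medium, and low become \(\{1.0, 0.5, 0.0\}\).
  - Unordered (nominal) attributes: one-hot encode them. For example, watermelon, pumpkin, and winter melon become \((0,0,1)\), \((0,1,0)\), and \((1,0,0)\).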
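The two encodings above can be sketched as follows. This is a minimal NumPy version, not a library API. `ORDINAL_MAP`, `encode_ordinal`, and `encode_one_hot` are hypothetical names, and the category order is chosen so the output matches the example vectors.

```python
import numpy as np

# Ordinal attribute: ordered levels mapped to evenly spaced values (as in the example)
ORDINAL_MAP = {"high": 1.0, "medium": 0.5, "low": 0.0}


def encode_ordinal(values):
    """Replace each ordered level with its numeric value."""
    return np.array([ORDINAL_MAP[v] for v in values])


def encode_one_hot(values, categories):
    """One column per category; exactly one 1.0 per row."""
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(values), len(categories)))
    for row, v in enumerate(values):
        out[row, index[v]] = 1.0
    return out


heights = encode_ordinal(["high", "low", "medium"])
melons = encode_one_hot(
    ["watermelon", "pumpkin", "winter melon"],
    categories=["winter melon", "pumpkin", "watermelon"],
)
print(heights.tolist())  # [1.0, 0.0, 0.5]
print(melons.tolist())   # [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
```

One-hot encoding is used for unordered attributes because mapping them to integers such as 0, 1, 2 would impose an order that the categories do not have.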
Univariate (simple) linear regression:
For a regression problem, the goal is to minimize the loss function.
Loss function: the squared error (least squares).
\[
\begin{aligned}
\left(w^{*}, b^{*}\right) &= \underset{(w, b)}{\arg\min} \sum_{i=1}^{m}\left(f\left(x_{i}\right)-y_{i}\right)^{2} \\
&= \underset{(w, b)}{\arg\min} \sum_{i=1}^{m}\left(y_{i}-w x_{i}-b\right)^{2}
\end{aligned}
\]
First, take the partial derivative with respect to \(b\) and set it to zero:
\[
\begin{aligned}
&2\sum_{i=1}^{m}\left(y_{i}-w x_{i}-b\right)(-1)=0 \\
\Rightarrow\;& b=\frac{1}{m} \sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)
\end{aligned}
\]
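A quick numerical sanity check of the result for \(b\), assuming NumPy and made-up data. For a fixed, arbitrary \(w\), \(b=\frac{1}{m}\sum_i(y_i-wx_i)\) should minimize the squared error, so nudging \(b\) in either direction must increase it. The helper names `squared_error` and `best_b` are hypothetical.

```python
import numpy as np


def squared_error(w: float, b: float, x: np.ndarray, y: np.ndarray) -> float:
    """Sum of squared errors: sum_i (y_i - w*x_i - b)^2."""
    return float(np.sum((y - w * x - b) ** 2))


def best_b(w: float, x: np.ndarray, y: np.ndarray) -> float:
    """Stationary point of the loss in b: b = (1/m) * sum_i (y_i - w*x_i)."""
    return float(np.mean(y - w * x))


# Made-up data; w is held fixed at an arbitrary value
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.2])
w = 1.5
b = best_b(w, x, y)

# The loss is a parabola in b, so moving away from b in either direction increases it
for delta in (-0.1, 0.1):
    assert squared_error(w, b + delta, x, y) > squared_error(w, b, x, y)
```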
Next, take the partial derivative with respect to \(w\), set it to zero, and drop the constant factor 2:
\[
\begin{aligned}
0 &= w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i} \\
\Rightarrow\; w \sum_{i=1}^{m} x_{i}^{2} &= \sum_{i=1}^{m} y_{i} x_{i}-\sum_{i=1}^{m} b x_{i}
\end{aligned}
\]
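Substitute \(b=\frac{1}{m}\sum_{i=1}^{m}(y_i-wx_i)=\bar{y}-w\bar{x}\) into the equation above, where \(\bar{x}=\frac{1}{m}\sum_{i=1}^{m}x_i\) and \(\bar{y}=\frac{1}{m}\sum_{i=1}^{m}y_i\):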
\[
\begin{aligned}
&\Rightarrow w \sum_{i=1}^{m} x_{i}^{2}=\sum_{i=1}^{m} y_{i} x_{i}-\sum_{i=1}^{m}(\bar{y}-w \bar{x}) x_{i} \\
&\Rightarrow w\left(\sum_{i=1}^{m} x_{i}^{2}-\bar{x} \sum_{i=1}^{m} x_{i}\right)=\sum_{i=1}^{m} y_{i} x_{i}-\bar{y} \sum_{i=1}^{m} x_{i} \\
&\Rightarrow w=\frac{\sum_{i=1}^{m} y_{i} x_{i}-\bar{y} \sum_{i=1}^{m} x_{i}}{\sum_{i=1}^{m} x_{i}^{2}-\bar{x} \sum_{i=1}^{m} x_{i}}
\end{aligned}
\]
The following two identities turn this into the formula given in the watermelon book: [trick]
\[
\begin{aligned}
\bar{y} \sum_{i=1}^{m} x_{i} &= \frac{1}{m} \sum_{i=1}^{m} y_{i} \sum_{i=1}^{m} x_{i}=\bar{x} \sum_{i=1}^{m} y_{i} \\
\bar{x} \sum_{i=1}^{m} x_{i} &= \frac{1}{m} \sum_{i=1}^{m} x_{i} \sum_{i=1}^{m} x_{i}=\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}
\end{aligned}
\]
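Both identities only rearrange sums, so they can be spot-checked on random data. A minimal NumPy sketch follows; the random seed and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 50
x = rng.normal(size=m)
y = rng.normal(size=m)
x_bar, y_bar = x.mean(), y.mean()

# Identity 1: y_bar * sum(x) == (1/m) * sum(y) * sum(x) == x_bar * sum(y)
lhs1 = y_bar * x.sum()
assert np.isclose(lhs1, y.sum() * x.sum() / m)
assert np.isclose(lhs1, x_bar * y.sum())

# Identity 2: x_bar * sum(x) == (1/m) * (sum(x))^2
assert np.isclose(x_bar * x.sum(), x.sum() ** 2 / m)
```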
This finally gives:
\[
\Rightarrow w=\frac{\sum_{i=1}^{m} y_{i}\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{m} x_{i}^{2}-\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}}
\]
The optimal \(w\) and \(b\) therefore have the closed-form solution:
\[
\begin{aligned}
w &= \frac{\sum_{i=1}^{m} y_{i}\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{m} x_{i}^{2}-\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}} \\
b &= \frac{1}{m} \sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)
\end{aligned}
\]
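A minimal NumPy sketch of the closed-form solution above, written directly from the sum formulas. `fit_simple_lr` is a hypothetical helper name, and the data are made up to lie exactly on \(y=2x+1\), so the fit should recover \(w=2\), \(b=1\).

```python
import numpy as np


def fit_simple_lr(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Closed-form least squares for f(x) = w*x + b (watermelon-book form)."""
    m = len(x)
    x_bar = x.mean()
    # w = sum_i y_i (x_i - x_bar) / (sum_i x_i^2 - (1/m) (sum_i x_i)^2)
    w = np.sum(y * (x - x_bar)) / (np.sum(x ** 2) - np.sum(x) ** 2 / m)
    # b = (1/m) sum_i (y_i - w x_i)
    b = np.mean(y - w * x)
    return float(w), float(b)


x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
w, b = fit_simple_lr(x, y)
print(w, b)  # 2.0 1.0
```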
Going further, \(w\) can be vectorized. [convenient for programming]
Substituting \(\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}=\bar{x} \sum_{i=1}^{m} x_{i}\) into the denominator gives:
\[
\begin{aligned}
w &= \frac{\sum_{i=1}^{m} y_{i}\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{m} x_{i}^{2}-\bar{x} \sum_{i=1}^{m} x_{i}} \\
&= \frac{\sum_{i=1}^{m}\left(y_{i} x_{i}-y_{i} \bar{x}\right)}{\sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \bar{x}\right)}
\end{aligned}
\]
Using the following two identities: [trick]
\[
\begin{aligned}
&\bar{y} \sum_{i=1}^{m} x_{i}=\bar{x} \sum_{i=1}^{m} y_{i}=\sum_{i=1}^{m} \bar{y} x_{i}=\sum_{i=1}^{m} \bar{x} y_{i}=m \bar{x} \bar{y}=\sum_{i=1}^{m} \bar{x} \bar{y} \\
&\sum_{i=1}^{m} x_{i} \bar{x}=\bar{x} \sum_{i=1}^{m} x_{i}=\bar{x} \cdot m \cdot \frac{1}{m} \sum_{i=1}^{m} x_{i}=m \bar{x}^{2}=\sum_{i=1}^{m} \bar{x}^{2}
\end{aligned}
\]
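These identities can also be spot-checked numerically. Below is a minimal NumPy sketch on random data with an arbitrary seed. It checks in particular that \(\bar{x}\sum_i x_i=m\bar{x}^2\), and that the extra terms used in the next step sum to zero.

```python
import numpy as np

rng = np.random.default_rng(42)
m = 20
x = rng.normal(size=m)
y = rng.normal(size=m)
x_bar, y_bar = x.mean(), y.mean()

# First chain: every expression equals m * x_bar * y_bar
target = m * x_bar * y_bar
for value in (y_bar * x.sum(), x_bar * y.sum(), np.sum(y_bar * x),
              np.sum(x_bar * y), np.sum(np.full(m, x_bar * y_bar))):
    assert np.isclose(value, target)

# Second chain: sum_i x_i * x_bar == m * x_bar^2 == sum_i x_bar^2
assert np.isclose(np.sum(x * x_bar), m * x_bar ** 2)
assert np.isclose(np.sum(np.full(m, x_bar ** 2)), m * x_bar ** 2)

# Consequence: the terms added to the numerator and denominator sum to zero
assert np.isclose(np.sum(-x * y_bar + x_bar * y_bar), 0.0)
assert np.isclose(np.sum(-x * x_bar + x_bar ** 2), 0.0)
```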
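By these identities, \(\sum_{i=1}^{m}\left(-x_i\bar{y}+\bar{x}\bar{y}\right)=0\) and \(\sum_{i=1}^{m}\left(-x_i\bar{x}+\bar{x}^2\right)=0\). Adding these zero terms to the numerator and denominator leaves \(w\) unchanged and rewrites it as: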
\[
\begin{aligned}
w &= \frac{\sum_{i=1}^{m}\left(y_{i} x_{i}-y_{i} \bar{x}-x_{i} \bar{y}+\bar{x} \bar{y}\right)}{\sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \bar{x}-x_{i} \bar{x}+\bar{x}^{2}\right)} \\
&= \frac{\sum_{i=1}^{m}\left(x_{i}-\bar{x}\right)\left(y_{i}-\bar{y}\right)}{\sum_{i=1}^{m}\left(x_{i}-\bar{x}\right)^{2}}
\end{aligned}
\]
Let \(\boldsymbol{x}_{d}=\left(x_{1}-\bar{x}, x_{2}-\bar{x}, \ldots, x_{m}-\bar{x}\right)^{T}\) and \(\boldsymbol{y}_{d}=\left(y_{1}-\bar{y}, y_{2}-\bar{y}, \ldots, y_{m}-\bar{y}\right)^{T}\).
The vectorized result is:
\[
w=\frac{\boldsymbol{x}_{d}^{T} \boldsymbol{y}_{d}}{\boldsymbol{x}_{d}^{T} \boldsymbol{x}_{d}}
\]
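A minimal NumPy sketch of the vectorized formula on made-up noisy data. `fit_simple_lr_vectorized` is a hypothetical name. `np.polyfit(x, y, 1)` serves only as an independent reference, since it returns the least-squares slope and intercept (highest power first).

```python
import numpy as np


def fit_simple_lr_vectorized(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """w = x_d^T y_d / x_d^T x_d with centered vectors; b = y_bar - w * x_bar."""
    x_d = x - x.mean()  # centered inputs
    y_d = y - y.mean()  # centered targets
    w = (x_d @ y_d) / (x_d @ x_d)
    b = y.mean() - w * x.mean()
    return float(w), float(b)


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=100)
y = 3.0 * x - 2.0 + rng.normal(scale=0.5, size=100)  # made-up noisy data
w, b = fit_simple_lr_vectorized(x, y)

# Cross-check against NumPy's degree-1 polynomial fit: returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print(np.isclose(w, slope), np.isclose(b, intercept))  # True True
```

Besides being short, the centered form is also numerically friendlier. When the \(x_i\) are large but close together, \(\sum_i x_i^2-\frac{1}{m}(\sum_i x_i)^2\) subtracts two nearly equal large numbers, while \(\boldsymbol{x}_d^T\boldsymbol{x}_d\) avoids that cancellation.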
Multivariate linear regression
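Derivation approach: in the univariate case, we first took the partial derivative with respect to \(b\) and then substituted \(b\) into the partial derivative with respect to \(w\).
With \(d\) attributes, solving the \(d+1\) partial-derivative equations one by one in this way is unwieldy. Instead, we rewrite the model in matrix form by absorbing \(b\) into the weight vector: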
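\[
f(\boldsymbol{x}_{i})=\boldsymbol{w}^{T} \boldsymbol{x}_{i}+b=\boldsymbol{\beta}^{T} \hat{\boldsymbol{x}}_{i}
\]
where
\[
\boldsymbol{\beta}=(\boldsymbol{w}; b), \qquad \hat{\boldsymbol{x}}_{i}=(\boldsymbol{x}_{i}; 1)
\]
Stacking the \(m\) samples row by row gives the design matrix \(\mathbf{X}\in\mathbb{R}^{m\times(d+1)}\), whose \(i\)-th row is \(\hat{\boldsymbol{x}}_i^T\). All predictions can then be written at once as \(\mathbf{X}\boldsymbol{\beta}\).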
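A minimal NumPy sketch of the rewrite, assuming made-up values for \(\mathbf{X}\), \(\boldsymbol{w}\), and \(b\). Appending a column of ones to the data and \(b\) to the weights reproduces \(\boldsymbol{w}^T\boldsymbol{x}_i+b\) for every sample.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 6, 3
X = rng.normal(size=(m, d))  # m samples, d attributes (made-up)
w = np.array([1.0, -2.0, 0.5])
b = 0.7

# Append a column of ones: row i becomes x_hat_i = (x_i; 1)
X_hat = np.hstack([X, np.ones((m, 1))])
beta = np.append(w, b)  # beta = (w; b)

# beta^T x_hat_i equals w^T x_i + b for every sample
assert np.allclose(X_hat @ beta, X @ w + b)
```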
The detailed derivation was done by hand and is not typed up here.
Differentiate the loss function:
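For reference, here is a typed version of the standard matrix derivation. It assumes the design matrix \(\mathbf{X}\in\mathbb{R}^{m\times(d+1)}\) with \(i\)-th row \(\hat{\boldsymbol{x}}_i^T\) and \(\boldsymbol{y}=(y_1,\dots,y_m)^T\), and its presentation may differ from the original handwritten notes. It uses \(\boldsymbol{y}^T\mathbf{X}\boldsymbol{\beta}=\boldsymbol{\beta}^T\mathbf{X}^T\boldsymbol{y}\) (a scalar) and \(\frac{\partial}{\partial\boldsymbol{\beta}}\boldsymbol{\beta}^T\mathbf{A}\boldsymbol{\beta}=2\mathbf{A}\boldsymbol{\beta}\) for symmetric \(\mathbf{A}\).

\[
\begin{aligned}
E_{\boldsymbol{\beta}} &= \sum_{i=1}^{m}\left(y_{i}-\boldsymbol{\beta}^{T}\hat{\boldsymbol{x}}_{i}\right)^{2}
= (\boldsymbol{y}-\mathbf{X}\boldsymbol{\beta})^{T}(\boldsymbol{y}-\mathbf{X}\boldsymbol{\beta}) \\
&= \boldsymbol{y}^{T}\boldsymbol{y}-2\boldsymbol{\beta}^{T}\mathbf{X}^{T}\boldsymbol{y}+\boldsymbol{\beta}^{T}\mathbf{X}^{T}\mathbf{X}\boldsymbol{\beta} \\
\frac{\partial E_{\boldsymbol{\beta}}}{\partial \boldsymbol{\beta}} &= -2\mathbf{X}^{T}\boldsymbol{y}+2\mathbf{X}^{T}\mathbf{X}\boldsymbol{\beta}
= 2\mathbf{X}^{T}(\mathbf{X}\boldsymbol{\beta}-\boldsymbol{y}) \\
\frac{\partial E_{\boldsymbol{\beta}}}{\partial \boldsymbol{\beta}}=\boldsymbol{0}
\;&\Rightarrow\; \mathbf{X}^{T}\mathbf{X}\boldsymbol{\beta}=\mathbf{X}^{T}\boldsymbol{y}
\end{aligned}
\]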
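Setting the derivative of the loss to zero gives the normal equations \(\mathbf{X}^{T}\mathbf{X}\boldsymbol{\beta}=\mathbf{X}^{T}\boldsymbol{y}\). If \(\mathbf{X}^{T}\mathbf{X}\) is full rank (equivalently, positive definite), it is invertible, and the unique solution is \(\boldsymbol{\beta}^{*}=(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\boldsymbol{y}\).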
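A minimal NumPy sketch of solving the normal equations. It uses made-up data with many more samples than attributes, so \(\mathbf{X}^T\mathbf{X}\) is full rank. `np.linalg.solve` solves the linear system without forming the inverse explicitly, and `np.linalg.lstsq` is used as an independent cross-check.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 50, 3
X = rng.normal(size=(m, d))  # made-up features
y = X @ np.array([1.0, -2.0, 0.5]) + 0.7 + rng.normal(scale=0.1, size=m)

X_hat = np.hstack([X, np.ones((m, 1))])  # rows (x_i; 1)
gram = X_hat.T @ X_hat                   # X^T X, shape (d+1, d+1)

# With m >> d and generic data, X^T X is full rank, so the normal
# equations X^T X beta = X^T y have a unique solution.
assert np.linalg.matrix_rank(gram) == d + 1
beta = np.linalg.solve(gram, X_hat.T @ y)
w_hat, b_hat = beta[:-1], beta[-1]

# Cross-check with NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
print(np.allclose(beta, beta_lstsq))  # True
```

Solving the system is generally preferred over computing \((\mathbf{X}^T\mathbf{X})^{-1}\) and multiplying, for accuracy. `np.linalg.lstsq` also works when \(\mathbf{X}^T\mathbf{X}\) is singular, returning the minimum-norm solution.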
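In practice, however, \(\mathbf{X}^{T}\mathbf{X}\) is often not full rank. For example, gene-chip data in bioinformatics often has thousands of attributes but only tens or hundreds of samples. There are then more unknowns than equations, so multiple solutions \(\boldsymbol{\beta}\) exist and all of them minimize the mean squared error. The usual remedy is to add a regularization term.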
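A minimal NumPy sketch of the rank-deficient case and one regularized fix. This uses ridge (L2) regularization as one concrete choice of regularizer. Several assumptions apply: the data are made up, \(\lambda=1.0\) is arbitrary, and the bias \(b\) is left unpenalized, a common convention. Under these assumptions the loss is \(\lVert\boldsymbol{y}-\mathbf{X}\boldsymbol{\beta}\rVert^2+\lambda\lVert\boldsymbol{w}\rVert^2\).

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 10, 100  # far more attributes than samples (made-up)
X = rng.normal(size=(m, d))
y = rng.normal(size=m)

X_hat = np.hstack([X, np.ones((m, 1))])
gram = X_hat.T @ X_hat
# rank(X^T X) <= m < d + 1, so X^T X is singular and the minimizer is not unique
print(np.linalg.matrix_rank(gram) < d + 1)  # True

# Ridge: penalize w only (last entry of beta is the unpenalized bias b)
lam = 1.0  # arbitrary regularization strength
penalty = np.eye(d + 1)
penalty[-1, -1] = 0.0
beta = np.linalg.solve(gram + lam * penalty, X_hat.T @ y)

# Stationarity of the ridge loss: X^T (X beta - y) + lam * penalty @ beta = 0
grad = X_hat.T @ (X_hat @ beta - y) + lam * penalty @ beta
print(np.allclose(grad, 0.0))  # True
```

Adding \(\lambda\) on the diagonal for the \(\boldsymbol{w}\) entries makes the system matrix positive definite, so a single \(\boldsymbol{\beta}\) is selected even though \(\mathbf{X}^T\mathbf{X}\) alone is singular.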
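Deeper understanding (from the Whiteboard Derivation series, 白板推导)
Geometric perspective [LSE, least squares estimation]
Probabilistic perspective (i.e., viewing the problem in terms of how the model generates the data)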