zoukankan      html  css  js  c++  java
  • 贝叶斯线性回归

    贝叶斯线性回归

    问题背景:

    为了与PRML第一章一致,我们假定数据出自一个高斯分布:

    [p(t|x,mathbf{w},eta)=mathcal{N}(t|y(x,mathbf{w}),eta^{-1})=sqrt{frac{eta}{2pi}}exp(-frac{eta}{2}(t-y(x,mathbf{w}))^2) ]

    其中(eta)是精度,(y(x,mathbf{w})=sumlimits_{j=0}^Mw_jx^j)
    (mathbf{w})的先验为:

    [p(mathbf{w})=mathcal{N}(mathbf{w}|mathbf{0},alpha^{-1}mathbf{I})=(frac{alpha}{2pi})^{(M+1)/2}exp(-frac{alpha}{2}mathbf{w}^Tmathbf{w}) ]

    其中(alpha)是高斯分布的精度。
    为了表示方便,我们定义一个变换(phi(x)=(1,x,x^2,...,x^M)^T),那么(y(x,mathbf{w})=mathbf{w}^Tphi(x))。为了对(mathbf{w})作推断,我们需要收集数据更新先验分布,记收集到的数据为(mathbf{x}_N={x_1,...,x_N})(mathbf{t}_N={t_1,...,t_N}),其中(t_i)(x_i)对应的响应。进一步我们引入一个矩阵(Phi),其定义如下:

    [Phi=egin{bmatrix}phi(x_1)^T\phi(x_2)^T\vdots\phi(x_N)^Tend{bmatrix} ]

    我们可以认为这个矩阵是由(phi(x_i)^T)平铺而成。

    详细推导过程

    首先我们先验证(p(mathbf{w}|mathbf{x},mathbf{t}))是个高斯分布。根据贝叶斯公式我们有:

    [egin{aligned}p(mathbf{w}|mathbf{x},mathbf{t})&propto p(mathbf{w}|alpha)p(mathbf{t}|mathbf{x},mathbf{w})\&=(frac{alpha}{2pi})^{(M+1)/2} exp(-frac{alpha}{2}mathbf{w}^Tmathbf{w})prod_{i=1}^Nsqrt{frac{eta}{2pi}}exp(-frac{eta}{2}(t_i-mathbf{w}^Tphi(x_i))^2)\&propto exp(-frac{1}{2}Big{alpha mathbf{w}^Tmathbf{w}+etasum_{i=1}^N(t_i^2-2t_imathbf{w}^Tphi(x_i)+mathbf{w}^Tphi(x_i)phi(x_i)^Tmathbf{w})Big})\&propto exp(-frac{1}{2}Big{ mathbf{w}^T(alpha mathbf{I}+etasum_{i=1}^Nphi(x_i)phi(x_i)^T)mathbf{w}-2etasum_{i=1}^N t_iphi(x_i)^Tmathbf{w}Big})\&propto exp(-frac{1}{2}Big{mathbf{w}^T(alpha mathbf{I}+eta Phi^TPhi)mathbf{w} -2eta(Phi^Tmathbf{t}_N)^Tmathbf{w}Big})end{aligned} ]

    (S^{-1}=alpha mathbf{I}+etaPhi^TPhi),(mu=eta SPhi^Tmathbf{t}_N)(注:原书中式1.72写错了)

    [p(mathbf{w}|mathbf{x},mathbf{t})propto exp(-frac{1}{2}(mathbf{w}-mu)^TS^{-1}(mathbf{w}-mu))propto mathcal{N}(mathbf{w}|mu,S) ]

    至此,我们证明了后验分布也是个高斯,接下来我们计算predictive distribution,注意到(p(t|x,mathbf{x},mathbf{t}))是两个高斯分布的卷积,其结果也是一个高斯,但为了严谨起见,还是证明一下。

    [egin{aligned}p(t|x,mathbf{x},mathbf{t})&=int p(t|x,mathbf{w})p(mathbf{w}|mathbf{x},mathbf{t})dmathbf{w}\&=frac{1}{(2pi)^{M/2+1}}frac{1}{(eta^{-1}|S|)^{1/2}} int exp(-frac{1}{2}Big{eta(t-mathbf{w}^Tphi(x))^2+(mathbf{w}-mu)^TS^{-1}(mathbf{w}-mu)Big})dmathbf{w}\&=frac{1}{(2pi)^{M/2+1}}frac{1}{(eta^{-1}|S|)^{1/2}}intexp(-frac{1}{2}Big{eta t^2-2eta tphi(x)^Tmathbf{w}+etamathbf{w}^Tphi(x)phi(x)^Tmathbf{w}\&+mathbf{w}^TS^{-1}mathbf{w}-2mu^T S^{-1}mathbf{w}+mu^T S^{-1}muBig})dmathbf{w}\&=frac{1}{(2pi)^{M/2+1}}frac{1}{(eta^{-1}|S|)^{1/2}}exp(-frac{1}{2}(eta t^2+mu^T S^{-1}mu))cdot \&intexp(-frac{1}{2}Big{-2eta tphi(x)^Tmathbf{w}+etamathbf{w}^Tphi(x)phi(x)^Tmathbf{w}+mathbf{w}^TS^{-1}mathbf{w}-2mu^T S^{-1}mathbf{w}Big})dmathbf{w}\&=frac{1}{(2pi)^{M/2+1}}frac{1}{(eta^{-1}|S|)^{1/2}}exp(-frac{1}{2}(eta t^2+mu^T S^{-1}mu))cdot\&intexp(-frac{1}{2}Big{mathbf{w}^T(underbrace{etaphi(x)phi(x)^T+S^{-1})}_{Lambda^{-1}}mathbf{w}-2(underbrace{eta tphi(x)^T+mu^TS^{-1}}_{m^TLambda^{-1}})mathbf{w}Big}dmathbf{w}\&=frac{1}{(2pi)^{M/2+1}}frac{1}{(eta^{-1}|S|)^{1/2}}exp(-frac{1}{2}(eta t^2+mu^T S^{-1}mu))cdot\&intexp(-frac{1}{2}Big{mathbf{w}^TLambda^{-1}mathbf{w}-2m^TLambda^{-1}mathbf{w}+m^TLambda^{-1}m-m^TLambda^{-1}mBig})dmathbf{w}\&=frac{1}{(2pi)^{M/2+1}}frac{1}{(eta^{-1}|S|)^{1/2}}exp(-frac{1}{2}(eta t^2+mu^T S^{-1}mu-m^TLambda^{-1}m))cdot\&intexp(-frac{1}{2}Big{(mathbf{w}-m)^TLambda^{-1}(mathbf{w}-m)Big})dmathbf{w}\&=frac{(2pi)^{(M+1)/2}}{(2pi)^{M/2+1}}frac{|Lambda|^{1/2}}{(eta^{-1}|S|)^{1/2}}exp(-frac{1}{2}(eta t^2+mu^T S^{-1}mu-m^TLambda^{-1}m))\&=frac{1}{(2pi)^{1/2}}frac{|(etaphi(x)phi(x)^T+S^{-1})^{-1}|^{1/2}}{(eta^{-1}|S|)^{1/2}}exp(-frac{1}{2}(eta t^2+mu^T S^{-1}mu-m^TLambda^{-1}m))\&=frac{1}{(2pi)^{1/2}}frac{1}{(eta^{-1}|S||etaphi(x)phi(x)^T+S^{-1}|)^{1/2}}exp(-frac{1}{2}(eta t^2+mu^T S^{-1}mu-m^TLambda^{-1}m))end{aligned} ]

    到这里不知道怎么推下去了,于是去网上闲逛找解决办法,终于找到了一篇论文《Modeling Inverse Covariance Matrices by Basis Expansion》这篇论文里介绍了一个引理
    引理 (对称矩阵的秩1扰动)(alphainmathbb{R}),(mathbf{a}inmathbb{R}^d),(Pinmathbb{R}^{d imes d}) 为可逆矩阵。如果(alpha eqmathbf{a} -(mathbf{a}^TPmathbf{a})^{-1})那么秩1扰动矩阵(P+alpha mathbf{a} mathbf{a}^T)可逆,且

    [(P+alpha mathbf{a} mathbf{a}^T)^{-1}=P^{-1}-frac{alpha P^{-1}mathbf{a}mathbf{a}^T P^{-1}}{1+alpha mathbf{a}^TP^{-1}mathbf{a}} ]

    [det(P+alpha mathbf{a} mathbf{a}^T)=(1+alpha mathbf{a}^T P^{-1}mathbf{a})det(P) ]

    这条定理说的是如果我们给协方差矩阵一个秩为1的向量外积做扰动,我们可以将扰动后的矩阵的逆和行列式进行展开。具体地,我们考察(|etaphi(x)phi(x)^T+S^{-1}|),发现

    [|etaphi(x)phi(x)^T+S^{-1}|=(1+eta phi(x)^T Sphi(x))det(S^{-1})=(1+eta phi(x)^T Sphi(x))/|S| ]

    于是

    [frac{1}{(eta^{-1}|S||etaphi(x)phi(x)^T+S^{-1}|)^{1/2}}=frac{1}{(eta^{-1}|S|cdot frac{(1+eta phi(x)^T Sphi(x))}{|S|})^{1/2}}=frac{1}{(eta^{-1}+phi(x)^T Sphi(x))^{1/2}} ]

    接下来考察指数部分(exp(-frac{1}{2}(eta t^2+mu^T S^{-1}mu-m^TLambda^{-1}m))),注意到(mu=eta SPhi^Tmathbf{t}_N),于是(mu^TS^{-1}mu=eta^2(Phi^Tmathbf{t}_N)^TS(Phi^Tmathbf{t}_N))。同时,应用上述引理我们有

    [Lambda=(etaphi(x)phi(x)^T+S^{-1})^{-1}=S-frac{eta Sphi(x)phi(x)^TS}{1+eta phi(x)^TSphi(x)}=S-frac{ Sphi(x)phi(x)^TS}{eta^{-1}+phi(x)^TSphi(x)} ]

    利用以上两个关系,我们进一步进行推导

    [egin{aligned}exp(-frac{1}{2}(eta t^2+mu^T S^{-1}mu-m^TLambda^{-1}m))&=exp(-frac{1}{2}Big{eta t^2+eta^2(Phi^Tmathbf{t}_N)^TS(Phi^Tmathbf{t}_N)-(eta tphi(x)^T+eta (Phi^Tmathbf{t}_N)^T)LambdaLambda^{-1}Lambda(eta tphi(x)+eta Phi^Tmathbf{t}_N)Big})\&=exp(-frac{1}{2}Big{eta t^2+eta^2(Phi^Tmathbf{t}_N)^TS(Phi^Tmathbf{t}_N)-ig[eta^2 t^2phi(x)^TLambdaphi(x)+2eta^2 tphi(x)^TLambda (Phi^Tmathbf{t}_N)+eta^2 (Phi^Tmathbf{t}_N)^TLambda(Phi^Tmathbf{t}_N)ig]Big})\&= exp(-frac{1}{2}Big{eta t^2+eta^2(Phi^Tmathbf{t}_N)^TS(Phi^Tmathbf{t}_N)+ig[-eta^2 t^2phi(x)^TSphi(x)+eta^3 t^2 frac{(phi(x)^TSphi(x))^2}{1+etaphi(x)^TSphi(x)}\ &-2eta^2 tphi(x)^TS(Phi^Tmathbf{t}_N)+2eta^3 tfrac{phi(x)^Tphi(x)phi(x)^TS(Phi^Tmathbf{t}_N)}{1+etaphi(x)^TSphi(x)}\&-eta^2(Phi^Tmathbf{t}_N)^TS(Phi^Tmathbf{t}_N)+eta^3 frac{(Phi^Tmathbf{t}_N)^TSphi(x)phi(x)^TS(Phi^Tmathbf{t}_N)}{1+etaphi(x)^TSphi(x)}ig]Big})\&=exp(-frac{1}{2}Big{ig(eta-eta^2phi(x)^TSphi(x)+eta^3 frac{(phi(x)^TSphi(x))^2}{1+etaphi(x)^TSphi(x)}ig)t^2\ &+2ig(eta^3 frac{phi(x)^Tphi(x)phi(x)^TS(Phi^Tmathbf{t}_N)}{1+etaphi(x)^TSphi(x)}-eta^2phi(x)^TS(Phi^Tmathbf{t}_N)ig)t+eta^3 frac{(Phi^Tmathbf{t}_N)^TSphi(x)phi(x)^TS(Phi^Tmathbf{t}_N)}{1+etaphi(x)^TSphi(x)}Big})end{aligned}]

    我们考察每一个系数,首先是(t^2)的系数

    [egin{aligned} eta-eta^2phi(x)^TSphi(x)+eta^3 frac{(phi(x)^TSphi(x))^2}{1+etaphi(x)^TSphi(x)}&=eta+frac{-eta^2phi(x)^TSphi(x)[1+etaphi(x)^TSphi(x)]+eta^3 (phi(x)^TSphi(x))^2}{1+etaphi(x)^TSphi(x)}\&=eta-frac{eta^2phi(x)^TSphi(x)}{1+eta phi(x)^TSphi(x)}\&=frac{eta(1+eta phi(x)^TSphi(x))-eta^2phi(x)^TSphi(x)}{1+etaphi(x)^TSphi(x)}\&=frac{eta}{1+etaphi(x)^TSphi(x)}=frac{1}{eta^{-1}+phi(x)^TSphi(x)}end{aligned} ]

    接着是(t)的系数

    [egin{aligned}eta^3 frac{phi(x)^Tphi(x)phi(x)^TS(Phi^Tmathbf{t}_N)}{1+etaphi(x)^TSphi(x)}-eta^2phi(x)^TS(Phi^Tmathbf{t}_N)&=frac{eta^3phi(x)^Tphi(x)phi(x)^TS(Phi^Tmathbf{t}_N)-eta^2phi(x)^TS(Phi^Tmathbf{t}_N)(1+etaphi(x)^TSphi(x))}{1+etaphi(x)^TSphi(x)}\&=frac{eta^3phi(x)^Tphi(x)phi(x)^TS(Phi^Tmathbf{t}_N)-eta^2phi(x)^TS(Phi^Tmathbf{t}_N)-eta^3phi(x)^TS(Phi^Tmathbf{t}_N)phi(x)^TSphi(x)}{1+etaphi(x)^TSphi(x)}\&=frac{-etaphi(x)^TS(Phi^Tmathbf{t}_N)}{eta^{-1}+phi(x)^TSphi(x)}end{aligned} ]

    最后我们考察常数项

    [eta^3 frac{(Phi^Tmathbf{t}_N)^TSphi(x)phi(x)^TS(Phi^Tmathbf{t}_N)}{1+etaphi(x)^TSphi(x)}=frac{eta^2(phi(x)^TS(Phi^Tmathbf{t}_N))^2}{eta^{-1}+phi(x)^TSphi(x)} ]

    综合以上,我们有

    [egin{aligned}exp(-frac{1}{2}(eta t^2+mu^T S^{-1}mu-m^TLambda^{-1}m))&=exp(-frac{1}{2(eta^{-1}+phi(x)^TSphi(x))}Big{t^2-2etaphi(x)^TS(Phi^Tmathbf{t}_N)t+eta^2(phi(x)^TS(Phi^Tmathbf{t}_N))^2Big})\&=exp(-frac{1}{2(eta^{-1}+phi(x)^TSphi(x))}(t-etaphi(x)^TS(Phi^Tmathbf{t}_N))^2)end{aligned} ]

    综合以上,我们可以得到

    [p(t|x,mathbf{x},mathbf{t})=frac{1}{sqrt{2picdot(eta^{-1}+phi(x)^TSphi(x))}}exp(-frac{1}{2(eta^{-1}+phi(x)^TSphi(x))}(t-etaphi(x)^TS(Phi^Tmathbf{t}_N))^2) ]

    [ m(x)=etaphi(x)^TS(Phi^Tmathbf{t}_N)=mu^Tphi(x)\ s^2(x)=eta^{-1}+phi(x)^TSphi(x)]

    以上两式对应PRML中的式1.70~1.71。式1.71中,第一项表示数据中的噪音(方差越小,数据越集中,不确定性越小);第二项表示关于参数(mathbf{w})的不确定性,当(N o infty)时,第二项趋于0,这是由于当数据量趋于无限大时,关于参数的不确定性逐渐消失,先验的影响逐渐减弱。理论上的证明如下,首先我们考察(S_{N+1}):

    [S_{N+1}=(alpha I+eta sum_{i=1}^N phi(x_i)phi(x_i)^T+eta phi(x_{N+1})phi(x_{N+1})^T)=(S_N^{-1}+eta phi(x_{N+1})phi(x_{N+1})^T)\=S_N-etafrac{S_Nphi(x_{N+1})phi(x_{N+1})^TS_N}{1+eta phi(x_{N+1})^TS_Nphi(x_{N+1})}\=S_N-frac{eta}{1+eta phi(x_{N+1})^TS_Nphi(x_{N+1})} (S_Nphi(x_{N+1}))(S_Nphi(x_{N+1}))^T ]

    于是

    [sigma_{N+1}^2(x)=eta^{-1}+phi(x)^TS_{N+1}phi(x)=sigma_N^2(x)-frac{eta}{1+eta phi(x_{N+1})^TS_Nphi(x_{N+1})}[ phi(x)^T(S_Nphi(x_{N+1}))]^2leq sigma_N^2(x) ]

    因此序列(sigma_N^2(x))是单调递减序列,又由于有下界(0),因此当(N oinfty)时,(sigma_N^2(x) o 0)

    于是我们知道

    [p(t|x,mathbf{x},mathbf{t})=mathcal{N}(t|m(x),s^2(x)) ]

    也就是说后验预测分布也是一个高斯,(t)的且均值、方差取决于(x)
    需要注意的是当(x)满足(eta=-(phi(x)^TSphi(x))^{-1})时,方差

    [s^2(x)=eta^{-1}+phi(x)^TSphi(x)=-phi(x)^TSphi(x)+phi(x)^TSphi(x)=0 ]

    因此在这一点分布未定义

  • 相关阅读:
    Thinkphp+Nginx(PHPstudy)下报的404错误,403错误解决
    浅谈 PHP 与手机 APP 开发(API 接口开发)
    B/S架构与C/S架构的区别
    动态查询:getBy字段名
    Cannot declare class apphomecontrollerCases because the name is already in use
    TP5与TP3.X对比
    SuperSpider——打造功能强大的爬虫利器
    阵列卡,组成的磁盘组就像是一个硬盘,pci-e扩展出sata3.0
    查看Linux系统下Raid信息
    网格计算, 云计算, 集群计算, 分布式计算, 超级计算
  • 原文地址:https://www.cnblogs.com/wacc/p/5495448.html
Copyright © 2011-2022 走看看