![](https://images0.cnblogs.com/blog2015/610239/201504/171613222606904.png)
![](https://images0.cnblogs.com/blog2015/610239/201504/171620465101917.png)
$$\begin{aligned}
L&=E_q[\log p(\theta,z,w|\alpha,\beta)]-E_q[\log q(\theta,z|\gamma,\phi)]\\
&=E_q[\log p(\theta|\alpha)p(z|\theta)p(w|z,\beta)]-E_q[\log q(\theta|\gamma)q(z|\phi)]\\
&=E_q[\log p(\theta|\alpha)]_{[1]}+E_q[\log p(z|\theta)]_{[2]}+E_q[\log p(w|z,\beta)]_{[3]}-E_q[\log q(\theta|\gamma)]_{[4]}-E_q[\log q(z|\phi)]_{[5]}
\end{aligned}$$
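$$\begin{aligned}
[1]&=\int q(\theta|\gamma)\log\Big[\frac{\Gamma(\sum_i\alpha_i)}{\prod_i\Gamma(\alpha_i)}\prod_i\theta_i^{\alpha_i-1}\Big]d\theta\\
&=\int q(\theta|\gamma)\Big(\log\Gamma(\sum_i\alpha_i)-\sum_i\log\Gamma(\alpha_i)+\sum_i(\alpha_i-1)\log\theta_i\Big)d\theta\\
&=\log\Gamma(\sum_i\alpha_i)-\sum_i\log\Gamma(\alpha_i)+\sum_i(\alpha_i-1)\big(\Psi(\gamma_i)-\Psi(\sum_j\gamma_j)\big)
\end{aligned}$$
The last step uses $E_q[\log\theta_i]=\Psi(\gamma_i)-\Psi(\sum_j\gamma_j)$ for $\theta\sim Dir(\gamma)$, where $\Psi$ is the digamma function.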
$$\begin{aligned}
[2]&=\int q(\theta|\gamma)\sum_{n=1}^N\sum_i\phi_{ni}\log\theta_i\,d\theta\\
&=\sum_{n=1}^N\sum_i\phi_{ni}\big(\Psi(\gamma_i)-\Psi(\sum_j\gamma_j)\big)
\end{aligned}$$
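$$[3]=\sum_{n=1}^N\sum_i\sum_{j=1}^V\phi_{ni}w_n^j\log\beta_{ij}$$
where $w_n^j$ is the $j$-th component of $w_n$. Each word is a one-hot vector over a vocabulary of size $V$, so $w_n^j=1$ exactly when $w_n$ is vocabulary word $j$.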
$$\begin{aligned}
[4]&=\int q(\theta|\gamma)\log\Big[\frac{\Gamma(\sum_i\gamma_i)}{\prod_i\Gamma(\gamma_i)}\prod_i\theta_i^{\gamma_i-1}\Big]d\theta\\
&=\log\Gamma(\sum_i\gamma_i)-\sum_i\log\Gamma(\gamma_i)+\sum_i(\gamma_i-1)\big(\Psi(\gamma_i)-\Psi(\sum_j\gamma_j)\big)
\end{aligned}$$
$$[5]=\sum_{n=1}^N\sum_i\phi_{ni}\log\phi_{ni}$$
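The five terms combine into a per-document bound $L=[1]+[2]+[3]-[4]-[5]$. Below is a minimal pure-Python sketch that computes it. It assumes a document is stored as a list of vocabulary indices, so the one-hot $w_n^j$ is implicit. It also assumes `beta` is a $k\times V$ nested list. `elbo` and the hand-rolled `digamma` (recurrence plus asymptotic series) are hypothetical helpers; a library routine such as `scipy.special.digamma` would normally be used instead.

```python
import math


def digamma(x):
    """Psi(x) for x > 0: shift with Psi(x) = Psi(x + 1) - 1/x, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 * (1.0 / 252 - inv2 * (1.0 / 240 - inv2 / 132))))
    return result + math.log(x) - 0.5 / x - series


def elbo(words, alpha, beta, gamma, phi):
    """Per-document bound L = [1] + [2] + [3] - [4] - [5].

    words: vocabulary indices of w_1..w_N; alpha, gamma: length-k lists;
    beta: k x V topic-word probabilities; phi: N x k variational parameters.
    """
    k = len(alpha)
    psi_sum = digamma(sum(gamma))
    e_log_theta = [digamma(g) - psi_sum for g in gamma]  # E_q[log theta_i]
    term1 = (math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
             + sum((alpha[i] - 1.0) * e_log_theta[i] for i in range(k)))
    term2 = sum(row[i] * e_log_theta[i] for row in phi for i in range(k))
    # w_n is one-hot, so only j = words[n] survives the inner sum over j in [3]
    term3 = sum(row[i] * math.log(beta[i][v]) for row, v in zip(phi, words) for i in range(k))
    term4 = (math.lgamma(sum(gamma)) - sum(math.lgamma(g) for g in gamma)
             + sum((gamma[i] - 1.0) * e_log_theta[i] for i in range(k)))
    term5 = sum(p * math.log(p) for row in phi for p in row if p > 0.0)  # 0 log 0 taken as 0
    return term1 + term2 + term3 - term4 - term5


alpha = [1.0, 2.0]
beta = [[0.7, 0.3], [0.2, 0.8]]
words = [0]
bound = elbo(words, alpha, beta, gamma=[1.5, 2.0], phi=[[0.6, 0.4]])
# For a one-word document, p(w | alpha, beta) = sum_i E[theta_i] * beta_{i,w} exactly
log_evidence = math.log(sum(a / sum(alpha) * beta[i][words[0]] for i, a in enumerate(alpha)))
print(bound, log_evidence)
```

For a one-word document the exact evidence has a closed form, which gives a quick check that $L$ really is a lower bound.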
E-step (fix $\alpha$ and $\beta$; for each document, optimize $\gamma$ and $\phi$ to maximize $L$)
1. Optimize $\phi$
The terms of $L$ that involve $\phi$:
$$L_{[\phi]}=\sum_{n=1}^N\sum_i\phi_{ni}\big(\Psi(\gamma_i)-\Psi(\sum_j\gamma_j)\big)+\sum_{n=1}^N\sum_i\sum_{j=1}^V\phi_{ni}w_n^j\log\beta_{ij}-\sum_{n=1}^N\sum_i\phi_{ni}\log\phi_{ni}$$
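The constraint is $\sum_i\phi_{ni}=1$ for each word $n$. Using the method of Lagrange multipliers, add $\lambda(\sum_i\phi_{ni}-1)$ and differentiate with respect to $\phi_{ni}$: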
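$$\frac{\partial L}{\partial\phi_{ni}}=\Psi(\gamma_i)-\Psi(\sum_j\gamma_j)+\log\beta_{iv}-\log\phi_{ni}-1+\lambda$$
where $\beta_{iv}$ is the probability of word $w_n$ under topic $i$, with $v$ the vocabulary index of $w_n$ (i.e. $w_n^v=1$).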
Setting this to zero gives:
$$\phi_{ni}=\beta_{iv}\exp\big(\Psi(\gamma_i)-\Psi(\sum_j\gamma_j)\big)\exp(\lambda-1)$$
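Note: in code, each row $\phi_{n\cdot}$ is normalized to sum to 1 after the update. The common factor $\exp(-\Psi(\sum_j\gamma_j))\exp(\lambda-1)$ therefore never needs to be computed; it is enough to set $\phi_{ni}\propto\beta_{iv}\exp(\Psi(\gamma_i))$ and then normalize.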
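The $\phi$ update for one document can be sketched as follows. This is a minimal pure-Python illustration. It assumes the document is a list of vocabulary indices and `beta` is a $k\times V$ nested list. `update_phi` and the hand-rolled `digamma` are hypothetical helpers, not part of the original post.

```python
import math


def digamma(x):
    """Psi(x) for x > 0: shift with Psi(x) = Psi(x + 1) - 1/x, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 * (1.0 / 252 - inv2 * (1.0 / 240 - inv2 / 132))))
    return result + math.log(x) - 0.5 / x - series


def update_phi(words, beta, gamma):
    """E-step update: phi_ni proportional to beta_{i, w_n} * exp(Psi(gamma_i)), rows normalised.

    exp(-Psi(sum_j gamma_j)) * exp(lambda - 1) is the same for every topic i,
    so it cancels in the normalisation and is never computed.
    """
    k = len(gamma)
    weights = [math.exp(digamma(g)) for g in gamma]
    phi = []
    for v in words:
        row = [beta[i][v] * weights[i] for i in range(k)]
        total = sum(row)
        phi.append([x / total for x in row])
    return phi


beta = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
gamma = [3.0, 1.5]
phi = update_phi([0, 2, 2], beta, gamma)
print(phi)
```

For very small $\gamma_i$ or $\beta_{iv}$ the products can underflow. A common remedy is to work with logarithms and subtract the row maximum before exponentiating.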
2. Optimize $\gamma$
The terms of $L$ that involve $\gamma$:
$$\begin{aligned}
L_{[\gamma]}&=\sum_i(\alpha_i-1)\big(\Psi(\gamma_i)-\Psi(\sum_j\gamma_j)\big)+\sum_{n=1}^N\sum_i\phi_{ni}\big(\Psi(\gamma_i)-\Psi(\sum_j\gamma_j)\big)\\
&\quad-\log\Gamma(\sum_i\gamma_i)+\sum_i\log\Gamma(\gamma_i)-\sum_i(\gamma_i-1)\big(\Psi(\gamma_i)-\Psi(\sum_j\gamma_j)\big)\\
&=\sum_i\big(\Psi(\gamma_i)-\Psi(\sum_j\gamma_j)\big)\big(\alpha_i+\sum_n\phi_{ni}-\gamma_i\big)-\log\Gamma(\sum_i\gamma_i)+\sum_i\log\Gamma(\gamma_i)
\end{aligned}$$
Differentiating with respect to $\gamma_i$:
$$\frac{\partial L}{\partial\gamma_i}=\Psi'(\gamma_i)\big(\alpha_i+\sum_n\phi_{ni}-\gamma_i\big)-\Psi'(\sum_j\gamma_j)\sum_j\big(\alpha_j+\sum_n\phi_{nj}-\gamma_j\big)$$
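Setting this to zero: the equation holds when $\alpha_i+\sum_n\phi_{ni}-\gamma_i=0$ for every $i$, since both terms then vanish. This gives: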
$$\gamma_i=\alpha_i+\sum_{n=1}^N\phi_{ni}$$
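The derivative of $L_{[\gamma]}$ can be checked numerically against central finite differences, and the update can be checked to make it vanish. This is a sketch with arbitrary illustrative numbers. `n_counts[i]` stands for $\sum_n\phi_{ni}$. The `digamma`/`trigamma` helpers are hypothetical hand-rolled approximations: shift by recurrence, then apply the standard asymptotic series.

```python
import math


def digamma(x):
    """Psi(x) for x > 0: shift with Psi(x) = Psi(x + 1) - 1/x, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 * (1.0 / 252 - inv2 * (1.0 / 240 - inv2 / 132))))
    return result + math.log(x) - 0.5 / x - series


def trigamma(x):
    """Psi'(x) for x > 0: shift with Psi'(x) = Psi'(x + 1) + 1/x^2, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 * (1.0 / 42 - inv2 * (1.0 / 30 - inv2 * 5.0 / 66))))
    return result + inv + 0.5 * inv2 + series


def l_gamma(gamma, alpha, n_counts):
    """Simplified L_[gamma]; n_counts[i] = sum_n phi_ni."""
    s = sum(gamma)
    psi_s = digamma(s)
    value = sum((digamma(g) - psi_s) * (a + n - g) for g, a, n in zip(gamma, alpha, n_counts))
    return value - math.lgamma(s) + sum(math.lgamma(g) for g in gamma)


def grad_l_gamma(gamma, alpha, n_counts):
    """dL/dgamma_i = Psi'(gamma_i) r_i - Psi'(sum_j gamma_j) sum_j r_j, with r_i = alpha_i + n_i - gamma_i."""
    residual = [a + n - g for g, a, n in zip(gamma, alpha, n_counts)]
    shared = trigamma(sum(gamma)) * sum(residual)
    return [trigamma(g) * r - shared for g, r in zip(gamma, residual)]


def central_difference(f, x, h=1e-5):
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


alpha = [0.5, 1.0, 2.0]
n_counts = [3.2, 0.8, 6.0]
gamma = [1.7, 2.4, 5.1]
analytic = grad_l_gamma(gamma, alpha, n_counts)
numeric = central_difference(lambda g: l_gamma(g, alpha, n_counts), gamma)
# gamma_i = alpha_i + sum_n phi_ni zeroes every residual, hence the whole gradient
fixed_point = [a + n for a, n in zip(alpha, n_counts)]
print(analytic, numeric, grad_l_gamma(fixed_point, alpha, n_counts))
```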
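In code, $\gamma$ is initialized to $\gamma_i=\alpha_i+N/k$, which is what the update gives when every word has $\phi_{ni}=1/k$. $Dir(\gamma)$ plays the role of the posterior over $\theta$. If the topic of every word were known, $\gamma$ would simply be $\alpha$ plus the per-topic word counts; the initialization spreads the document's $N$ words evenly over the $k$ topics.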
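Putting the two updates together, the per-document E-step alternates them until $\gamma$ stops changing. This is a sketch under the same assumptions as before (documents as lists of vocabulary indices, `beta` as a $k\times V$ nested list). `e_step`, the hand-rolled `digamma`, and the stopping rule on the change in $\gamma$ with tolerance `tol` are all hypothetical choices for illustration. Monitoring the bound $L$ is an equally valid stopping criterion.

```python
import math


def digamma(x):
    """Psi(x) for x > 0: shift with Psi(x) = Psi(x + 1) - 1/x, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 * (1.0 / 252 - inv2 * (1.0 / 240 - inv2 / 132))))
    return result + math.log(x) - 0.5 / x - series


def e_step(words, alpha, beta, max_iter=100, tol=1e-6):
    """Coordinate ascent on (gamma, phi) for one document, with alpha and beta held fixed."""
    k, n_words = len(alpha), len(words)
    phi = [[1.0 / k] * k for _ in words]          # phi_ni = 1/k
    gamma = [a + n_words / k for a in alpha]       # gamma_i = alpha_i + N/k
    for _ in range(max_iter):
        # phi_ni proportional to beta_{i, w_n} * exp(Psi(gamma_i)), normalised per word
        weights = [math.exp(digamma(g)) for g in gamma]
        for n, v in enumerate(words):
            row = [beta[i][v] * weights[i] for i in range(k)]
            total = sum(row)
            phi[n] = [x / total for x in row]
        # gamma_i = alpha_i + sum_n phi_ni
        new_gamma = [alpha[i] + sum(row[i] for row in phi) for i in range(k)]
        change = max(abs(a - b) for a, b in zip(new_gamma, gamma))
        gamma = new_gamma
        if change < tol:
            break
    return gamma, phi


alpha = [0.5, 0.5]
beta = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
gamma, phi = e_step([0, 0, 0, 1], alpha, beta)
print(gamma)
```

Because every row of $\phi$ sums to 1, $\sum_i\gamma_i=\sum_i\alpha_i+N$ holds after every iteration. That makes a cheap sanity check.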
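M-step (fix $\gamma$ and $\phi$ for every document; optimize $\alpha$ and $\beta$ to maximize $\sum_{d=1}^M L_d$, the sum of the per-document bounds, which raises the lower bound on the log-likelihood of the corpus)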
1. Optimize $\beta$
The terms of $L$ that involve $\beta$, plus the Lagrange terms for the constraints $\sum_{j=1}^V\beta_{ij}-1=0$:
$$\sum_{d=1}^M L_{[\beta]}=\sum_{d=1}^M\sum_{n=1}^{N_d}\sum_{i=1}^k\sum_{j=1}^V\phi_{dni}w_{dn}^j\log\beta_{ij}+\sum_{i=1}^k\lambda_i\Big(\sum_{j=1}^V\beta_{ij}-1\Big)$$
Differentiating with respect to $\beta_{ij}$:
$$\frac{\partial L}{\partial\beta_{ij}}=\sum_{d=1}^M\sum_{n=1}^{N_d}\frac{\phi_{dni}w_{dn}^j}{\beta_{ij}}+\lambda_i$$
Setting this to zero gives:
$$\beta_{ij}=-\frac{1}{\lambda_i}\sum_{d=1}^M\sum_{n=1}^{N_d}\phi_{dni}w_{dn}^j$$
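Note: in code, each row $\beta_{i\cdot}$ is normalized to sum to 1 after the update, so the common factor $-\frac{1}{\lambda_i}$ never needs to be computed. In other words, $\beta_{ij}\propto\sum_d\sum_n\phi_{dni}w_{dn}^j$, the expected number of times word $j$ is assigned to topic $i$.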
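The $\beta$ update amounts to accumulating expected topic-word counts over the corpus and normalizing each topic's row. This is a sketch that assumes each document is a list of vocabulary indices and `phis[d]` holds that document's $N_d\times k$ matrix of $\phi$ values. `m_step_beta` is a hypothetical name.

```python
def m_step_beta(corpus, phis, k, vocab_size):
    """beta_ij proportional to sum_d sum_n phi_dni w_dn^j, each topic row normalised to sum to 1."""
    counts = [[0.0] * vocab_size for _ in range(k)]
    for words, phi in zip(corpus, phis):
        for v, row in zip(words, phi):
            for i in range(k):
                counts[i][v] += row[i]   # w_dn^j is 1 only for j = v, the index of word n
    # Row normalisation plays the role of the -1/lambda_i factor
    beta = []
    for row in counts:
        total = sum(row)
        beta.append([c / total for c in row])
    return beta


corpus = [[0, 1], [1, 2]]
phis = [[[1.0, 0.0], [0.5, 0.5]], [[0.0, 1.0], [0.25, 0.75]]]
beta = m_step_beta(corpus, phis, k=2, vocab_size=3)
print(beta)
```

A topic that receives no mass from any word would make its row total zero, and a zero entry in $\beta$ makes $\log\beta_{ij}$ undefined in later E-steps. Smoothed variants of LDA avoid both by placing a prior on $\beta$; this sketch does not.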
2. Optimize $\alpha$
The terms of $L$ that involve $\alpha$:
$$\sum_{d=1}^M L_{[\alpha]}=\sum_{d=1}^M\Big(\log\Gamma(\sum_{j=1}^k\alpha_j)-\sum_{i=1}^k\log\Gamma(\alpha_i)+\sum_{i=1}^k(\alpha_i-1)\big(\Psi(\gamma_{di})-\Psi(\sum_{j=1}^k\gamma_{dj})\big)\Big)$$
Differentiating with respect to $\alpha_i$:
$$\frac{\partial L}{\partial\alpha_i}=M\Big(\Psi(\sum_{j=1}^k\alpha_j)-\Psi(\alpha_i)\Big)+\sum_{d=1}^M\Big(\Psi(\gamma_{di})-\Psi(\sum_{j=1}^k\gamma_{dj})\Big)$$
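Because the derivative for $\alpha_i$ also depends on every other $\alpha_j$ (through $\Psi(\sum_j\alpha_j)$), there is no closed-form solution. Newton's method is used instead, with the iteration
$$\alpha_{t+1}=\alpha_t-H(\alpha_t)^{-1}g(\alpha_t)$$
where $H$ is the Hessian matrix and $g$ is the gradient vector with components $\frac{\partial L}{\partial\alpha_i}$. The Newton step carries the same minus sign whether we maximize or minimize. What makes it climb here is that $L$ is concave in $\alpha$: $H$ is negative definite, so $-H^{-1}g$ is an ascent direction.
$$H_{ij}=\frac{\partial^2L}{\partial\alpha_i\partial\alpha_j}=M\Psi'(\sum_{l=1}^k\alpha_l)-\delta(i,j)M\Psi'(\alpha_i),\qquad\delta(i,j)=\begin{cases}1&i=j\\0&i\neq j\end{cases}$$
The Hessian therefore has the form
$$H=\mathrm{diag}(h)+\mathbf{1}z\mathbf{1}^T$$
where $h_i=-M\Psi'(\alpha_i)$ and $z=M\Psi'(\sum_{j=1}^k\alpha_j)$.
Because $H$ is a diagonal matrix plus a rank-one term (Sherman-Morrison), $H^{-1}g$ can be computed in $O(k)$ without forming or inverting $H$:
$$(H^{-1}g)_i=\frac{g_i-c}{h_i},\qquad c=\frac{\sum_{j=1}^k g_j/h_j}{\frac{1}{z}+\sum_{j=1}^k\frac{1}{h_j}}$$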
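A sketch of the $\alpha$ update built on the $O(k)$ formula for $H^{-1}g$. It works with the Hessian of $L$ itself ($h_i=-M\Psi'(\alpha_i)$, $z=M\Psi'(\sum_j\alpha_j)$) and the step $\alpha-H^{-1}g$. Negating both $h$ and $z$ while flipping the step to $+$ gives the identical update. `update_alpha` and the `digamma`/`trigamma` helpers are hypothetical. The backtracking line search is a safeguard added in this sketch, not part of the derivation, because plain Newton steps can push some $\alpha_i$ to non-positive values.

```python
import math


def digamma(x):
    """Psi(x) for x > 0: shift with Psi(x) = Psi(x + 1) - 1/x, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 * (1.0 / 252 - inv2 * (1.0 / 240 - inv2 / 132))))
    return result + math.log(x) - 0.5 / x - series


def trigamma(x):
    """Psi'(x) for x > 0: shift with Psi'(x) = Psi'(x + 1) + 1/x^2, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 * (1.0 / 42 - inv2 * (1.0 / 30 - inv2 * 5.0 / 66))))
    return result + inv + 0.5 * inv2 + series


def update_alpha(gammas, alpha, max_iter=100, tol=1e-9):
    """Maximise sum_d L_[alpha] by Newton's method with a diagonal-plus-rank-one Hessian."""
    m, k = len(gammas), len(alpha)
    # ss_i = sum_d (Psi(gamma_di) - Psi(sum_j gamma_dj)) is constant during the M-step
    ss = [0.0] * k
    for gamma in gammas:
        psi_total = digamma(sum(gamma))
        for i in range(k):
            ss[i] += digamma(gamma[i]) - psi_total

    def objective(a):
        return (m * (math.lgamma(sum(a)) - sum(math.lgamma(x) for x in a))
                + sum((x - 1.0) * s for x, s in zip(a, ss)))

    alpha = list(alpha)
    for _ in range(max_iter):
        total = sum(alpha)
        g = [m * (digamma(total) - digamma(a)) + s for a, s in zip(alpha, ss)]
        h = [-m * trigamma(a) for a in alpha]    # diagonal part of H
        z = m * trigamma(total)                  # rank-one part of H
        c = sum(gi / hi for gi, hi in zip(g, h)) / (1.0 / z + sum(1.0 / hi for hi in h))
        step = [(gi - c) / hi for gi, hi in zip(g, h)]   # (H^{-1} g)_i
        if max(abs(s) for s in step) < tol:
            return [a - s for a, s in zip(alpha, step)]
        # Safeguard: halve the step until alpha stays positive and the objective rises enough
        # (Armijo condition along d = -H^{-1} g, with a small slack for round-off)
        slope = -sum(gi * si for gi, si in zip(g, step))
        current, scale = objective(alpha), 1.0
        while True:
            candidate = [a - scale * s for a, s in zip(alpha, step)]
            if min(candidate) > 0.0 and objective(candidate) >= current + 1e-4 * scale * slope - 1e-12:
                break
            scale *= 0.5
        alpha = candidate
    return alpha


gammas = [[2.0, 5.0, 1.0]] * 4
alpha_hat = update_alpha(gammas, [1.0, 1.0, 1.0])
print(alpha_hat)
```

If every document had the same $\gamma$, the maximizer would be $\alpha=\gamma$, because each component of the gradient then vanishes. The usage line relies on this as a convenient check.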