EM
Structure
loop until converged (precision condition):
- zero_initialize_ss( ss, model ) : set every entry of class_word and class_total to 0
- e_step:
# modeling is done per document
for d in 0 .. corpus->num_docs:
    # each EM iteration zeroes class_word and re-accumulates it; in the end this serves to update phi and gamma
    doc_e_step( corpus->docs[d], var_gamma[d], phi, model, ss );
- m_step:
-- lda_mle() : model->log_prob_w[k][w] = log( ss->class_word[k][w] / ss->class_total[k] )
-- check the precision condition for convergence of the likelihood
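The m-step formula above maps directly onto lda-c's lda_mle. A minimal Python sketch (the -100 floor for words never assigned to a topic follows lda-c's convention; the function and variable names here are otherwise illustrative):

```python
import math

def lda_mle(class_word, class_total):
    """M-step: log_prob_w[k][w] = log(class_word[k][w] / class_total[k]).

    class_word[k][w]: expected count of word w under topic k (from the e-step)
    class_total[k]:   sum over w of class_word[k][w]
    """
    log_prob_w = []
    for k, row in enumerate(class_word):
        # Words with zero expected count under topic k would give log(0);
        # lda-c assigns the constant -100 as a very small log probability.
        log_prob_w.append([
            math.log(cw) - math.log(class_total[k]) if cw > 0 else -100.0
            for cw in row
        ])
    return log_prob_w
```

Working in log space (log(a) - log(b) rather than log(a/b)) keeps the update numerically stable when counts are tiny.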
doc_e_step ( run for every document )
- likelihood = lda_inference()
- update alpha_ss (accumulated across documents):
  ss->alpha_ss += sum[i:1~NTOPICS] digamma(gamma[i]) - NTOPICS * digamma( sum[i:1~NTOPICS] gamma[i] )
- set values of class_word and class_total:
ss->class_word[k][doc->words[n]] += doc->counts[n] * phi[n][k]
ss->class_total[k] += doc->counts[n] * phi[n][k]   # phi[n][k] : word n ~ topic k
Here phi[n][k] is per document: n indexes the word's position within the doc, not a vocabulary id, so the same n can refer to different words across documents; class_total[k] is the topic-k weight accumulated over every word, i.e. the row sum of class_word[k][*].
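The per-document accumulation above can be sketched as follows. The digamma helper uses the same recurrence-plus-asymptotic-expansion approach as the one in lda-c's utils.c, and the dict-based ss container is a hypothetical stand-in for the C suffstats struct:

```python
import math

def digamma(x):
    # Push x above 6 via psi(x) = psi(x+1) - 1/x, then apply the
    # asymptotic expansion (same scheme as lda-c's utils.c helper).
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1.0/12.0 - f * (1.0/120.0 - f / 252.0))

def accumulate_suffstats(doc_words, doc_counts, phi, gamma, ss):
    """Per-document part of doc_e_step: fold one doc into the sufficient stats.

    doc_words[n]:  vocabulary id of the n-th distinct word in the doc
    doc_counts[n]: its count in the doc
    phi[n][k]:     variational word-topic distribution for this doc
    gamma[k]:      variational Dirichlet parameters for this doc
    """
    K = len(gamma)
    # alpha sufficient statistic: sum_k digamma(gamma[k]) - K * digamma(sum_k gamma[k])
    gamma_sum = sum(gamma)
    ss['alpha_ss'] += sum(digamma(g) for g in gamma) - K * digamma(gamma_sum)
    # topic-word sufficient statistics
    for n, (w, c) in enumerate(zip(doc_words, doc_counts)):
        for k in range(K):
            ss['class_word'][k][w] += c * phi[n][k]
            ss['class_total'][k] += c * phi[n][k]
    return ss
```

Note that class_total[k] grows by exactly the same increments as row k of class_word, which is why the m-step ratio class_word[k][w] / class_total[k] is a properly normalized probability.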