
LDA: Deriving Variational Inference

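LDA (latent Dirichlet allocation) is one of the most basic Bayesian models. This post works through the derivation of inference for LDA based on the variational method, which is well worth understanding in detail.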

1. Notation

$K$: the number of topics
$D$: the number of documents
$V$: the number of terms in the vocabulary
$i$: topic index
$d$: document index
$n$: word index
$w$: a word

In LDA:
$\alpha$: model parameter
$\beta$: model parameter
$\theta$, $z$: hidden variables.

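Graphical model: $\alpha \to \theta_d \to z_{dn} \to w_{dn} \leftarrow \beta$, with one plate over the words $n$ of each document and one plate over the documents $d$.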
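A graphical model of this kind can be read as a sampling recipe. The following is a minimal sketch of the standard LDA generative process for one document, not code from the original post. The names `sample_dirichlet` and `generate_document` and the toy values of `alpha` and `beta` are assumptions made for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]

def generate_document(alpha, beta, num_words, rng):
    """Sample theta ~ Dir(alpha), then z_n ~ Mult(theta), w_n ~ Mult(beta[z_n])."""
    K = len(alpha)        # number of topics
    V = len(beta[0])      # vocabulary size
    theta = sample_dirichlet(alpha, rng)
    topics, words = [], []
    for _ in range(num_words):
        z = rng.choices(range(K), weights=theta)[0]     # topic assignment
        w = rng.choices(range(V), weights=beta[z])[0]   # word drawn from that topic
        topics.append(z)
        words.append(w)
    return theta, topics, words

rng = random.Random(0)
alpha = [0.5, 0.5]                  # toy Dirichlet parameter, K = 2 (assumed)
beta = [[0.7, 0.2, 0.1, 0.0],       # toy topic 0 over V = 4 terms (assumed)
        [0.0, 0.1, 0.2, 0.7]]       # toy topic 1
theta, topics, words = generate_document(alpha, beta, num_words=8, rng=rng)
```

Only `words` is observed in practice; `theta` and `topics` are the hidden variables that inference has to recover.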
We introduce the variational parameters:
$\gamma$: Dirichlet parameter
$\phi$: multinomial parameter

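We introduce the variational distribution $q$, a fully factorized model:

$$q(\theta, z \mid \gamma, \phi) = q(\theta \mid \gamma) \prod_{n=1}^{N} q(z_n \mid \phi_n)$$

Note that $q$ approximates the posterior distribution $p(\theta, z \mid w, \alpha, \beta)$; the conditioning on $w$ is left implicit in the notation.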

2. Overview

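We use a variational EM algorithm:
In the E-step, we use a variational approximation to the posterior and optimize the variational parameters, which gives the best approximate posterior within the variational family.
In the M-step, we maximize the lower bound with respect to the model parameters.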

The algorithm:
E-step: for each document $d$, find the optimal values of the variational parameters $\gamma_d$ and $\phi_d$.

M-step: maximize the lower bound with respect to the model parameters $\alpha$ and $\beta$.
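The two steps can be sketched in pure Python. The text here does not spell out the update equations, so this sketch assumes the standard coordinate-ascent updates from the LDA literature: $\phi_{ni} \propto \beta_{i w_n} \exp(\psi(\gamma_i))$ and $\gamma_i = \alpha_i + \sum_n \phi_{ni}$ in the E-step, and $\beta_{ij} \propto \sum_d \sum_n \phi_{dni} [w_{dn} = j]$ in the M-step. As a simplification, $\alpha$ is held fixed rather than re-estimated. The function names, the toy corpus, and the starting values are all hypothetical.

```python
import math

def digamma(x):
    """Digamma psi(x) for x > 0: recurrence up to x >= 6, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))

def e_step(words, alpha, beta, max_iter=100, tol=1e-6):
    """Coordinate ascent on (gamma, phi) for one document given alpha and beta."""
    K, N = len(alpha), len(words)
    phi = [[1.0 / K] * K for _ in range(N)]
    gamma = [a + N / K for a in alpha]
    for _ in range(max_iter):
        # psi(sum(gamma)) is constant across topics, so it cancels when phi is normalized.
        weight = [math.exp(digamma(g)) for g in gamma]
        for n, w in enumerate(words):
            unnorm = [beta[i][w] * weight[i] for i in range(K)]
            total = sum(unnorm)
            phi[n] = [u / total for u in unnorm]
        new_gamma = [alpha[i] + sum(phi[n][i] for n in range(N)) for i in range(K)]
        change = max(abs(a - b) for a, b in zip(new_gamma, gamma))
        gamma = new_gamma
        if change < tol:
            break
    return gamma, phi

def m_step_beta(docs, phis, K, V):
    """Re-estimate beta from expected topic-word counts (alpha is left fixed)."""
    counts = [[0.0] * V for _ in range(K)]
    for words, phi in zip(docs, phis):
        for n, w in enumerate(words):
            for i in range(K):
                counts[i][w] += phi[n][i]
    return [[c / sum(row) for c in row] for row in counts]

# Toy corpus of word ids over a vocabulary of size 4 (assumed values).
docs = [[0, 0, 1, 0], [2, 3, 3, 2], [0, 1, 0, 1]]
alpha = [0.1, 0.1]
beta = [[0.4, 0.3, 0.2, 0.1], [0.1, 0.2, 0.3, 0.4]]
for _ in range(20):
    results = [e_step(doc, alpha, beta) for doc in docs]   # E-step per document
    gammas = [g for g, _ in results]
    phis = [p for _, p in results]
    beta = m_step_beta(docs, phis, K=2, V=4)               # M-step
```

A full implementation would also update $\alpha$ in the M-step and track the lower bound to decide when to stop; both are omitted to keep the sketch short.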

3. Lower bound

3.1 Jensen's inequality

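Let $X$ be a random variable. For a convex function $f$, we have $f(\mathbb{E}[X]) \le \mathbb{E}[f(X)]$;
for a concave function $f$, we have $f(\mathbb{E}[X]) \ge \mathbb{E}[f(X)]$.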
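Both directions of the inequality are easy to check numerically. The sketch below treats a set of random draws as an empirical distribution and compares the two sides for the concave function log and the convex function square. The sample range and size are arbitrary choices.

```python
import math
import random

rng = random.Random(0)
samples = [rng.uniform(0.5, 4.0) for _ in range(10000)]
mean = sum(samples) / len(samples)

# Concave case: f = log, so f(E[X]) >= E[f(X)].
log_of_mean = math.log(mean)
mean_of_log = sum(math.log(x) for x in samples) / len(samples)

# Convex case: f = square, so f(E[X]) <= E[f(X)].
square_of_mean = mean ** 2
mean_of_square = sum(x * x for x in samples) / len(samples)

print(log_of_mean >= mean_of_log, square_of_mean <= mean_of_square)
```

The concave case with the logarithm is the one used to build the lower bound on the log likelihood.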

3.2 Deriving the lower bound

for each document each word
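The text is cut off at this point. As a sketch of the standard argument rather than the author's own steps, the bound for a single document follows from applying Jensen's inequality to the concave logarithm. Here $\theta$ and $z$ are the hidden variables, $w$ the observed words, $\alpha$ and $\beta$ the model parameters, and $q(\theta, z)$ the variational distribution with parameters $\gamma$ and $\phi$. The name $L$ for the bound is an assumption.

```latex
\begin{aligned}
\log p(w \mid \alpha, \beta)
  &= \log \int \sum_{z} p(\theta, z, w \mid \alpha, \beta)\, d\theta \\
  &= \log \int \sum_{z} q(\theta, z)\,
       \frac{p(\theta, z, w \mid \alpha, \beta)}{q(\theta, z)}\, d\theta \\
  &= \log \mathbb{E}_q\!\left[
       \frac{p(\theta, z, w \mid \alpha, \beta)}{q(\theta, z)} \right] \\
  &\ge \mathbb{E}_q\big[\log p(\theta, z, w \mid \alpha, \beta)\big]
     - \mathbb{E}_q\big[\log q(\theta, z)\big]
   \;=\; L(\gamma, \phi; \alpha, \beta)
\end{aligned}
```

The gap between the two sides is the KL divergence from $q(\theta, z)$ to the true posterior $p(\theta, z \mid w, \alpha, \beta)$, so maximizing $L$ over $\gamma$ and $\phi$ is the same as minimizing that divergence.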


Original article: https://www.cnblogs.com/zjgtan/p/3952994.html
