
Comparison of LSA and pLSA

Comparison

Aspect	LSA	pLSA
1. Theoretical background	Linear algebra	Probability and statistics
2. Objective function	Frobenius norm (least squares)	Likelihood function (multinomial)
3. Handles polysemy	No	Yes
4. Folding-in	Straightforward (matrix-vector product)	Complicated (EM iterations)

1. LSA stems from linear algebra: it is essentially a truncated Singular Value Decomposition (SVD) of the term-document matrix. pLSA, on the other hand, has a strong probabilistic grounding as a latent variable model.
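To make "LSA is an SVD" concrete, here is a minimal sketch using NumPy's `np.linalg.svd`. The helper name `lsa` and the toy count matrix `X` (terms as rows, documents as columns) are hypothetical, chosen only for illustration; no term weighting is applied.

```python
import numpy as np

def lsa(X, k):
    """Rank-k LSA of a term-document count matrix X (terms x documents).

    Returns (U_k, s_k, Vt_k): the thin SVD of X truncated to k factors.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

# Toy counts: 5 terms x 4 documents (hypothetical data).
X = np.array([
    [2, 0, 1, 0],
    [1, 0, 2, 0],
    [0, 3, 0, 1],
    [0, 1, 0, 2],
    [1, 1, 1, 1],
], dtype=float)

U_k, s_k, Vt_k = lsa(X, k=2)
doc_vectors = (np.diag(s_k) @ Vt_k).T   # one 2-d row per document
term_vectors = U_k * s_k                # one 2-d row per term
X_hat = U_k @ np.diag(s_k) @ Vt_k       # rank-2 approximation of X
```

The whole model is the three truncated factors; there is nothing to iterate, which is why the result is fully determined by linear algebra.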

2. SVD is a least-squares method: it finds the low-rank matrix approximation that minimizes the Frobenius norm of the difference from the original matrix. As is well known in machine learning, the least-squares solution coincides with the maximum likelihood solution when the errors are Gaussian. LSA therefore makes an implicit assumption of Gaussian noise on the term counts. pLSA, by contrast, maximizes the likelihood function of multinomial sampling.
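The least-squares claim can be checked numerically. The sketch below, on hypothetical random counts, verifies that the Frobenius error of the rank-k truncation equals the root of the sum of the discarded squared singular values (the Eckart-Young result), and that under an assumed i.i.d. Gaussian noise model with fixed `sigma` the same matrix also scores a higher log-likelihood than another rank-k candidate. The function `gaussian_log_likelihood` is an illustrative helper, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(6, 5)).astype(float)  # hypothetical term counts

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius norm of the residual: the quantity truncated SVD minimizes.
frob_err = np.linalg.norm(X - X_k, ord="fro")
# Eckart-Young: the minimum equals sqrt of the sum of discarded squared singular values.
expected = np.sqrt(np.sum(s[k:] ** 2))

# Some other rank-k matrix, for comparison.
Y = rng.normal(size=(6, k)) @ rng.normal(size=(k, 5))

def gaussian_log_likelihood(X, M, sigma=1.0):
    """Log-likelihood of X if every cell is M_ij plus i.i.d. N(0, sigma^2) noise."""
    n = X.size
    return (-0.5 * np.sum((X - M) ** 2) / sigma**2
            - n * np.log(sigma * np.sqrt(2 * np.pi)))
```

Because the log-likelihood is a negative multiple of the squared Frobenius error plus a constant, minimizing one maximizes the other; that is the implicit Gaussian assumption mentioned above.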

The values in the concept-term matrix found by LSA are not normalized and may even be negative. The values found by pLSA, on the other hand, are probabilities, which makes them interpretable and allows them to be combined with other models.
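The contrast between normalized pLSA factors and unnormalized LSA factors can be seen by fitting both to the same counts. Below is a minimal EM sketch of the asymmetric pLSA model P(w|d) = sum_z P(z|d) P(w|z); the function names, the toy matrix `N` (documents as rows, words as columns), the iteration count and the random initialization are all assumptions for illustration, and the toy data deliberately has no empty documents or words so the normalizations never divide by zero.

```python
import numpy as np

def plsa(N, k, n_iter=100, seed=0):
    """Fit pLSA by EM on a documents x words count matrix N.

    Returns P(z|d) with shape (docs, k) and P(w|z) with shape (k, words).
    """
    rng = np.random.default_rng(seed)
    D, W = N.shape
    p_z_d = rng.random((D, k))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((k, W))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior P(z|d,w) for every (d, w) pair, shape (D, k, W).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / joint.sum(axis=1, keepdims=True)
        # M-step: weight posteriors by the observed counts n(d,w), then renormalize.
        weighted = N[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    return p_z_d, p_w_z

def log_likelihood(N, p_z_d, p_w_z):
    """sum_{d,w} n(d,w) log P(w|d); the constant P(d) term is omitted."""
    return float(np.sum(N * np.log(p_z_d @ p_w_z)))

# Toy counts: 4 documents x 5 words (hypothetical data).
N = np.array([
    [4, 3, 0, 0, 1],
    [3, 5, 1, 0, 0],
    [0, 0, 4, 5, 1],
    [0, 1, 3, 4, 2],
], dtype=float)

p_z_d, p_w_z = plsa(N, k=2)
# Every row is a probability distribution: nonnegative and summing to 1.
print(p_w_z.sum(axis=1), p_z_d.sum(axis=1))

# LSA factors of the same matrix are unnormalized and have mixed signs.
_, _, Vt = np.linalg.svd(N, full_matrices=False)
print((Vt[:2] < 0).any())
```

EM never decreases the multinomial likelihood, so running more iterations from the same seed can only improve `log_likelihood`.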

Note: SVD is equivalent to PCA (Principal Component Analysis) when the data is centered (has zero mean): the right singular vectors of the centered data matrix are then the principal components.
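A quick numerical check of the note, assuming hypothetical random data `A` with samples as rows: the eigenvectors of the sample covariance match the right singular vectors of the centered data up to sign, and the eigenvalues equal the squared singular values divided by n - 1.

```python
import numpy as np

rng = np.random.default_rng(1)
mix = np.array([[3.0, 0.0, 0.0],
                [1.0, 1.0, 0.0],
                [0.0, 0.5, 0.2]])
A = rng.normal(size=(50, 3)) @ mix       # hypothetical data, 50 samples x 3 features
A_c = A - A.mean(axis=0)                 # center each column (zero mean)

# PCA route: eigen-decomposition of the sample covariance matrix.
cov = np.cov(A_c, rowvar=False)          # normalizes by n - 1
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# SVD route on the centered data.
_, s, Vt = np.linalg.svd(A_c, full_matrices=False)
```

Without the centering step the two decompositions generally differ, because the first singular vector then tends to point toward the data mean rather than along the direction of largest variance.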

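3. Both LSA and pLSA can handle synonymy, but LSA cannot handle polysemy, because each word is represented by a single point in the latent space. In pLSA a word can have high probability under several topics, so its different senses can be captured by different topics.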

4. LSA and pLSA analyze a corpus of documents in order to find a new low-dimensional representation of it. To be comparable, new documents that were not in the original corpus must also be projected into the lower-dimensional space. This is called “folding-in”. Folded-in documents do not contribute to learning the factored representation, so the model has to be rebuilt from all the documents from time to time.

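In LSA, folding-in is a single matrix-vector product (the new term vector is projected onto the retained singular vectors). In pLSA, it requires several iterations of the EM algorithm, with the topic-word probabilities P(w|z) held fixed.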
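The difference in folding-in cost can be sketched side by side. In this sketch, assuming a fitted LSA model given by `U_k` and `s_k`, a new term-count vector `q` is mapped to `S_k^{-1} U_k^T q`; for pLSA, a hypothetical pre-trained `p_w_z` is kept fixed and only the new document's topic mixture is re-estimated by EM. The function names, the toy matrix `X`, the `p_w_z` values and the iteration count are illustrative assumptions.

```python
import numpy as np

def lsa_fold_in(q, U_k, s_k):
    """LSA folding-in: one matrix-vector product, q_hat = S_k^{-1} U_k^T q."""
    return (U_k.T @ q) / s_k

def plsa_fold_in(q, p_w_z, n_iter=50):
    """pLSA folding-in: EM over P(z|q) only, with P(w|z) held fixed."""
    k = p_w_z.shape[0]
    p_z_q = np.full(k, 1.0 / k)                          # uniform starting mixture
    for _ in range(n_iter):
        joint = p_z_q[:, None] * p_w_z                   # shape (k, words)
        post = joint / joint.sum(axis=0, keepdims=True)  # E-step: P(z|q,w)
        p_z_q = (q[None, :] * post).sum(axis=1)          # M-step: expected counts per topic
        p_z_q /= p_z_q.sum()
    return p_z_q

# Hypothetical term x document counts (5 terms, 4 documents) for the LSA side.
X = np.array([
    [2, 0, 1, 0],
    [1, 0, 2, 0],
    [0, 3, 0, 1],
    [0, 1, 0, 2],
    [1, 1, 1, 1],
], dtype=float)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Hypothetical "already trained" P(w|z) for 2 topics over the same 5 terms.
p_w_z = np.array([
    [0.45, 0.45, 0.05, 0.03, 0.02],
    [0.02, 0.03, 0.45, 0.45, 0.05],
])

q = np.array([3, 2, 0, 0, 1], dtype=float)   # a new document's term counts
q_lsa = lsa_fold_in(q, U_k, s_k)
q_plsa = plsa_fold_in(q, p_w_z)
```

Folding a document that is already in the corpus back in with `lsa_fold_in` recovers its column of `Vt_k`, which is a handy sanity check. The pLSA version needs the loop, and neither function updates the trained factors, which is why the model still has to be rebuilt periodically.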


Original article: https://www.cnblogs.com/data2value/p/5435686.html