  • Comparison of LSA and pLSA

    Comparison

                                 LSA               pLSA
    1. Theoretical background    Linear Algebra    Probabilities and Statistics
    2. Objective function        Frobenius norm    Likelihood function
    3. Polysemy                  No                Yes
    4. Folding-in                Straightforward   Complicated

    1. LSA stems from linear algebra: it is a truncated Singular Value Decomposition (SVD) of the term-document matrix. pLSA, on the other hand, has a probabilistic grounding as a latent variable model.

    2. SVD is a least squares method: it finds the low-rank matrix approximation that minimizes the Frobenius norm of the difference with the original matrix. As is well known in Machine Learning, the least squares solution corresponds to the Maximum Likelihood solution when the errors are Gaussian. LSA therefore makes an implicit assumption of Gaussian noise on the term counts. In pLSA, by contrast, the objective function being maximized is the likelihood function of multinomial sampling.
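    The least squares property in point 2 can be checked numerically. The sketch below assumes NumPy and a small random count matrix (illustrative data, not from the original post); `truncated_svd` is a hypothetical helper name. It shows that the Frobenius error of the rank-k SVD approximation equals the root sum of squares of the discarded singular values.

```python
import numpy as np

def truncated_svd(X, k):
    """Return the top-k singular triplets of X (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

# Illustrative term-document counts (assumption: random Poisson data).
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(8, 6)).astype(float)

k = 2
U_k, s_k, Vt_k = truncated_svd(X, k)
X_k = U_k @ np.diag(s_k) @ Vt_k          # rank-k approximation used by LSA

# Frobenius norm of the residual, i.e. the LSA objective value.
err = np.linalg.norm(X - X_k, "fro")

# The same quantity computed from the discarded singular values.
s_all = np.linalg.svd(X, compute_uv=False)
tail = np.sqrt(np.sum(s_all[k:] ** 2))

print(err, tail)  # the two numbers should agree up to rounding
```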

    The values in the concept-term matrix found by LSA are not normalized and may even be negative. The values found by pLSA, on the other hand, are probabilities, which makes them interpretable and allows them to be combined with other probabilistic models.
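    To make the contrast concrete, here is a minimal pLSA fit by EM next to an SVD of the same matrix. It is a sketch under assumptions: NumPy, a toy document-word count matrix, and a hypothetical function name `plsa`. It is not a tuned implementation (no tempering, no convergence test).

```python
import numpy as np

def plsa(N, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM. N is a (n_docs, n_words) count matrix.

    Returns P(z|d) with shape (n_docs, n_topics) and
    P(w|z) with shape (n_topics, n_words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = N.shape
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + eps)
        # M-step: re-estimate both distributions from expected counts.
        weighted = N[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps
    return p_z_d, p_w_z

# Toy counts: 4 documents x 4 words (illustrative data).
N = np.array([[4, 3, 0, 0],
              [3, 5, 1, 0],
              [0, 0, 4, 3],
              [0, 1, 3, 5]], dtype=float)

p_z_d, p_w_z = plsa(N, n_topics=2)

# LSA on the same matrix: rows of Vt play the role of the concept-term matrix.
_, _, Vt = np.linalg.svd(N, full_matrices=False)
lsa_has_negative = bool((Vt[:2] < 0).any())

print(p_w_z.sum(axis=1))   # each row of P(w|z) sums to about 1
print(lsa_has_negative)    # the LSA factors contain negative entries
```

    Each row of `p_w_z` is a distribution over words, whereas the rows of `Vt` are orthonormal directions with mixed signs.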

    Note: the SVD of the data matrix is equivalent to PCA (Principal Component Analysis) when the data are centered (each feature has zero mean).
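    The note above can be verified on synthetic data. The sketch assumes NumPy and random Gaussian data with a non-zero mean (illustrative only). It compares PCA computed from the covariance matrix with the SVD of the centered data: the principal axes match up to sign, and the eigenvalues equal the squared singular values divided by n - 1.

```python
import numpy as np

rng = np.random.default_rng(0)
# 50 samples, 4 correlated features, shifted away from zero mean.
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4)) + 3.0
Xc = X - X.mean(axis=0)                  # centering step
n = Xc.shape[0]

# PCA via eigendecomposition of the sample covariance matrix.
cov = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# SVD of the centered data matrix.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

same_variances = np.allclose(s ** 2 / (n - 1), eigvals)
# Singular vectors are only defined up to sign, so compare magnitudes.
same_axes = np.allclose(np.abs(Vt), np.abs(eigvecs.T), atol=1e-6)
print(same_variances, same_axes)
```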

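    3. Both LSA and pLSA can handle synonymy, but LSA cannot handle polysemy, because each word is represented by a single point in the latent space. In pLSA a word can have non-zero probability under several topics, so its different senses can be assigned to different topics.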

    4. LSA and pLSA analyze a corpus of documents in order to find a new low-dimensional representation of it. To be comparable, new documents that were not originally in the corpus must be projected into the lower-dimensional space too. This is called “folding-in”. Folded-in documents do not contribute to learning the factored representation, so the model has to be rebuilt from all the documents from time to time.

    In LSA, folding-in is as easy as a matrix-vector product. In pLSA, it requires several iterations of the EM algorithm, run for the new document with the topic-word distributions held fixed.
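    The two folding-in procedures can be sketched side by side. Assumptions: NumPy, a toy term-document matrix for LSA, and a hand-written topic-word matrix `p_w_z` standing in for an already trained pLSA model. The function names `fold_in_lsa` and `fold_in_plsa` are hypothetical.

```python
import numpy as np

def fold_in_lsa(d, U_k, s_k):
    """Project a term-count vector d (n_terms,) into the k-dim LSA space.

    A single matrix-vector product: d_hat = diag(1/s_k) @ U_k.T @ d.
    """
    return (U_k.T @ d) / s_k

def fold_in_plsa(d, p_w_z, n_iter=50):
    """Estimate P(z|d) for a new document by EM, keeping P(w|z) fixed."""
    n_topics = p_w_z.shape[0]
    p_z = np.full(n_topics, 1.0 / n_topics)   # uniform start
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
        joint = p_z[:, None] * p_w_z
        post = joint / (joint.sum(axis=0, keepdims=True) + 1e-12)
        # M-step: only the document's topic mixture is updated.
        p_z = (post * d[None, :]).sum(axis=1)
        p_z /= p_z.sum()
    return p_z

# LSA model trained on a toy terms x documents matrix (illustrative data).
X = np.array([[4, 3, 0, 0],
              [3, 5, 0, 1],
              [0, 1, 4, 3],
              [0, 0, 3, 5]], dtype=float)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

new_doc = np.array([3.0, 2.0, 0.0, 0.0])      # counts over the 4 terms
lsa_coords = fold_in_lsa(new_doc, U_k, s_k)

# Stand-in for trained pLSA topic-word distributions (assumed values).
p_w_z = np.array([[0.5, 0.4, 0.1, 0.0],
                  [0.0, 0.1, 0.4, 0.5]])
topic_mix = fold_in_plsa(new_doc, p_w_z)

print(lsa_coords)   # coordinates in the 2-dim LSA space
print(topic_mix)    # P(z|d) for the new document
```

    Folding in a column of `X` itself with `fold_in_lsa` reproduces the corresponding column of `Vt_k`, which is a quick sanity check on the projection formula.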

  • Original article: https://www.cnblogs.com/data2value/p/5435686.html