Natural Language Processing 4-4: Evaluating Language Models with Perplexity

Perplexity can be used to evaluate how good a trained language model is. It is simply the following formula:

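$$ \text{perplexity} = 2^{-x} $$

Here $x$ is the average log likelihood (log base 2) per word on the test set, so it reflects the average probability the model assigns to each word.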
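Spelled out, the formula might be written as below. This is a sketch under assumptions not stated in the original: the test set has $N$ words $w_1, \dots, w_N$, the model is a bi-gram, and the first factor $P(w_1 \mid w_0)$ is read as the plain probability $P(w_1)$.

$$ x = \frac{1}{N}\sum_{i=1}^{N}\log_2 P(w_i \mid w_{i-1}), \qquad \text{perplexity} = 2^{-x} = \left(\prod_{i=1}^{N} P(w_i \mid w_{i-1})\right)^{-1/N} $$

Read this way, perplexity is the reciprocal of the geometric mean of the per-word probabilities.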

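Here is an example. Suppose our test set is the following sentence:
我喜欢喝奶茶 ("I like drinking milk tea", segmented as 我 / 喜欢 / 喝 / 奶茶)
Suppose we have a trained bi-gram model. For the words in the vocabulary, the model gives conditional probabilities of the form P(word2|word1). Some of these probabilities are listed here:
P(我) = 0.1, P(喜欢|我) = 0.1, P(喝|喜欢) = 0.1, P(奶茶|喝) = 0.1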
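Taking the logarithm in base 2, so that it matches the base of the perplexity formula, we get

x = log2{P(我) · P(喜欢|我) · P(喝|喜欢) · P(奶茶|喝)} / 4 = log2(0.1^4) / 4 = log2(0.1) ≈ -3.32

perplexity = 2^(-x) ≈ 2^(3.32) = 10

In other words, when every word gets probability 0.1, the perplexity is 1/0.1 = 10.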
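The calculation can be checked with a few lines of Python. This is a minimal sketch, assuming base-2 logarithms throughout; the function name `perplexity` and the variable names are made up for illustration.

```python
import math

def perplexity(probs):
    """Perplexity of a word sequence, given the model probability of each word."""
    # x is the average base-2 log likelihood per word.
    x = sum(math.log2(p) for p in probs) / len(probs)
    return 2 ** (-x)

# P(我), P(喜欢|我), P(喝|喜欢), P(奶茶|喝) from the example.
example_probs = [0.1, 0.1, 0.1, 0.1]
example_ppl = perplexity(example_probs)
print(round(example_ppl, 6))  # roughly 10, up to floating-point error
```

With four probabilities of 0.1 the result comes out at about 10, the reciprocal of the per-word probability.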

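We know that a larger likelihood is better, which means a larger x is better, so a smaller perplexity is better. This is why perplexity can be used to compare and optimize language models.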
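As a small illustration of "smaller is better", the sketch below scores two models on the same test sentence and picks the one with the lower perplexity. The model names and the probabilities of `model_b` are invented for the illustration; they do not come from any real model.

```python
import math

def perplexity(probs):
    # Average base-2 log likelihood per word, then 2 to the minus that.
    x = sum(math.log2(p) for p in probs) / len(probs)
    return 2 ** (-x)

# Hypothetical per-word probabilities that two models assign to the same sentence.
model_probs = {
    "model_a": [0.1, 0.1, 0.1, 0.1],
    "model_b": [0.2, 0.25, 0.1, 0.4],
}

scores = {name: perplexity(probs) for name, probs in model_probs.items()}
best = min(scores, key=scores.get)  # the lowest perplexity wins

for name, score in scores.items():
    print(f"{name}: perplexity = {score:.2f}")
print("best:", best)
```

Here `model_b` gives the test sentence a higher likelihood, so its perplexity is lower and it is the one selected.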

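The example here is tiny. In practice the test set is much larger, the model is not limited to a bi-gram, and of course the probabilities produced by a trained model will not all be 0.1.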
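For a slightly more realistic picture, here is a sketch that estimates a bi-gram model from counts and computes perplexity over a test set with several sentences. Everything in it is an assumption made for illustration: the toy sentences are invented, add-one (Laplace) smoothing is one of several possible ways to avoid zero probabilities, and the first word of each sentence is conditioned on a hypothetical start marker `<s>` instead of using a unigram probability like P(我).

```python
import math
from collections import Counter

BOS = "<s>"  # hypothetical sentence-start marker, not part of the original example

def train_bigram(sentences):
    """Count contexts and bigrams over sentences that are already split into words."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for tokens in sentences:
        padded = [BOS] + tokens
        vocab.update(tokens)
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab

def bigram_prob(prev, cur, context_counts, bigram_counts, vocab_size):
    # Add-one smoothing: unseen bigrams still get a non-zero probability.
    return (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)

def corpus_perplexity(test_sentences, context_counts, bigram_counts, vocab):
    """2 ** (-x), where x is the average base-2 log likelihood per test word."""
    log_sum, n_words = 0.0, 0
    for tokens in test_sentences:
        padded = [BOS] + tokens
        for prev, cur in zip(padded, padded[1:]):
            p = bigram_prob(prev, cur, context_counts, bigram_counts, len(vocab))
            log_sum += math.log2(p)
            n_words += 1
    return 2 ** (-log_sum / n_words)

# Made-up toy data, already segmented into words.
train_sentences = [
    ["我", "喜欢", "喝", "奶茶"],
    ["我", "喜欢", "喝", "咖啡"],
    ["我", "喝", "奶茶"],
]
test_sentences = [
    ["我", "喜欢", "喝", "奶茶"],
    ["我", "喝", "咖啡"],
]

context_counts, bigram_counts, vocab = train_bigram(train_sentences)
ppl = corpus_perplexity(test_sentences, context_counts, bigram_counts, vocab)
print(f"test perplexity: {ppl:.3f}")
```

The sketch does not handle test words that never appear in the training data; a real setup would need some treatment of unknown words before the perplexity is meaningful.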

Original article: https://www.cnblogs.com/loubin/p/13720192.html
