Using the conditional frequency distribution ConditionalFreqDist on the Brown Corpus



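Applying the conditional frequency distribution ConditionalFreqDist to the Brown Corpus shows how many times each word occurs in each category (genre) of the corpus. The same idea is very useful in microblog sentiment analysis: for example, you can count how often the features in a feature vector that represent positive, negative, or neutral sentiment occur in each tweet, and use whether those counts are high or low to judge the tweet's sentiment polarity.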
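The tweet-counting idea can be sketched with the standard library alone. In NLTK, ConditionalFreqDist is a defaultdict whose values are FreqDist objects (a Counter subclass). A defaultdict(Counter) built from (condition, sample) pairs therefore counts the same way. The lexicon, the tweets, and the "more positive than negative hits" decision rule below are hypothetical illustrations, not part of the original post. A real system would use a proper sentiment lexicon and tokenizer.

```python
from collections import Counter, defaultdict

# Hypothetical polarity lexicon: words and labels are illustrative assumptions,
# not taken from any real sentiment resource.
LEXICON = {
    "love": "positive", "great": "positive", "happy": "positive",
    "hate": "negative", "awful": "negative", "sad": "negative",
    "today": "neutral", "phone": "neutral",
}


def polarity_counts(tweets):
    """Map tweet index -> Counter of polarity labels, like
    nltk.ConditionalFreqDist((i, label) for ...) but stdlib-only."""
    table = defaultdict(Counter)
    pairs = (
        (i, LEXICON[word])                 # (condition, sample) pair
        for i, tweet in enumerate(tweets)
        for word in tweet.lower().split()  # naive whitespace tokenization
        if word in LEXICON
    )
    for condition, sample in pairs:
        table[condition][sample] += 1
    return table


def label_tweet(counts):
    """Assumed rule: compare positive vs negative hits; a tie or no hits -> neutral."""
    pos, neg = counts["positive"], counts["negative"]
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


tweets = [
    "I love my new phone great camera",
    "awful day and sad news",
    "nothing special today",
]
table = polarity_counts(tweets)
for i, tweet in enumerate(tweets):
    print(i, dict(table[i]), label_tweet(table[i]))
```

The rule labels the first toy tweet positive (two positive hits, one neutral), the second negative, and the third neutral.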

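import nltk
from nltk.corpus import brown  # needs the corpus data: nltk.download('brown')

# Build a conditional frequency distribution from (genre, word) pairs:
# the condition is the Brown category, the sample is each word in it.
cfd = nltk.ConditionalFreqDist(
    (genre, word)
    for genre in brown.categories()
    for word in brown.words(categories=genre)
)

genres = ['news', 'religion', 'hobbies', 'science_fiction', 'romance', 'humor']
modals = ['can', 'could', 'may', 'might', 'must', 'will']

# tabulate() prints the table itself and returns None, so don't wrap it in print.
cfd.tabulate(conditions=genres, samples=modals)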

Output:

                  can could   may might  must  will
           news    93    86    66    38    50   389
       religion    82    59    78    12    54    71
        hobbies   268    58   131    22    83   264
science_fiction    16    49     4    12     8    16
        romance    74   193    11    51    45    43
          humor    16    30     8     8     9    13
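The table shows that among these modals, will occurs most often in the news category (389 times), while could occurs most often in humor (30 times).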
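To read the per-genre maximum off the table programmatically, here is a minimal sketch. It uses the counts printed above, copied by hand so it runs without downloading the corpus. With the live cfd, the equivalent lookup for one genre would be max(modals, key=lambda m: cfd[genre][m]). Note that cfd[genre].max() would instead return the most frequent word of any kind in that genre, not the most frequent modal.

```python
# Counts copied from the tabulate() output above (Brown Corpus, selected genres).
modals = ["can", "could", "may", "might", "must", "will"]
counts = {
    "news": [93, 86, 66, 38, 50, 389],
    "religion": [82, 59, 78, 12, 54, 71],
    "hobbies": [268, 58, 131, 22, 83, 264],
    "science_fiction": [16, 49, 4, 12, 8, 16],
    "romance": [74, 193, 11, 51, 45, 43],
    "humor": [16, 30, 8, 8, 9, 13],
}


def most_frequent_modal(row):
    """Return the modal with the highest count in one genre's row.

    max() keeps the first maximum it sees, so ties go to the earlier modal.
    """
    return max(zip(modals, row), key=lambda pair: pair[1])[0]


for genre, row in counts.items():
    print(genre, most_frequent_modal(row))
```

For these counts it reports will for news and could for humor, matching the observation above. It also reports could for science_fiction and romance, and can for hobbies (268, just ahead of will at 264).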


Original article: https://www.cnblogs.com/finesite/p/3350582.html
