Natural Language Processing with Python, 1.3 Computing with Language: Simple Statistics

WeChat official account: 数据运营人
This series is the author's reading and study notes. Please credit the source if you repost it.

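Chapter 1: Language Processing and Python

Contents: Frequency Distributions; Fine-grained Selection of Words; Collocations and Bigrams; Counting Other Things

1.3 Computing with Language: Simple Statistics

Frequency Distributions

Counting how often each token occurs: FreqDist()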

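    from nltk.book import *  # loads text1 and the other sample texts, and brings FreqDist into scope

    fdist1 = FreqDist(text1)  # turn the token list into a dictionary-like object of token counts
    print(fdist1)
    vocabulary1 = list(fdist1.keys())  # take the keys and convert them to a list
    print(vocabulary1[:50])  # show the first fifty tokens in the list
    print(fdist1['whale'])  # number of times 'whale' occurs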
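
To see what FreqDist is counting without downloading the NLTK corpora, here is a minimal sketch that uses `collections.Counter` from the standard library instead (in NLTK 3, FreqDist is a subclass of Counter). The token list `toy_tokens` is a made-up stand-in for text1, not real corpus data.

```python
from collections import Counter

# Hypothetical toy token list standing in for text1 (not real corpus data).
toy_tokens = ["the", "whale", "and", "the", "sea", "and", "the", "ship"]

# Counter maps each distinct token to the number of times it occurs,
# which is the same bookkeeping FreqDist performs.
toy_fdist = Counter(toy_tokens)

print(toy_fdist["the"])          # count of a single token
print(toy_fdist["whale"])
print(toy_fdist.most_common(2))  # the two most frequent tokens with their counts
print(len(toy_fdist))            # number of distinct tokens
```

Looking up a token works like a dictionary lookup, so `fdist1['whale']` above reads the count for 'whale' in the same way.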


    # Plot the cumulative frequency of the 50 most common tokens in fdist1
    fdist1.plot(50, cumulative=True)
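
The numbers behind a cumulative frequency plot can be worked out by hand. The following is a rough sketch with the standard library only; `toy_tokens` is a hypothetical token list, and the code only mimics the running totals, it does not call NLTK's plotting code.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical toy token list (not real corpus data).
toy_tokens = ["the", "whale", "and", "the", "sea", "and", "the", "ship"]
toy_fdist = Counter(toy_tokens)

# Rank tokens from most to least frequent and keep the top three.
ranked = toy_fdist.most_common(3)

# Running total of the counts: each point on a cumulative plot is the sum
# of the counts of all tokens up to and including that rank.
cumulative_counts = list(accumulate(count for _, count in ranked))

# Share of all tokens covered by the top-ranked tokens so far.
total = sum(toy_fdist.values())
coverage = [c / total for c in cumulative_counts]

print(cumulative_counts)
print(coverage)
```

Reading the last value of `coverage` tells you how much of the text the top-ranked tokens account for, which is the question a cumulative plot answers.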


Fine-grained Selection of Words

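    V = set(text1)
    long_words = [w for w in V if len(w) > 15]  # list comprehension: words longer than 15 characters
    print(sorted(long_words))
    fdist5 = FreqDist(text5)
    # set comprehension: words longer than 7 characters that occur more than 7 times
    print(sorted({w for w in set(text5) if len(w) > 7 and fdist5[w] > 7}))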
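
The same two filters can be tried on a tiny example. This is a minimal sketch with `collections.Counter`; the token list and the thresholds (length above 10, count above 2) are made up for illustration and are smaller than the ones used on text1 and text5.

```python
from collections import Counter

# Hypothetical toy token list (not real corpus data).
toy_tokens = ["information", "is", "information", "about", "communication",
              "and", "communication", "about", "communication"]
toy_fdist = Counter(toy_tokens)

# Filter on length only: every distinct word longer than 10 characters.
long_words = sorted(w for w in set(toy_tokens) if len(w) > 10)

# Filter on length and frequency: long words that also occur more than twice.
long_and_frequent = sorted(w for w in toy_fdist if len(w) > 10 and toy_fdist[w] > 2)

print(long_words)
print(long_and_frequent)
```

Combining the length test with a frequency test drops long words that appear only once or twice, which are often typos or one-off coinages.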


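Collocations and Bigrams

Extract pairs of adjacent words (bigrams): bigrams()
Find bigrams that occur more often than the frequencies of their individual words would lead you to expect: collocations()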

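    # bigrams() returns a generator (printing it directly shows something like
    # <generator object bigrams at 0x123dca728>), so convert it to a list first
    print(list(bigrams(['more', 'is', 'said', 'than', 'done'])))
    text4.collocations()  # prints the collocations itself and returns None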
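
The idea of "more frequent than expected" can be illustrated with a small sketch. It builds bigrams with `zip` and scores each one by the ratio of its observed count to the count expected if its two words occurred independently. This ratio is a simplified stand-in chosen for illustration, not the measure NLTK's collocations() uses, and `toy_tokens` and `observed_over_expected` are hypothetical names.

```python
from collections import Counter

# Hypothetical toy token list (not real corpus data).
toy_tokens = "of the united states of the people of the united states".split()

# Pair every token with its successor; each pair is one bigram.
toy_bigrams = list(zip(toy_tokens, toy_tokens[1:]))

word_counts = Counter(toy_tokens)
bigram_counts = Counter(toy_bigrams)
n_tokens = len(toy_tokens)


def observed_over_expected(pair):
    """Ratio of a bigram's observed count to the count expected if its
    two words occurred independently of each other."""
    first, second = pair
    expected = word_counts[first] * word_counts[second] / n_tokens
    return bigram_counts[pair] / expected


most_frequent = bigram_counts.most_common(1)[0][0]
best_scoring = max(bigram_counts, key=observed_over_expected)

print(most_frequent)
print(best_scoring)
```

In this toy text the most frequent bigram and the best-scoring bigram differ: a pair of very common words can occur often by chance, while a pair of rarer words that nearly always appear together scores higher.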


Counting Other Things

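```python
from nltk.book import *  # loads text1 and brings FreqDist into scope

[len(w) for w in text1]  # length of every token in text1
fdist = FreqDist([len(w) for w in text1])  # frequency distribution of token lengths
print(fdist)
print(fdist.keys())   # the distinct token lengths
print(fdist.items())  # (length, count) pairs
print(fdist.max())    # the most frequent token length
print(fdist[5])       # number of tokens that are 5 characters long
print(fdist.freq(3))  # proportion of tokens that are 3 characters long
```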
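
To make the meaning of max() and freq() concrete, here is a minimal standard-library sketch that computes the same two quantities by hand. `toy_tokens` is a hypothetical token list, and Counter stands in for FreqDist.

```python
from collections import Counter

# Hypothetical toy token list (not real corpus data).
toy_tokens = ["I", "am", "the", "sea", "and", "the", "ship", "is", "old"]

# Count token lengths instead of the tokens themselves.
length_counts = Counter(len(w) for w in toy_tokens)
total = sum(length_counts.values())

# Equivalent of FreqDist.max(): the sample with the highest count.
most_common_length = max(length_counts, key=length_counts.get)

# Equivalent of FreqDist.freq(3): count of the sample divided by the total.
share_of_three = length_counts[3] / total

print(sorted(length_counts.items()))
print(most_common_length)
print(share_of_three)
```

A frequency distribution can be built over anything that can be counted, so the same pattern works for token lengths, first letters, or any other feature you derive from the tokens.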

Original post: https://www.cnblogs.com/ly803744/p/10035385.html
