Word Frequency Count + Word Cloud (Pride and Prejudice)

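#4.8.py
import jieba

# Common words to leave out of the ranking
excludes = {"先生","没有","太太","一个","自己","小姐","我们","可是","她们","他们","知道","事情","时候"}

# Read the novel and segment it into words
with open("傲慢与偏见.txt", "r", encoding="utf-8") as f:
    txt = f.read()
words = jieba.lcut(txt)

# Count every word longer than one character
counts = {}
for word in words:
    if len(word) == 1:
        continue
    counts[word] = counts.get(word, 0) + 1

# Drop excluded words; pop() avoids a KeyError when a word never occurs
for word in excludes:
    counts.pop(word, None)

# Sort by frequency, highest first, and print the top five
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for word, count in items[:5]:
    print("{0:<10}{1:>5}".format(word, count))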
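The counting and ranking steps in the script above could also be written with `collections.Counter` from the standard library. This is only a minimal sketch: it assumes the text has already been segmented into a list of tokens (for example by `jieba.lcut`), and the helper name `top_words` and the sample tokens are illustrative, not part of the original post.

```python
from collections import Counter


def top_words(tokens, excludes, n=5):
    """Return the n most frequent tokens, skipping single characters and excluded words."""
    counts = Counter(
        token for token in tokens
        if len(token) > 1 and token not in excludes
    )
    return counts.most_common(n)


# Hypothetical sample tokens standing in for the output of jieba.lcut(txt)
tokens = ["伊丽莎白", "先生", "达西", "伊丽莎白", "的", "达西", "伊丽莎白", "先生"]
result = top_words(tokens, excludes={"先生"}, n=2)
print(result)
```

Filtering while counting means an excluded word never enters the table, so there is nothing to delete afterwards.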

#4.8.py
import matplotlib.pyplot as plt
import jieba
from wordcloud import WordCloud

# Common words to leave out of the cloud
excludes = {"先生","没有","太太","一个","自己","小姐","我们","可是","她们","他们","知道","事情","时候"}

# Read the novel and segment it into words
with open("傲慢与偏见.txt", "r", encoding="utf-8") as f:
    txt = f.read()
words = jieba.lcut(txt)

# Count every word longer than one character
counts = {}
for word in words:
    if len(word) == 1:
        continue
    counts[word] = counts.get(word, 0) + 1

# Drop excluded words; pop() avoids a KeyError when a word never occurs
for word in excludes:
    counts.pop(word, None)

# font_path must point to a font with Chinese glyphs (SimHei here);
# adjust the path to wherever the font file lives
wc = WordCloud(font_path = r'.simhei.ttf', background_color = 'white',
               width = 500, height = 350, max_font_size = 50, min_font_size = 10)
# Build the cloud from the filtered counts so the excluded words stay out
wc.generate_from_frequencies(counts)
wc.to_file("wordcloud.png")

# Display the image without axes
plt.figure('wordcloud.png')
plt.imshow(wc)
plt.axis('off')
plt.show()

Original source: https://www.cnblogs.com/Adaran/p/12659857.html
