With Python, counting word frequencies is easy, and tallying the word frequencies of an article takes only a few lines.
1. Add a custom dictionary (e.g. entries like 超级赛亚人, 奥里给)
2. Segment the text with jieba
PS: just drop the article into the tf.txt file and the custom dictionary into the dict.txt file and you're set.
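For reference, jieba's user dictionary is a plain UTF-8 text file with one entry per line: the word itself, optionally followed by a frequency and a part-of-speech tag separated by spaces. A minimal dict.txt might look like this (the frequencies and tags here are made up for illustration):

```text
超级赛亚人 10 n
奥里给 5
```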
import jieba

txt = open("tf.txt", encoding="utf-8").read()
jieba.load_userdict("dict.txt")   # load the custom dictionary before segmenting
words = jieba.lcut(txt)           # segment the text into a list of words

# Tally each word
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Sort by count, descending
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)

# Print the top 100 words with their counts
for i in range(100):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))

print()

# Print the top 100 words with their relative frequencies
total = len(words)   # total word count, rather than a hard-coded number
for i in range(100):
    word, count = items[i]
    print("{0:<10}{1:>10.4f}".format(word, count / total))
Sample output:
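If you prefer to let the standard library do the tallying and sorting, collections.Counter replaces the manual dict loop and the sort in one step. A minimal sketch, where the token list is a made-up stand-in for the output of jieba.lcut:

```python
from collections import Counter

# Hypothetical token list standing in for jieba.lcut(txt) output
words = ["超级赛亚人", "的", "力量", "的", "超级赛亚人", "的"]

counts = Counter(words)        # tallies every token in one pass
total = sum(counts.values())   # total token count, instead of hard-coding it

# most_common(n) returns the n highest-count (word, count) pairs, sorted
for word, count in counts.most_common(2):
    print("{0:<10}{1:>5}{2:>10.4f}".format(word, count, count / total))
```

For the real script, replacing the manual counting loop with `counts = Counter(words)` and the sorted-items loop with `counts.most_common(100)` gives the same output with less code.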