  • [python] Using jieba Chinese word segmentation for word-frequency statistics and word-cloud generation

    The following code runs a word-frequency count on Lu Xun's short story 《祝福》 (The New Year's Sacrifice):

    import jieba

    # Read the full text (UTF-8 encoded)
    with open("zhufu.txt", "r", encoding="utf-8") as f:
        txt = f.read()

    # Segment the text into a list of words
    words = jieba.lcut(txt)

    # Count occurrences, skipping single-character tokens
    # (mostly punctuation and function words)
    counts = {}
    for word in words:
        if len(word) == 1:
            continue
        counts[word] = counts.get(word, 0) + 1

    # Sort by frequency and print the top 15 words
    items = list(counts.items())
    items.sort(key=lambda x: x[1], reverse=True)
    for i in range(15):
        word, count = items[i]
        print("{0:<10}{1:>5}".format(word, count))

    The output lists the 15 most frequent words with their counts (the original result screenshot is not reproduced here).
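    The single-character filter above is only a crude way of dropping punctuation and particles; multi-character filler words can still reach the top of the list. Below is a minimal sketch of an explicit stop-word filter, assuming a small hypothetical inline stop-word set (in practice you would load a full Chinese stop-word list from a file):

    import jieba

    # Hypothetical inline stop-word set; in practice, load a full list from a file
    STOPWORDS = {"一个", "自己", "没有", "知道", "什么"}

    def top_words(text, topn=15):
        counts = {}
        for word in jieba.lcut(text):
            # Skip single characters and listed stop words
            if len(word) == 1 or word in STOPWORDS:
                continue
            counts[word] = counts.get(word, 0) + 1
        return sorted(counts.items(), key=lambda x: x[1], reverse=True)[:topn]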

    Next, the text is turned into a word cloud:

    from wordcloud import WordCloud
    import PIL.Image as image
    import numpy as np
    import jieba

    # Segment the Chinese text and join the tokens with spaces,
    # since WordCloud expects space-separated words
    def trans_CN(text):
        word_list = jieba.cut(text)
        result = " ".join(word_list)
        return result

    with open("zhufu.txt", "r", encoding="utf-8") as fp:
        text = fp.read()
        # Segment the Chinese document before building the cloud
        text = trans_CN(text)
        # Load the mask image that shapes the cloud
        mask = np.array(image.open("xinxing.jpg"))
        wordcloud = WordCloud(
            mask=mask,
            # A Chinese font is required, or the words render as empty boxes
            font_path="msyh.ttc"
        ).generate(text)
        image_produce = wordcloud.to_image()
        image_produce.show()

    The resulting word cloud takes the shape of the mask image (the original screenshot is not reproduced here).
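    Instead of opening a viewer window with show(), the cloud can also be recolored from the mask image and written straight to disk, using WordCloud's ImageColorGenerator, recolor, and to_file helpers. A short sketch, continuing from the code above; the output filename is a hypothetical choice:

    from wordcloud import ImageColorGenerator

    # Recolor each word using the pixel colors of the mask image
    image_colors = ImageColorGenerator(mask)
    wordcloud.recolor(color_func=image_colors)

    # Write the rendered cloud to a PNG file instead of displaying it
    wordcloud.to_file("wordcloud_colored.png")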

  • Original post: https://www.cnblogs.com/litchi666/p/12703836.html