zoukankan      html  css  js  c++  java
  • 词频统计及词云绘制

    首先,在电脑上安装jieba库,进入命令提示符,输入pip install jieba,接下来就等系统自动安装

    然后再进入IDLE建立一个脚本,用open函数打开只读模式,用jieba.lcut函数剪下词组,对每一个剪下的词组进行统计,最后输出。

     1 import jieba
     2 txt = open(r"F:清道夫.txt", "r", encoding='utf-8').read()
     3 words  = jieba.lcut(txt)
     4 counts = {}
     5 for word in words:
     6     if len(word) == 1:  #排除单个字符的分词结果
     7         continue
     8     else:
     9         counts[word] = counts.get(word,0) + 1
    10 items = list(counts.items())
    11 items.sort(key=lambda x:x[1], reverse=True) 
    12 for i in range(10):
    13     word, count = items[i]
    14     print ("{0:<10}{1:>5}".format(word, count))

    运行结果如下

     然后来绘制词云,需要先安装wordcloud库,还是用上面的方法,pip install wordcloud,安装好之后,执行如下代码

    from wordcloud import WordCloud
    with open("F:清道夫.txt",encoding="utf-8")as file:
        text=file.read()
        wordcloud=WordCloud(
            font_path="C:/Windows/Fonts/simfang.ttf",
            background_color="white",
            width=600,
            height=300,max_words=50).generate(text)
        image=wordcloud.to_image()
        image.show()

    效果如下

  • 相关阅读:
    [HNOI2008]玩具装箱TOY
    UVA1185 Big Number
    01分数规划
    [HNOI2010]弹飞绵羊
    Mobius反演的套路
    MySQL日志
    MySQL事务、锁机制、查询缓存
    MySQL的索引
    MySQL的存储引擎
    HAProxy学习笔记
  • 原文地址:https://www.cnblogs.com/lzz807237221/p/10652584.html
Copyright © 2011-2022 走看看