  • The jieba Library and Word Clouds

    Use the jieba library to count word frequencies, then sort the words by frequency:

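    import jieba

    # Read the whole article; undecodable bytes are replaced instead of raising.
    with open("文章.txt", "r", encoding="gbk", errors="replace") as f:
        txt = f.read()

    # Segment the text into a list of words (jieba's default precise mode).
    words = jieba.lcut(txt)

    # Count every word longer than one character.
    counts = {}
    for word in words:
        if len(word) == 1:
            continue
        counts[word] = counts.get(word, 0) + 1

    # Sort by frequency, highest first, and print at most the top 15.
    items = list(counts.items())
    items.sort(key=lambda x: x[1], reverse=True)
    for word, count in items[:15]:
        print("{0:<10}{1:>5}".format(word, count))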
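
The counting and sorting step above can also be expressed with `collections.Counter` from the standard library. This is only a minimal sketch: the function name `top_words` and the sample tokens are hypothetical, and the token list stands in for the output of `jieba.lcut(txt)`.

```python
from collections import Counter


def top_words(tokens, n=15):
    """Return the n most frequent multi-character tokens as (word, count) pairs."""
    # Skip single-character tokens, as the loop above does.
    counts = Counter(tok for tok in tokens if len(tok) > 1)
    # most_common sorts by count, descending, and copes with fewer than n entries.
    return counts.most_common(n)


# Hypothetical tokens standing in for the output of jieba.lcut(txt).
sample_tokens = ["数据", "的", "分析", "数据", "和", "词云", "数据", "分析"]
for word, count in top_words(sample_tokens, n=3):
    print("{0:<10}{1:>5}".format(word, count))
```

Because `most_common` returns however many entries exist, it does not fail on a short text with fewer than `n` distinct words.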

    Word cloud:

    from wordcloud import WordCloud
    import matplotlib.pyplot as plt
    import jieba


    def create_word_cloud(filename):
        # Read the text to visualize from the file passed in.
        with open(filename, encoding='utf-8') as f:
            text = f.read()
        # Full-mode segmentation; WordCloud expects space-separated words.
        wordlist = jieba.cut(text, cut_all=True)
        wl = " ".join(wordlist)
        wc = WordCloud(
            background_color="black",
            max_words=2000,
            font_path='msyhl.ttf',  # a font with Chinese glyphs, needed to render Chinese text
            height=1200,
            width=1600,
            max_font_size=100,
            random_state=100,
        )
        myword = wc.generate(wl)
        # Display the image, then save it to disk.
        plt.imshow(myword)
        plt.axis("off")
        plt.show()
        wc.to_file('img_book.png')


    if __name__ == '__main__':
        create_word_cloud('文章.txt')
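
As a possible refinement, the segmented words can be filtered and counted before they reach the word cloud, reusing the length rule from the frequency script. The sketch below uses only the standard library. The function name `cloud_frequencies`, the `stopwords` set, and the sample tokens are all hypothetical, and the token list stands in for the output of `jieba.cut`.

```python
from collections import Counter


def cloud_frequencies(tokens, stopwords=frozenset(), min_len=2):
    """Build a {word: count} dict from segmented tokens for a word cloud."""
    # Strip surrounding whitespace so blank tokens are dropped by the length check.
    cleaned = (tok.strip() for tok in tokens)
    return dict(Counter(
        tok for tok in cleaned
        if len(tok) >= min_len and tok not in stopwords
    ))


# Hypothetical tokens standing in for the output of jieba.cut(text).
sample_tokens = ["我们", "的", "数据", " ", "数据", "我们", "分析", "数据"]
freqs = cloud_frequencies(sample_tokens, stopwords={"我们"})
print(freqs)
```

A dict like `freqs` can be passed to `wc.generate_from_frequencies(freqs)` in place of `wc.generate(wl)`, which gives direct control over which words are drawn.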

  • Original article: https://www.cnblogs.com/wjxk/p/12743882.html