zoukankan      html  css  js  c++  java
  • 词云的使用

    使用wordcloud模块进行生成词云。一般可以生成两种类型的词云:

    一、默认图片生成

    import warnings
    warnings.filterwarnings("ignore")
    import jieba    #分词包
    import numpy    #numpy计算包
    import codecs   #codecs提供的open方法来指定打开的文件的语言编码,它会在读取的时候自动转换为内部unicode 
    import pandas as pd  
    import matplotlib.pyplot as plt
    %matplotlib inline
    import matplotlib
    matplotlib.rcParams['figure.figsize'] = (10.0, 5.0)
    from wordcloud import WordCloud#词云包
    
    #导入娱乐新闻数据,分词:
    df = pd.read_csv("./data/entertainment_news.csv", encoding='utf-8')
    df = df.dropna()
    content=df.content.values.tolist()
    #jieba.load_userdict(u"data/user_dic.txt")
    segment=[]
    for line in content:
        try:
            segs=jieba.lcut(line)
            for seg in segs:
                if len(seg)>1 and seg!='
    ':
                    segment.append(seg)
        except:
            print line
            continue
    
    #去停用词
    words_df=pd.DataFrame({'segment':segment})
    #words_df.head()
    stopwords=pd.read_csv("data/stopwords.txt",index_col=False,
            quoting=3,sep=" ",names=['stopword'], encoding='utf-8')#quoting=3全不引用 #stopwords.head() words_df=words_df[~words_df.segment.isin(stopwords.stopword)] #统计词频 words_stat=words_df.groupby(by=['segment'])['segment'].agg({"计数":numpy.size}) words_stat=words_stat.reset_index().sort_values(by=["计数"],ascending=False) words_stat.head() #做词云 wordcloud=WordCloud(font_path="data/simhei.ttf",background_color="white",
        max_font_size=80) word_frequence = {x[0]:x[1] for x in words_stat.head(1000).values} wordcloud=wordcloud.fit_words(word_frequence) plt.imshow(wordcloud)

    效果:

    二、生成自定义图片的词云

    from scipy.misc import imread
    matplotlib.rcParams['figure.figsize'] = (15.0, 15.0)
    from wordcloud import WordCloud,ImageColorGenerator
    bimg=imread('image/entertainment.jpeg')
    wordcloud=WordCloud(background_color="white",mask=bimg,font_path='data/simhei.ttf',max_font_size=200)
    word_frequence = {x[0]:x[1] for x in words_stat.head(1000).values}
    wordcloud=wordcloud.fit_words(word_frequence)
    bimgColors=ImageColorGenerator(bimg)
    plt.axis("off")
    plt.imshow(wordcloud.recolor(color_func=bimgColors))

    效果:

     

  • 相关阅读:
    基础普及-Jar、War、Ear
    Guice 学习(五)多接口的实现( Many Interface Implementation)
    Foundation框架
    windowsclient开发--使用、屏蔽一些快捷键
    数据结构(Java语言)——BinaryHeap简单实现
    最小生成树之Prim(普里姆)算法
    LeetCode--Remove Element
    Java实现算法之--选择排序
    kvm云主机使用宿主机usb设备
    Oracle12c Client安装出现"[INS-30131]"错误“请确保当前用户具有访问临时位置所需的权限”解决办法之完整版
  • 原文地址:https://www.cnblogs.com/ctsch/p/8590551.html
Copyright © 2011-2022 走看看