  • Text statistics

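    Mr. Zeng, technical director of the South China region at ChinaSoft International, will teach two more classes. What would you like him to cover? (Think it over carefully before answering.)

      Big data is popular right now, so I would like to learn more big-data knowledge that builds practical, hands-on skills.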

    Chinese word segmentation

    Download a full-length Chinese novel and convert it to UTF-8 encoding.
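    Many Chinese novels downloaded as .txt files are GBK-encoded. The following is a minimal sketch of the conversion step, assuming the source really is GBK. The file names and the helper `convert_to_utf8` are hypothetical, and a throwaway temp directory stands in for the real novel.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, src_encoding: str = "gbk") -> None:
    """Re-encode a text file from src_encoding (GBK by default) to UTF-8."""
    # Decoding is strict by default, so a file that is not really GBK raises
    # UnicodeDecodeError instead of silently producing garbled text.
    text = src.read_text(encoding=src_encoding)
    dst.write_text(text, encoding="utf-8")


# Demo in a temporary directory; for a real novel, pass its actual paths.
workdir = Path(tempfile.mkdtemp())
sample = "中文分词测试，编码转换。"
gbk_file = workdir / "novel_gbk.txt"    # hypothetical file names
utf8_file = workdir / "novel_utf8.txt"
gbk_file.write_text(sample, encoding="gbk")
convert_to_utf8(gbk_file, utf8_file)
converted = utf8_file.read_text(encoding="utf-8")
print(converted == sample)  # True
```

    After conversion, the novel can be opened with `encoding="utf-8"`, as the word-cloud script below does for aaa.txt.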

    Use the jieba library to count Chinese word frequencies, and print the top 20 words with their counts.
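    The scripts below count words by calling `words.count(i)` once per distinct word, which rescans the whole token list each time. The sketch below is an alternative, not the author's method. It uses `collections.Counter` to tally all tokens in one pass. In the assignment the tokens would come from jieba (for example `jieba.lcut(text)`). Here a hand-made token list is used so the sketch runs without jieba installed. The helper name `top_n` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_n(tokens: Iterable[str], n: int = 20) -> List[Tuple[str, int]]:
    """Return the n most frequent tokens with their counts."""
    # One pass over the tokens; whitespace-only tokens are skipped.
    counts = Counter(t for t in tokens if t.strip())
    # most_common sorts by count (descending); ties keep first-seen order.
    return counts.most_common(n)


# Stand-in for jieba.lcut(text) so the sketch is self-contained.
tokens = ["我们", "学习", "我们", " ", "分词", "我们", "学习", "\n"]
for word, count in top_n(tokens, 3):
    print(word, count)
```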

    **Exclude meaningless words and merge different forms of the same word.
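    One way to handle both optional steps is to drop stopwords and map aliases onto one canonical word before counting. The sketch below assumes three things. The stopword set is hypothetical, and in practice you would build it from your own output. The alias map is also hypothetical (for example, several names for one character). Finally, it drops every single-character token as a rough filter for particles and punctuation, which is an extra heuristic and not part of the assignment.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def filtered_top_n(
    tokens: Iterable[str],
    stopwords: Set[str],
    aliases: Dict[str, str],
    n: int = 20,
) -> List[Tuple[str, int]]:
    """Count tokens after dropping stopwords and merging aliases."""
    counts = Counter()
    for token in tokens:
        token = token.strip()
        # Assumption: single characters are mostly particles or punctuation.
        if len(token) < 2 or token in stopwords:
            continue
        # Count every alias under its canonical form.
        counts[aliases.get(token, token)] += 1
    return counts.most_common(n)


stopwords = {"我们", "一个"}                       # hypothetical stopwords
aliases = {"孔明": "诸葛亮", "卧龙": "诸葛亮"}      # hypothetical alias map
tokens = ["孔明", "说", "诸葛亮", "卧龙", "我们", "一个", "，", "曹操", "曹操"]
print(filtered_top_n(tokens, stopwords, aliases))
```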

    **Draw a word cloud with the wordcloud library.

    (The two ** items are optional; submissions for this assignment must not copy one another.)

    file=open("test.txt","w")
    file.write('''President tells officials at conference to focus on safety, where the public has noted improvement
    
    President Xi Jinping has called for more systematic and innovative social governance, stressing the need to improve the capability to predict and prevent security risks.
    
    Xi, general secretary of the Communist Party of China Central Committee, was speaking on Tuesday at a Beijing conference to award individuals and units that have made outstanding contributions to the public security governance sector in the past five years.
    
    Since late 2012, people working in political and legal affairs have innovated social governance methods and dealt with large numbers of outstanding problems, making the general public feel safer, he said.
    
    The sense of security of the Chinese public has increased from 88 percent in 2012 to 92 percent in 2016, according to data released on Tuesday by the Central Committee for Comprehensive Management of Public Security, a central level authority in charge of social governance in China.
    
    However, Xi wants more effort to be taken to build a safer China. The authorities involved should be aware of the difficulties and challenges, carefully analyze the current situation of Chinese society, adopt technological innovation methods and improve the capacity to forecast and prevent risks, he said.
    
    Xi also stressed that social governance officers should have a better sense of the rule of law.
    
    More than 500 members from central and local political and law departments, as well as outstanding individuals and units, are participating in the two-day meeting, which concludes on Wednesday.
    
    "I'm excited and inspired after listening to the president's remarks, which made a comprehensive summary of social governance in the past five years and outlined the new requirements and tasks in the future," said Huang Ming, vice-minister of public security.
    
    Huang said the most valuable lesson he has learned is to go with the "innovative ideas and strategies to tackle the issues involving social security and win the support of the people".
    
    He Wenhao, a senior official in charge of political and legal affairs in the Tibet autonomous region who attended the conference, said the president's speech has built up his confidence in safeguarding Tibet.
    
    He said the biggest challenge he now faces is fighting "the separatist forces to ensure the continuous safety and stability in Tibet".''')
    file.close()
    file=open("test.txt","r")
    news=file.read()
    file.close()
    news=news.lower()
    # replace punctuation with spaces so it does not stick to words
    for i in ',.?"':
        news=news.replace(i," ")
    words=news.split()    # split on any whitespace, including newlines
    word=set(words)
    delwords={"","the","and","of","to","in","a","on"}
    word=word-delwords
    dic={}
    for i in word:
        dic[i]=words.count(i)
    # sort once, after counting, by frequency (highest first)
    items=list(dic.items())
    items.sort(key=lambda x:x[1],reverse=True)
    for i in range(11):
        w,count=items[i]
        print("{} {}".format(w,count))
    import jieba
    file=open("a.txt","r",encoding="GBK")
    text=file.read()
    file.close()
    words=list(jieba.cut(text))
    # tokens to ignore: newline, full-width space, brackets, blanks
    delword={"\n","\u3000","(",")",""," "}
    keys=set(words)-delword
    dic={}
    for i in keys:
        dic[i]=words.count(i)
    items=list(dic.items())
    items.sort(key=lambda x:x[1],reverse=True)
    print(words)    # debug: prints every token
    for i in range(11):
        word,count=items[i]
        print("{} {}".format(word,count))

     

    import jieba
    file=open("a.txt","r",encoding="GBK")
    text=file.read()
    file.close()
    words=list(jieba.cut(text))
    # tokens to ignore: newline, full-width space, brackets, blanks
    delword={"\n","\u3000","(",")",""," "}
    keys=set(words)-delword
    dic={}
    for i in keys:
        dic[i]=words.count(i)
    items=list(dic.items())
    items.sort(key=lambda x:x[1],reverse=True)
    print(words)    # debug: prints every token
    for i in range(20):    # TOP20, as the assignment asks
        word,count=items[i]
        print("{} {}".format(word,count))
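    import jieba
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud
    
    txt = open("aaa.txt","r",encoding='utf-8').read()
    words=list(jieba.cut(txt))    # segment this file before counting
    
    delword = {" ","","\n","\u3000"}
    
    keys=set(words)-delword
    dic={}
    for i in keys:
        dic[i]=words.count(i)
    items=list(dic.items())
    items.sort(key=lambda x:x[1],reverse=True)
    for i in range(20):
        word,count=items[i]
        print("{} {}".format(word,count))
    # WordCloud finds words with a regex, so unsegmented Chinese text would be
    # counted as whole phrases; joining jieba tokens with spaces keeps them apart.
    # The default font has no Chinese glyphs, so font_path must point to a CJK
    # font; "simhei.ttf" is an assumed path, adjust it for your system.
    wzhz = WordCloud(font_path="simhei.ttf").generate(" ".join(w for w in words if w not in delword))
    plt.imshow(wzhz)
    plt.axis("off")
    plt.show()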
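    import matplotlib.pyplot as plt
    from wordcloud import WordCloud
    import jieba
    file=open("a.txt","r",encoding="GBK")
    files=file.read()
    file.close()
    words=list(jieba.cut(files))
    # besides blanks, also drop frequent classical-Chinese single-character words
    delword={"\n","\u3000","(",")",""," ","曰","乎","也","耶","者"}
    keys=set(words)-delword
    dic={}
    for i in keys:
        dic[i]=words.count(i)
    items=list(dic.items())
    items.sort(key=lambda x:x[1],reverse=True)
    for i in range(8):
        word,count=items[i]
        print("{} {}".format(word,count))
    # build the cloud from space-joined jieba tokens with the stopwords removed;
    # "simhei.ttf" is an assumed path to a font that has Chinese glyphs
    put = WordCloud(font_path="simhei.ttf").generate(" ".join(w for w in words if w not in delword))
    plt.imshow(put)
    plt.axis("off")
    plt.show()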
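    The English-text script above strips only `,.?"` before splitting, so other punctuation can stay glued to words. The sketch below is an alternative tokenizer, not the author's code. It keeps runs of letters and digits, allows internal hyphens and apostrophes ("vice-minister", "i'm"), and then counts with `collections.Counter`. It reuses the stopword list from the script. The function name `english_top_n` is hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple

STOPWORDS = {"the", "and", "of", "to", "in", "a", "on"}

# A word is letters/digits, optionally joined by single ' or - characters.
WORD_RE = re.compile(r"[a-z0-9]+(?:['-][a-z0-9]+)*")


def english_top_n(text: str, n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent non-stopword words in text."""
    words = WORD_RE.findall(text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


sample = 'Xi said the public feels safer. "Safer," he said, and safer again.'
print(english_top_n(sample, 2))  # [('safer', 3), ('said', 2)]
```

    To run it on the news text, pass the contents of test.txt as `text`.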

  • Original post: https://www.cnblogs.com/murasame/p/7590313.html