  • Word frequency statistics

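    Mr. Zeng, technical director of ChinaSoft International's South China region, will teach two more classes. What would you like him to cover? (Think it over carefully before answering.)

      Big data is popular right now, so I would like to learn more big-data material that builds hands-on skills.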

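    Chinese word segmentation

    Download a long Chinese novel and convert it to UTF-8 encoding.

    Use the jieba library to count Chinese word frequencies, and print the top 20 words with their counts.

    **Exclude meaningless words and merge variants of the same word.

    **Use the wordcloud library to draw a word cloud.

    (The two ** items are optional. Submissions for this assignment must not be identical.)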
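The first step, converting the downloaded novel to UTF-8, can be sketched with the standard library alone. This is a minimal sketch that assumes the download is GBK-encoded (as the `a.txt` read later in this post is). The function name `convert_to_utf8` and the file names are made up for illustration.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, src_encoding="gbk"):
    """Re-encode a text file to UTF-8 (hypothetical helper).

    src_encoding is an assumption: check what the download actually uses.
    """
    # Decoding is strict by default, so a wrong encoding guess fails loudly
    # instead of silently producing garbage.
    with open(src_path, "r", encoding=src_encoding) as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)
    return len(text)


if __name__ == "__main__":
    # Self-contained demo on a temporary GBK file instead of a real novel.
    workdir = tempfile.mkdtemp()
    gbk_file = os.path.join(workdir, "novel_gbk.txt")
    utf8_file = os.path.join(workdir, "novel_utf8.txt")
    with open(gbk_file, "w", encoding="gbk") as f:
        f.write("子曰:学而时习之")
    convert_to_utf8(gbk_file, utf8_file)
    with open(utf8_file, "r", encoding="utf-8") as f:
        print(f.read())
```

After conversion, the later scripts could open the new file with `encoding="utf-8"` instead of `encoding="GBK"`.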

    file=open("test.txt","w")
    file.write('''President tells officials at conference to focus on safety, where the public has noted improvement
    
    President Xi Jinping has called for more systematic and innovative social governance, stressing the need to improve the capability to predict and prevent security risks.
    
    Xi, general secretary of the Communist Party of China Central Committee, was speaking on Tuesday at a Beijing conference to award individuals and units that have made outstanding contributions to the public security governance sector in the past five years.
    
    Since late 2012, people working in political and legal affairs have innovated social governance methods and dealt with large numbers of outstanding problems, making the general public feel safer, he said.
    
    The sense of security of the Chinese public has increased from 88 percent in 2012 to 92 percent in 2016, according to data released on Tuesday by the Central Committee for Comprehensive Management of Public Security, a central level authority in charge of social governance in China.
    
    However, Xi wants more effort to be taken to build a safer China. The authorities involved should be aware of the difficulties and challenges, carefully analyze the current situation of Chinese society, adopt technological innovation methods and improve the capacity to forecast and prevent risks, he said.
    
    Xi also stressed that social governance officers should have a better sense of the rule of law.
    
    More than 500 members from central and local political and law departments, as well as outstanding individuals and units, are participating in the two-day meeting, which concludes on Wednesday.
    
    "I'm excited and inspired after listening to the president's remarks, which made a comprehensive summary of social governance in the past five years and outlined the new requirements and tasks in the future," said Huang Ming, vice-minister of public security.
    
    Huang said the most valuable lesson he has learned is to go with the "innovative ideas and strategies to tackle the issues involving social security and win the support of the people".
    
    He Wenhao, a senior official in charge of political and legal affairs in the Tibet autonomous region who attended the conference, said the president's speech has built up his confidence in safeguarding Tibet.
    
    He said the biggest challenge he now faces is fighting "the separatist forces to ensure the continuous safety and stability in Tibet".''')
    file.close()
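    # Read the text back and normalize it: lowercase, punctuation to spaces.
    with open("test.txt","r") as file:
        news=file.read()
    news=news.lower()
    for i in ',.?"':
        news=news.replace(i," ")
    # split() with no argument splits on any whitespace, including newlines.
    words=news.split()
    # Drop common function words before counting.
    delwords={"the","and","of","to","in","a","on"}
    keys=set(words)-delwords
    dic={}
    for i in keys:
        dic[i]=words.count(i)
    # Sort by count, highest first, and print the top 11 entries.
    items=list(dic.items())
    items.sort(key=lambda x:x[1],reverse=True)
    for i in range(11):
        word,count=items[i]
        print("{} {}".format(word,count))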
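The English count above can also be written with `re` and `collections.Counter`, which tokenizes and counts in one pass instead of calling `list.count` once per distinct word. The following is a sketch rather than the author's method: `top_words`, the regular expression, and the sample sentence are illustrative choices, and the stop-word set is the one used above.

```python
import re
from collections import Counter

STOPWORDS = {"the", "and", "of", "to", "in", "a", "on"}


def top_words(text, n=10, stopwords=STOPWORDS):
    """Return the n most frequent words in text as (word, count) pairs."""
    # [a-z']+ keeps letters and apostrophes, so "president's" stays one token;
    # digits and other punctuation act as separators.
    tokens = [t.strip("'") for t in re.findall(r"[a-z']+", text.lower())]
    # Skip stop words and anything that was only a stray apostrophe.
    counts = Counter(t for t in tokens if t and t not in stopwords)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = ("Xi stressed social governance. Social governance officers "
              "should improve governance.")
    for word, count in top_words(sample, n=3):
        print("{} {}".format(word, count))
```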
    import jieba
    # The source file is GBK-encoded; read it whole, then let "with" close it.
    with open("a.txt","r",encoding="GBK") as file:
        text=file.read()
    words=list(jieba.cut(text))
    # Tokens to ignore: newline, full-width space (\u3000), brackets, empty and blank strings.
    delword={"\n","\u3000","(",")",""," "}
    keys=set(words)-delword
    dic={}
    for i in keys:
        dic[i]=words.count(i)
    items=list(dic.items())
    items.sort(key=lambda x:x[1],reverse=True)
    # Print the 11 most frequent words.
    for i in range(11):
        word,count=items[i]
        print("{} {}".format(word,count))

     

    import jieba
    # Same count as before, this time printing the top 20 the assignment asks for.
    with open("a.txt","r",encoding="GBK") as file:
        text=file.read()
    words=list(jieba.cut(text))
    # Tokens to ignore: newline, full-width space (\u3000), brackets, empty and blank strings.
    delword={"\n","\u3000","(",")",""," "}
    keys=set(words)-delword
    dic={}
    for i in keys:
        dic[i]=words.count(i)
    items=list(dic.items())
    items.sort(key=lambda x:x[1],reverse=True)
    for i in range(20):
        word,count=items[i]
        print("{} {}".format(word,count))
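The optional task of excluding meaningless words and merging different names for the same thing is only partly covered by the code above, which has a small exclusion set and no merging. One possible approach is sketched below with `collections.Counter`: drop the stop words, map each alias to a canonical form, then count. The token list stands in for the output of `jieba.cut`, and `STOPWORDS`, `MERGE`, and `count_merged` are hypothetical names with made-up example entries.

```python
from collections import Counter

# Hypothetical examples; a real list depends on the novel being analysed.
STOPWORDS = {"\n", "\u3000", " ", "", "的", "了", "曰"}
# Alias -> canonical form, so that both spellings are counted as one word.
MERGE = {"孔明": "诸葛亮", "玄德": "刘备"}


def count_merged(tokens, stopwords=STOPWORDS, merge=MERGE):
    """Count tokens after removing stop words and merging aliases."""
    counts = Counter()
    for token in tokens:
        if token in stopwords:
            continue
        # dict.get falls back to the token itself when it has no alias entry.
        counts[merge.get(token, token)] += 1
    return counts


if __name__ == "__main__":
    # Stand-in for list(jieba.cut(text)).
    tokens = ["孔明", "曰", "诸葛亮", "的", "玄德", "刘备", "孔明"]
    for word, count in count_merged(tokens).most_common(20):
        print("{} {}".format(word, count))
```

Counting in a single pass also avoids calling `words.count` once per distinct word, which gets slow on a full-length novel.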
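    import jieba
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    with open("aaa.txt","r",encoding='utf-8') as file:
        txt = file.read()
    # Segment the text first; the counting below needs the token list.
    words = list(jieba.cut(txt))

    delword = {" ","","\n","\u3000"}

    keys=set(words)-delword
    dic={}
    for i in keys:
        dic[i]=words.count(i)
    items=list(dic.items())
    items.sort(key=lambda x:x[1],reverse=True)
    for i in range(20):
        word,count=items[i]
        print("{} {}".format(word,count))
    # WordCloud tokenizes with a word-character regex, so unsegmented Chinese
    # runs together; join the jieba tokens with spaces before generating.
    # Chinese glyphs only render if WordCloud is given a CJK font via font_path.
    wzhz = WordCloud().generate(" ".join(words))
    plt.imshow(wzhz)
    plt.show()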
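    import matplotlib.pyplot as plt
    from wordcloud import WordCloud
    import jieba
    with open("a.txt","r",encoding="GBK") as file:
        files=file.read()
    words=list(jieba.cut(files))
    # Besides whitespace and brackets, skip common classical Chinese particles.
    delword={"\n","\u3000","(",")",""," ","曰","乎","也","耶","者"}
    keys=set(words)-delword
    dic={}
    for i in keys:
        dic[i]=words.count(i)
    items=list(dic.items())
    items.sort(key=lambda x:x[1],reverse=True)
    for i in range(8):
        word,count=items[i]
        print("{} {}".format(word,count))
    # Build the cloud from the filtered tokens so the excluded words stay out of it.
    # Chinese glyphs only render if WordCloud is given a CJK font via font_path.
    put = WordCloud().generate(" ".join(w for w in words if w not in delword))
    plt.imshow(put)
    plt.show()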
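`WordCloud` can also be built from a word-to-count mapping through `generate_from_frequencies`, which reuses counts like the ones printed above and keeps excluded words out of the picture. The sketch below is an illustration under stated assumptions: `cloud_frequencies` and `render_cloud` are hypothetical helpers, the minimum word length of 2 is an arbitrary choice, and `font_path` must point to a font with Chinese glyphs on your own machine (no path is assumed here).

```python
from collections import Counter


def cloud_frequencies(tokens, stopwords=frozenset(), top_n=100, min_len=2):
    """Return a {word: count} dict for the top_n words worth drawing."""
    counts = Counter(
        t for t in tokens
        if t not in stopwords and len(t.strip()) >= min_len
    )
    return dict(counts.most_common(top_n))


def render_cloud(frequencies, font_path):
    """Draw the cloud; needs the wordcloud and matplotlib packages installed."""
    # Imported here so the counting helper above works without them.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    cloud = WordCloud(font_path=font_path).generate_from_frequencies(frequencies)
    plt.imshow(cloud)
    plt.axis("off")
    plt.show()


if __name__ == "__main__":
    # Stand-in for list(jieba.cut(text)); single characters are skipped.
    tokens = ["天下", "大势", "曰", "天下", "\n", "英雄", "天下", "英雄"]
    print(cloud_frequencies(tokens, stopwords={"曰"}))
```

Passing the result to `render_cloud` with a suitable font draws the picture. Filtering by length is a blunt way to drop particles and punctuation, so an explicit stop-word set is still useful.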

  • Original post: https://www.cnblogs.com/murasame/p/7590313.html