  • Text statistics

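    Mr. Zeng, technical director for the South China region at Chinasoft International (中软国际), will teach two more classes. What would you like him to cover? (Think it over carefully before answering.)

      Big data is popular right now, so I would like to learn more big-data-related material that improves hands-on skills.

    Chinese word segmentation

    Download a full-length Chinese novel and convert it to UTF-8 encoding.

    Use the jieba library to count Chinese word frequencies, and output the top 20 words with their counts.

    **Exclude some meaningless words and merge variants of the same word.

    **Use the wordcloud library to draw a word cloud.

    (The two ** items are optional. Submissions for this assignment must not be identical.)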
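The conversion step of the assignment can be sketched as follows. This is a minimal sketch, assuming the downloaded novel is GBK-encoded (the encoding the scripts in this post use when opening a.txt); the function name convert_to_utf8 is hypothetical, not something from the original post.

```python
from pathlib import Path


def convert_to_utf8(src, dst, src_encoding="gbk"):
    """Re-encode the text file `src` as UTF-8 and write it to `dst`.

    Returns the number of characters converted.
    """
    # Decode with the assumed source encoding; if the guess is wrong this
    # may raise UnicodeDecodeError or silently produce garbled text.
    text = Path(src).read_text(encoding=src_encoding)
    # Write the same characters back out as UTF-8.
    Path(dst).write_text(text, encoding="utf-8")
    return len(text)
```

For example, convert_to_utf8("a.txt", "a_utf8.txt") would write a UTF-8 copy, where a_utf8.txt is a placeholder output name. The converted file can then be opened with encoding="utf-8".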

    file=open("test.txt","w")
    file.write('''President tells officials at conference to focus on safety, where the public has noted improvement
    
    President Xi Jinping has called for more systematic and innovative social governance, stressing the need to improve the capability to predict and prevent security risks.
    
    Xi, general secretary of the Communist Party of China Central Committee, was speaking on Tuesday at a Beijing conference to award individuals and units that have made outstanding contributions to the public security governance sector in the past five years.
    
    Since late 2012, people working in political and legal affairs have innovated social governance methods and dealt with large numbers of outstanding problems, making the general public feel safer, he said.
    
    The sense of security of the Chinese public has increased from 88 percent in 2012 to 92 percent in 2016, according to data released on Tuesday by the Central Committee for Comprehensive Management of Public Security, a central level authority in charge of social governance in China.
    
    However, Xi wants more effort to be taken to build a safer China. The authorities involved should be aware of the difficulties and challenges, carefully analyze the current situation of Chinese society, adopt technological innovation methods and improve the capacity to forecast and prevent risks, he said.
    
    Xi also stressed that social governance officers should have a better sense of the rule of law.
    
    More than 500 members from central and local political and law departments, as well as outstanding individuals and units, are participating in the two-day meeting, which concludes on Wednesday.
    
    "I'm excited and inspired after listening to the president's remarks, which made a comprehensive summary of social governance in the past five years and outlined the new requirements and tasks in the future," said Huang Ming, vice-minister of public security.
    
    Huang said the most valuable lesson he has learned is to go with the "innovative ideas and strategies to tackle the issues involving social security and win the support of the people".
    
    He Wenhao, a senior official in charge of political and legal affairs in the Tibet autonomous region who attended the conference, said the president's speech has built up his confidence in safeguarding Tibet.
    
    He said the biggest challenge he now faces is fighting "the separatist forces to ensure the continuous safety and stability in Tibet".''')
    file.close()
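    file=open("test.txt","r")
    news=file.read()
    file.close()
    news=news.lower()
    # Replace punctuation with spaces so it does not stick to words
    for i in ',.?"':
        news=news.replace(i," ")
    # split() with no argument splits on any whitespace, including newlines
    words=news.split()
    word = set(words)
    # Common function words to leave out of the ranking
    delwords={"","the","and","of","to","in","a","on"}
    word=word-delwords
    dic={}
    for i in word:
        dic[i]= words.count(i)
    # Sort the (word, count) pairs by count, highest first
    items=list(dic.items())
    items.sort(key=lambda x:x[1],reverse=True)
    print(items)
    for i in range(11):
        word,count=items[i]
        print("{} {}".format(word,count))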
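    import jieba
    # a.txt is read as GBK here; use encoding="utf-8" if the file was converted
    with open("a.txt","r",encoding="GBK") as file:
        text=file.read()
    words=list(jieba.cut(text))
    # Tokens to ignore: newline, full-width space, parentheses, empty string, space
    delword={"\n","\u3000","(",")",""," "}
    keys=set(words)-delword
    dic={}
    for i in keys:
        dic[i]=words.count(i)
    # Sort the (word, count) pairs by count, highest first
    items=list(dic.items())
    items.sort(key=lambda x:x[1],reverse=True)
    print(words)
    for i in range(11):
        word,count=items[i]
        print("{} {}".format(word,count))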

     

    # Same script, now printing the top 20 as the assignment asks
    import jieba
    with open("a.txt","r",encoding="GBK") as file:
        text=file.read()
    words=list(jieba.cut(text))
    # Tokens to ignore: newline, full-width space, parentheses, empty string, space
    delword={"\n","\u3000","(",")",""," "}
    keys=set(words)-delword
    dic={}
    for i in keys:
        dic[i]=words.count(i)
    items=list(dic.items())
    items.sort(key=lambda x:x[1],reverse=True)
    print(words)
    for i in range(20):
        word,count=items[i]
        print("{} {}".format(word,count))
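The counting loop above calls words.count once per distinct word, which rescans the whole token list each time and gets slow on a full novel. A single pass with collections.Counter gives the same ranking and is also a convenient place to handle the optional task of excluding words and merging variants. This is a minimal sketch under stated assumptions: top_words, stop_words and aliases are hypothetical names, and the short token list is made-up sample data standing in for list(jieba.cut(text)).

```python
from collections import Counter


def top_words(tokens, stop_words=frozenset(), aliases=None, n=20):
    """Return the n most frequent tokens as (word, count) pairs."""
    aliases = aliases or {}
    counts = Counter()
    for token in tokens:
        # Drop surrounding whitespace, so "\n" and "\u3000" become empty.
        token = token.strip()
        # Map variant names to one canonical form before counting.
        token = aliases.get(token, token)
        if token and token not in stop_words:
            counts[token] += 1
    return counts.most_common(n)


# Made-up tokens standing in for the output of jieba.cut on a real text.
tokens = ["孔明", "曰", "诸葛亮", "曹操", "曰", "孔明", "\n", "曹操", "孔明"]
print(top_words(tokens, stop_words={"曰"}, aliases={"诸葛亮": "孔明"}, n=2))
# → [('孔明', 4), ('曹操', 2)]
```

The aliases mapping is applied before the stop-word check, so a variant that maps to an excluded word is excluded as well.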
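    import jieba
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud
    
    with open("aaa.txt","r",encoding='utf-8') as file:
        txt = file.read()
    # Segment the text; the counting below needs the token list
    words = list(jieba.cut(txt))
    
    # Tokens to ignore: space, empty string, newline, full-width space
    delword = {" ","","\n","\u3000"}
    
    keys=set(words)-delword
    dic={}
    for i in keys:
        dic[i]=words.count(i)
    items=list(dic.items())
    items.sort(key=lambda x:x[1],reverse=True)
    print(words)
    for i in range(20):
        word,count=items[i]
        print("{} {}".format(word,count))
    # WordCloud does not segment Chinese itself, so pass the jieba tokens joined by spaces.
    # To render Chinese characters it also needs font_path set to a font file with CJK glyphs.
    wzhz = WordCloud().generate(" ".join(words))
    plt.imshow(wzhz)
    plt.show()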
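    import matplotlib.pyplot as plt
    from wordcloud import WordCloud
    import jieba
    with open("a.txt","r",encoding="GBK") as file:
        files=file.read()
    words=list(jieba.cut(files))
    # Tokens to ignore: whitespace, parentheses, and common classical particles
    delword={"\n","\u3000","(",")",""," ","曰","乎","也","耶","者"}
    keys=set(words)-delword
    dic={}
    for i in keys:
        dic[i]=words.count(i)
    items=list(dic.items())
    items.sort(key=lambda x:x[1],reverse=True)
    for i in range(8):
        word,count=items[i]
        print("{} {}".format(word,count))
    # WordCloud does not segment Chinese itself, so pass the jieba tokens joined by spaces.
    # To render Chinese characters it also needs font_path set to a font file with CJK glyphs.
    put = WordCloud().generate(" ".join(words))
    plt.imshow(put)
    plt.show()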
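In the script above, the words in delword are removed from the printed ranking only; the input handed to WordCloud still contains them, so they can still show up in the picture. One way around this is to build the cloud from the filtered counts instead. The following is a minimal sketch, assuming the wordcloud and matplotlib packages are installed; cloud_frequencies and show_cloud are hypothetical helper names, and font_path has to be supplied by the reader because it depends on the fonts available on the machine.

```python
from collections import Counter


def cloud_frequencies(tokens, stop_words):
    """Count tokens, skipping whitespace-only tokens and stop words."""
    return Counter(
        t for t in tokens if t.strip() and t not in stop_words
    )


def show_cloud(frequencies, font_path):
    """Draw a word cloud from a {word: count} mapping."""
    # Imported here so the counting helper works without these packages.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    # font_path must point to a font file with CJK glyphs on your machine.
    cloud = WordCloud(font_path=font_path, background_color="white")
    cloud.generate_from_frequencies(frequencies)
    plt.imshow(cloud.to_array())
    plt.axis("off")
    plt.show()
```

With the variables from the script above, this would be called as show_cloud(cloud_frequencies(words, delword), font_path), where font_path is the path to a Chinese-capable font file.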

  • Original article: https://www.cnblogs.com/murasame/p/7590313.html