  • A complete English word-frequency counting example that reads from a file

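    1. Read in the string to analyze

    2. Split the text and extract the words

    3. Count the words in a dictionary

    4. Exclude grammatical (function) words

    5. Sort by count

    6. Output the TOP(20)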
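
The six steps above can be sketched compactly with the standard library. This is a minimal sketch rather than the author's script: `top_words` and `STOP_WORDS` are hypothetical names, the stop-word set is only a sample, and treating a word as a run of ASCII letters is an assumption.

```python
import re
from collections import Counter

# Hypothetical sample of function words to exclude (step 4).
STOP_WORDS = {"a", "an", "the", "in", "on", "that", "and", "to", "is", "of", "for"}


def top_words(text, stop_words=STOP_WORDS, n=20):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Steps 1-2: lowercase the input and extract runs of ASCII letters.
    words = re.findall(r"[a-z]+", text.lower())
    # Steps 3-4: count every word that is not a stop word.
    counts = Counter(w for w in words if w not in stop_words)
    # Steps 5-6: most_common sorts by count, highest first, and keeps n entries.
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "The Warriors said they would visit. The Warriors accepted."
    for word, count in top_words(sample, n=3):
        print("{0} = {1}".format(word, count))
```

`Counter.most_common(n)` returns fewer than `n` pairs when the text has fewer distinct words, so a short input does not raise an error.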

    Contents of the file crity.txt:
    Donald Trump was the subject of startlingly strong abuse from major sports stars on Saturday, after he criticised NFL players protesting against racial injustice and withdrew an invitation for the NBA-champion Golden State Warriors to visit the White House, breaking a tradition dating back to the Reagan years.

    The Cleveland Cavaliers star forward LeBron James called the president a “bum” while the Buffalo Bills running back LeSean McCoy went further, calling Trump an “asshole”. Even NFL commissioner Roger Goodell saying in a statement: “Divisive comments like Trump’s demonstrate an unfortunate lack of respect for the NFL.”

    In their own statement, the Warriors accepted they would not be going to the White House. But they said they would use their trip to Washington in February – they play the Washington Wizards on the 28th of that month – to “celebrate equality, diversity and inclusion”.

    On Friday, point guard Steph Curry, the NBA champions’ star player, told reporters he planned to vote no when the players came together to decide whether to visit Trump. The Warriors could “inspire some change” and “send a statement” by snubbing the president, Curry said.

    On Saturday morning, Trump tweeted: “Going to the White House is considered a great honor for a championship team, Stephen Curry is hesitating, therefore invitation is withdrawn!”

    In a statement issued later, the Warriors said: “While we intended to meet as a team at the first opportunity we had this morning to collaboratively discuss a potential visit to the White House, we accept that President Trump has made it clear that we are not invited.

    “We believe there is nothing more American than our citizens having the right to express themselves freely on matters important to them. We’re disappointed that we did not have an opportunity during this process to share our views or have open dialogue on issues we felt would be important to raise.

    “In lieu of a visit to the White House, we have decided that we’ll constructively use our trip to the nation’s capital in February to celebrate equality, diversity and inclusion – the values that we embrace as an organization.

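    # -*- coding: utf-8 -*-
    # Example: word-frequency count
    # Open the file and read its contents
    with open('crity.txt', 'r', encoding='utf-8') as fr:
        text = fr.read()

    # Set of words to exclude from the count
    exc = {'', 'a', 'an', 'the', 'in', 'on', 'that', 'and', 'to', 'is', 'of', 'for', 'have', 'not', 'nfl', 'said', 'would'}
    # Replace punctuation and newlines with spaces.
    # The three curly-quote entries are an assumption: they were garbled
    # to empty strings in the source, and replacing '' would insert a
    # space between every character.
    x = {',', '.', '\n', '?', ':', '!', '“', '”', '’'}
    for i in x:
        text = text.replace(i, ' ')
    # Convert everything to lowercase
    text = text.lower()
    # Split the text into words; split() with no argument also drops empty strings
    words = text.split()
    # Count each word that is not excluded, in a single pass
    di = {}
    for w in words:
        if w not in exc:
            di[w] = di.get(w, 0) + 1
    wc = list(di.items())
    # Sort by count, highest first
    wc.sort(key=lambda x: x[1], reverse=True)
    print('{0:-^50}'.format('Top 20 word frequencies'))
    # Slicing avoids an IndexError when there are fewer than 20 distinct words
    for word, count in wc[:20]:
        print('{0} = {1}'.format(word, count))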
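
Turning punctuation into spaces tends to split contractions and possessives in this text, so "we’re" is counted as "we" and "re". A possible alternative for the word-splitting step is sketched below. `tokenize` is a hypothetical helper, and normalizing the curly apostrophe to a straight one is a design assumption, not part of the original script.

```python
import re


def tokenize(text):
    """Split text into lowercase words, keeping contractions in one piece."""
    # Normalize the typographic apostrophe so "we’re" and "we're" count as one word.
    normalized = text.lower().replace("\u2019", "'")
    # A word is a run of letters, optionally followed by apostrophe + letters.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", normalized)


if __name__ == "__main__":
    print(tokenize("We\u2019re disappointed that we did not have an opportunity."))
```

Quotation marks, dashes and other punctuation are never matched by the pattern, so they need no separate replacement list.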


  • Original article: https://www.cnblogs.com/alliancehacker/p/7595116.html