  • English Word Frequency Count

    Comprehensive exercise: English word frequency count

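    1. Preprocessing for word frequency counting
    2. Download the lyrics of an English song or an English article
    3. Replace all separators such as , . ? ! ’ : with spaces
    4. Convert all uppercase letters to lowercase
    5. Generate the word list
    6. Generate the word frequency counts
    7. Sort
    8. Exclude grammatical (function) words: pronouns, articles, conjunctions
    9. Output the top 10 words by frequency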
      word = '''
      Lately, I've been, I've been losing sleep
      Dreaming about the things that we could be
      But baby, I've been, I've been praying hard,
      Said, no more counting dollars
      We'll be counting stars, yeah we'll be counting stars
      I see this life like a swinging vine
      Swing my heart across the line
      And my face is flashing signs
      Seek it out and you shall find
      Old, but I'm not that old
      Young, but I'm not that bold
      I don't think the world is sold
      I'm just doing what we're told
      I feel something so right
      Doing the wrong thing
      I feel something so wrong
      Doing the right thing
      I could lie, coudn't I, could lie
      Everything that kills me makes me feel alive
      Lately, I've been, I've been losing sleep
      Dreaming about the things that we could be
      But baby, I've been, I've been praying hard,
      Said, no more counting dollars
      We'll be counting stars
      '''
      # Punctuation marks to replace with spaces
      symbol = [",", ".", "!", "?", "'", ":", "-"]
      # Meaningless tokens: contraction fragments left over once apostrophes become spaces
      words = ['t','ve','ll','m']
      
      new_art = word
      for s in symbol:
          new_art = new_art.replace(s, ' ')  # replace each punctuation mark with a space
      
      new_art = new_art.lower()  # convert to lowercase
      art_list = new_art.split()  # split on whitespace into a list of words
      
      dic = {}
      for i in art_list:
          # Count whole words; calling str.count on the text would also match substrings
          dic[i] = dic.get(i, 0) + 1
      for i in words:
          dic.pop(i, None)  # discard meaningless tokens if present
      
      new_dic = sorted(dic.items(), key=lambda x: x[1], reverse=True)
      
      for item in new_dic[:10]:
          print(item)  # print the 10 most frequent words
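
The steps listed above (lowercase, tokenize, count, filter, take the top N) can also be sketched with the standard library's `re` and `collections.Counter`. This is only one possible variant, not the original author's method. The function name `top_words`, the `STOP_WORDS` set, and the regular expression are assumptions made for illustration; the stop-word set in particular is a small hypothetical sample and would need extending to cover pronouns, articles and conjunctions properly.

```python
import re
from collections import Counter

# Hypothetical stop-word set for illustration only; extend it as needed.
STOP_WORDS = {"i", "we", "you", "me", "my", "it", "that",
              "the", "a", "an", "and", "but", "or", "so"}


def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most common words in text, skipping stop_words."""
    # Allow an inner apostrophe so contractions such as "i've" stay whole
    # instead of being split into fragments like "ve".
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    # Count whole tokens, not substrings of the raw text.
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts.most_common(n)


sample = "Counting stars, we'll be counting stars. I've been counting dollars!"
print(top_words(sample, n=2))  # [('counting', 3), ('stars', 2)]
```

Counting over a token list rather than the raw string matters because `str.count` matches substrings: the string "i see this life" contains the letter "i" three times, but the word "i" only once.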

  • Original article: https://www.cnblogs.com/lawliet12/p/8646265.html