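  • A complete English word-frequency count example using a file

    1. Read in the string to analyze
    2. Split it into words
    3. Count the words in a dictionary
    4. Exclude function words (articles, prepositions and the like)
    5. Sort
    6. Output the TOP(20)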
    >>> fo=open('test.txt','w')
    >>> fo.write('''Twinkle Twinkle Little Star
      (Declan's Prayer) - Declan Galbraith
    
      Twinkle twinkle little star,
      How I wonder what you are,
      Up above the world so high,
      Like a diamond in the sky,
      Star light,
      Star bright,
      The first star I see tonight,
      I wish I may, I wish I might,
      Have the wish I wish tonight,
    
      Twinkle twinkle little star,
      How I wonder what you are,
      I have so many wishes to make,
      But most of all is what I state,
      So just wonder,
      That I've been dreaming of,
      I wish that I can have owe her enough,
      I wish I may, I wish I might,
      Have the dream I dream tonight,
    
      Ooo baby
    
      Twinkle twinkle little star,
      How I wonder what you are,
      I want a girl who'll be all mine,
      And wants to say that I'm her guy,
      Someone's sweet that's for sure,
      I want to be the one shes looking for,
      I wish I may, I wish I might,
      Have the girl I wish tonight,
    
      Ooo baby
    
      Twinkle twinkle little star,
      How I wonder what you are,
      Up above the world so high,
      Like a diamond in the sky,
      Star light,
      Star bright,
      The first star I see tonight,
      I wish I may, I wish I might,
      Have the wish I wish tonight.''')
    1138
    >>> fo.close()
    >>> fr=open('test.txt','r')
    >>> fr.read()
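    # Script version: read the file back and count the words
    fo=open('test.txt','r')
    song=fo.read()
    fo.close()
    # Function words to exclude from the ranking
    exc={'the','in','to','a','of','and','on','what','that'}
    song=song.lower()
    # Replace punctuation and whitespace characters with a space so that
    # words on adjacent lines are not joined together
    for i in '''.,-\n\t\u3000()"''':
        song=song.replace(i,' ')
    song=song.replace("'",'')    # drop apostrophes: i've -> ive
    words=song.split()           # split on any run of whitespace
    dic={}
    keys=set(words)
    keys=keys-exc
    for w in keys:
        dic[w]=words.count(w)
    
    wc = list(dic.items())
    wc.sort(key=lambda x:x[1],reverse=True)
    print(wc)                    # full list, most frequent first
    # Print the 20 most frequent words (fewer if the text is shorter)
    for w in wc[:20]:
        print(w)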
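
The same six steps can also be sketched with the standard library's `collections.Counter` and a regular expression. This is only an alternative sketch, not the method used in the script above: the function name `top_words`, the `STOP_WORDS` constant and the tokenizing pattern are assumptions made for illustration. Unlike the script, the pattern keeps contractions such as `i've` as a single word.

```python
import re
from collections import Counter

# Assumed stop-word set, copied from the exc set in the script above
STOP_WORDS = {'the', 'in', 'to', 'a', 'of', 'and', 'on', 'what', 'that'}

def top_words(text, n=20, stop_words=STOP_WORDS):
    """Return up to n (word, count) pairs, most frequent first."""
    # A word is a run of letters, optionally with internal apostrophes
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    # Count every token that is not a stop word
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts.most_common(n)

sample = "Twinkle twinkle little star, how I wonder what you are"
for word, count in top_words(sample, 3):
    print(word, count)
```

To run it on the file from this example, pass the file contents instead of `sample`, for instance `top_words(open('test.txt').read())`. `Counter` counts every word in a single pass, whereas `words.count(w)` rescans the whole list once for each distinct word.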

     

  • Original article: https://www.cnblogs.com/lintingting/p/7595150.html