  • A Complete English Word-Frequency Count Example Using a File

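    1. Read in the string to be analyzed
    2. Split the text and extract the words
    3. Count the words in a dictionary
    4. Exclude grammatical (function) words
    5. Sort
    6. Output the TOP(20)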
    >>> fo=open('test.txt','w')
    >>> fo.write('''Twinkle Twinkle Little Star
      (Declan's Prayer) - Declan Galbraith
    
      Twinkle twinkle little star,
      How I wonder what you are,
      Up above the world so high,
      Like a diamond in the sky,
      Star light,
      Star bright,
      The first star I see tonight,
      I wish I may, I wish I might,
      Have the wish I wish tonight,
    
      Twinkle twinkle little star,
      How I wonder what you are,
      I have so many wishes to make,
      But most of all is what I state,
      So just wonder,
      That I've been dreaming of,
      I wish that I can have owe her enough,
      I wish I may, I wish I might,
      Have the dream I dream tonight,
    
      Ooo baby
    
      Twinkle twinkle little star,
      How I wonder what you are,
      I want a girl who'll be all mine,
      And wants to say that I'm her guy,
      Someone's sweet that's for sure,
      I want to be the one shes looking for,
      I wish I may, I wish I might,
      Have the girl I wish tonight,
    
      Ooo baby
    
      Twinkle twinkle little star,
      How I wonder what you are,
      Up above the world so high,
      Like a diamond in the sky,
      Star light,
      Star bright,
      The first star I see tonight,
      I wish I may, I wish I might,
      Have the wish I wish tonight.''')
    1138
    >>> fo.close()
    >>> fr=open('test.txt','r')
    >>> fr.read()
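    # Read the lyrics back from the file; "with" closes it automatically
    with open('test.txt', 'r') as fo:
        song = fo.read()

    # Grammatical (function) words to exclude from the count
    exc = {'the', 'in', 'to', 'a', 'of', 'and', 'on', 'what', 'that'}

    song = song.lower()
    # Drop apostrophes, then turn other punctuation and whitespace into spaces
    # so that words on adjacent lines are not glued together
    song = song.replace("'", '')
    for i in '.,-\n\t\u3000()"':
        song = song.replace(i, ' ')
    words = song.split()  # no argument: splits on any whitespace run, no empty strings

    # Count each distinct word, skipping the excluded ones
    dic = {}
    keys = set(words) - exc
    for w in keys:
        dic[w] = words.count(w)

    # Sort by count in descending order
    wc = list(dic.items())
    wc.sort(key=lambda x: x[1], reverse=True)
    print(wc)
    # Slicing avoids an IndexError when there are fewer than 20 distinct words
    for w in wc[:20]:
        print(w)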
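
The same six steps can also be sketched more compactly with the standard library's `collections.Counter` and `re`. This is only one possible variant, not the original author's method: the function name `top_words` and the constant `STOPWORDS` are made up for illustration, the stop-word set is copied from the script above, and a "word" is assumed to be any run of ASCII letters.

```python
import re
from collections import Counter

# Stop-word set copied from the script above (an assumption of this sketch)
STOPWORDS = {'the', 'in', 'to', 'a', 'of', 'and', 'on', 'what', 'that'}


def top_words(text, stopwords=STOPWORDS, n=20):
    """Return the n most frequent words in text, excluding stopwords."""
    # Drop apostrophes so that "I'm" is counted as the single word "im"
    cleaned = text.lower().replace("'", "")
    # Treat every run of ASCII letters as one word; everything else separates words
    words = re.findall(r"[a-z]+", cleaned)
    # Counter does the dictionary counting; most_common does the sort and the cut-off
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)


if __name__ == '__main__':
    sample = "Twinkle twinkle little star, how I wonder what you are"
    for word, count in top_words(sample, n=3):
        print(word, count)
```

To run it on the lyrics, read `test.txt` as before and pass the resulting string to `top_words`. `Counter` counts all the words in a single pass, whereas calling `words.count(w)` once per distinct word rescans the whole list each time.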

     

  • Original source: https://www.cnblogs.com/lintingting/p/7595150.html