  • English word frequency count

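    1. Preprocessing for word frequency counting
    2. Download the lyrics of an English song or an English article
    3. Replace all separators such as ,.?!’: with spaces
    4. Convert all uppercase letters to lowercase
    5. Generate the word list
    6. Generate the word frequency counts
    7. Sort
    8. Exclude function words: pronouns, articles, conjunctions
    9. Output the TOP 10 most frequent words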
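
    Steps 3 to 5 can be tried in isolation on a short string before running the full script below. This is only a minimal sketch, not the assignment's reference solution: the `tokenize` helper and the `SEPARATORS` constant are names made up here for illustration, and it uses `str.translate` instead of repeated `replace` calls.

```python
# Hypothetical helper for illustration; these names are not from the original post.
SEPARATORS = ",.?!’:"

# Translation table that maps every separator character to a single space.
_TABLE = str.maketrans({ch: " " for ch in SEPARATORS})


def tokenize(text):
    """Replace separators with spaces, lowercase the text, and split into words."""
    return text.translate(_TABLE).lower().split()


words = tokenize("Miss you much... Why did it all have to end?")
print(words)
```

    Because the separator list contains only the curly apostrophe, a contraction written with a straight apostrophe, such as "I'm", stays a single word.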
    song ='''
    I'm a big big girl
    in a big big world
    It's not a big big thing if you leave me
    but I do do feel 
    that I too too will miss you much
    miss you much...
    I can see the first leaf falling
    it's all yellow and nice
    It's so very cold outside
    like the way I'm feeling inside
    I'm a big big girl
    in a big big world
    It's not a big big thing if you leave me
    but I do do feel 
    that I too too will miss you much
    miss you much...
    Outside it's now raining
    and tears are falling from my eyes
    why did it have to happen
    why did it all have to end
    I'm a big big girl
    in a big big world
    It's not a big big thing if you leave me
    but I do do feel
    that I too too will miss you much
    miss you much...
    I have your arms around me ooooh like fire
    but when I open my eyes
    you're gone...
    I'm a big big girl
    in a big big world
    It's not a big big thing if you leave me
    but I do do feel
    that I too too will miss you much
    miss you much...
    I'm a big big girl
    in a big big world
    It's not a big big thing if you leave me
    but I do feel I will miss you much
    miss you much...'''
    
    
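    # Replace every separator from step 3 with a space, not just "..."
    # The lyrics use the straight apostrophe ('), so contractions such as "I'm" stay whole
    c = song
    for sep in ",.?!’:":
        c = c.replace(sep, " ")
    print(c)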
    
    
    
    # Convert everything to lowercase
    d = c.lower()
    print(d)
    
    # Split on whitespace to get the word list
    f = d.split()
    print(f)
    
    # Count each word in a single pass instead of calling f.count() for every word
    dic = {}
    for i in f:
        dic[i] = dic.get(i, 0) + 1
    print(dic)
    
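    # Function words (articles, conjunctions, prepositions and the like) to exclude
    prep = {'a', 'too', 'but', 'in', 'not', 'if', 'will', 'the', 'and', 'so', 'are'}
    
    for i in prep:
        # pop with a default avoids a KeyError when a word does not occur in the text
        dic.pop(i, None)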
    
    
    
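    # Sort the (word, count) pairs by count, highest first
    dic1 = sorted(dic.items(), key=lambda item: item[1], reverse=True)
    print(dic1)
    
    # Print the top 10; slicing is safe even if fewer than 10 words remain
    for i in dic1[:10]:
        print(i)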

    Final output (shown as a screenshot in the original post)

    Takeaways: I looked up some new things online, such as sorted, and also got a firmer grasp of dictionaries.
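
    For readers new to `sorted`, here is a small sketch of the call used in the script, run on a made-up dictionary (the counts are sample values, not results from the song). `sorted` returns a new list and leaves the dictionary untouched; `key` picks what to compare, and `reverse=True` puts the largest first.

```python
# Sample counts for illustration only; these are not the song's real numbers.
counts = {"world": 5, "big": 20, "girl": 5, "miss": 10}

# dict.items() yields (word, count) pairs; sort them by the count, largest first.
by_count = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
print(by_count)

# sorted is stable, so words with equal counts keep their original order.
# To break ties alphabetically instead, sort on a (negative count, word) tuple.
by_count_then_word = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
print(by_count_then_word)
```

    The second form is handy when the output should be the same from run to run regardless of the order in which words were first seen.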

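    The order of the steps given by the teacher is not necessarily the order in which to carry them out: it works better to exclude the function words first and sort afterwards.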
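
    The point about ordering can be sketched with `collections.Counter` from the standard library. This is a swapped-in alternative to the hand-written dictionary loop, not what the post itself uses, and the `top_words` function, its parameters, and the short sample text are assumptions made for illustration. Function words are dropped before counting, so the sort only ever sees words that can appear in the result.

```python
from collections import Counter

# Same function words that the script above excludes.
STOP_WORDS = {"a", "too", "but", "in", "not", "if", "will", "the", "and", "so", "are"}


def top_words(text, stop_words=STOP_WORDS, n=10):
    """Return the n most frequent words in text, ignoring stop_words."""
    # Replace each separator character with a space.
    for sep in ",.?!’:":
        text = text.replace(sep, " ")
    # Lowercase, split, and filter out function words before counting.
    words = [w for w in text.lower().split() if w not in stop_words]
    # most_common sorts by count, largest first, and returns (word, count) pairs.
    return Counter(words).most_common(n)


# Hypothetical short sample, not the full lyrics.
sample = "I'm a big big girl in a big big world. Miss you much, miss you much..."
result = top_words(sample, n=3)
print(result)
```

    Passing the full `song` string instead of `sample` would give the top 10 directly.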

  • Original article: https://www.cnblogs.com/wxf2/p/8618588.html