  • English word frequency count

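    1. Preprocessing for the word frequency count
    2. Download the lyrics of an English song, or an English article
    3. Replace all separators such as , . ? ! ’ : with spaces
    4. Convert all uppercase letters to lowercase
    5. Generate the word list
    6. Generate the word frequency counts
    7. Sort
    8. Exclude grammatical words: pronouns, articles, conjunctions
    9. Output the TOP10 most frequent words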
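Steps 3 to 5 can be bundled into one small function. The sketch below is an illustrative assumption, not the author's code. It uses Python's str.maketrans and str.translate to map every separator from step 3 to a space in a single pass, then lowercases and splits. The names SEPARATORS and tokenize are hypothetical.

```python
# Steps 3-5 in one helper: separators -> spaces, lowercase, split.
# SEPARATORS mirrors the list in step 3; tokenize() is a hypothetical name.
SEPARATORS = ",.?!’:"

# str.maketrans(x, y) needs x and y of equal length: one space per separator
_TABLE = str.maketrans(SEPARATORS, " " * len(SEPARATORS))


def tokenize(text):
    """Return the lowercase word list for text."""
    return text.translate(_TABLE).lower().split()


print(tokenize("Miss you much... Is it cold outside?"))
```

Mapping the curly apostrophe ’ to a space splits any contraction written with it: "It’s" becomes "it" and "s". If you want contractions to stay whole, leave apostrophes out of the table.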
    song ='''
    I'm a big big girl
    in a big big world
    It's not a big big thing if you leave me
    but I do do feel 
    that I too too will miss you much
    miss you much...
    I can see the first leaf falling
    it's all yellow and nice
    It's so very cold outside
    like the way I'm feeling inside
    I'm a big big girl
    in a big big world
    It's not a big big thing if you leave me
    but I do do feel 
    that I too too will miss you much
    miss you much...
    Outside it's now raining
    and tears are falling from my eyes
    why did it have to happen
    why did it all have to end
    I'm a big big girl
    in a big big world
    It's not a big big thing if you leave me
    but I do do feel
    that I too too will miss you much
    miss you much...
    I have your arms around me ooooh like fire
    but when I open my eyes
    you're gone...
    I'm a big big girl
    in a big big world
    It's not a big big thing if you leave me
    but I do do feel
    that I too too will miss you much
    miss you much...
    I'm a big big girl
    in a big big world
    It's not a big big thing if you leave me
    but I do feel I will miss you much
    miss you much...'''
    
    
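    # Step 3: replace separators with spaces (apostrophes are kept so that
    # contractions such as "I'm" stay whole)
    c = song
    for ch in ",.?!:":
        c = c.replace(ch, " ")
    print(c)
    
    # Step 4: convert everything to lowercase
    d = c.lower()
    print(d)
    
    # Step 5: split on whitespace to get the word list
    f = d.split()
    print(f)
    
    # Step 6: count every word in a single pass
    dic = {}
    for i in f:
        dic[i] = dic.get(i, 0) + 1
    print(dic)
    
    # Step 8: remove grammatical words; pop() with a default avoids a
    # KeyError if one of them does not occur in the text
    prep = {'a', 'too', 'but', 'in', 'not', 'if', 'will', 'the', 'and', 'so', 'are'}
    for i in prep:
        dic.pop(i, None)
    
    # Step 7: sort by count, highest first
    dic1 = sorted(dic.items(), key=lambda kv: kv[1], reverse=True)
    print(dic1)
    
    # Step 9: print the TOP10 (slicing also works with fewer than 10 words)
    for item in dic1[:10]:
        print(item)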

    [Figure: final output]
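Steps 6 to 9 can also be written with collections.Counter from the standard library. Its most_common(n) method returns the n highest counts, already sorted. This is a sketch, not the author's method. The names top_words and STOP_WORDS are hypothetical. The stop-word set adds a few pronouns (i, you, me, my, it) because step 8 asks for pronouns to be excluded, while the set in the code above does not list any.

```python
from collections import Counter

# Hypothetical stop-word set: the author's list plus a few pronouns (step 8)
STOP_WORDS = {'a', 'too', 'but', 'in', 'not', 'if', 'will', 'the', 'and',
              'so', 'are', 'i', 'you', 'me', 'my', 'it'}


def top_words(words, stop_words, n=10):
    """Count the words that are not stop words; return the n most frequent."""
    counts = Counter(w for w in words if w not in stop_words)
    # most_common() sorts by count; equal counts keep first-seen order
    return counts.most_common(n)


words = "big big girl in a big big world miss you much".split()
print(top_words(words, STOP_WORDS, 3))
```

The stop words are filtered out before counting, so most_common() never has to rank them.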

    Takeaways: searching online taught me new things such as sorted(), and strengthened my understanding of dictionaries.
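Here is a minimal sketch of the two tools mentioned above, using a short lyric fragment as input. With dict.get(word, 0), a plain dictionary can count words in one pass; calling list.count for every word would rescan the whole list each time. sorted() with a key function then ranks the items. The tuple key (-count, word) is an added assumption that breaks ties alphabetically, so the order of equal counts does not depend on which word appeared first.

```python
# Count with a plain dict: get() returns 0 the first time a word is seen
counts = {}
for word in "do do feel miss you much miss you much".split():
    counts[word] = counts.get(word, 0) + 1

# Rank by count descending, then by word ascending for ties
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked)
```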

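    The order of the steps given by the teacher is not necessarily the order in which to carry them out. It works better to exclude the grammatical words first and then sort, because the sorted list then contains only the words you actually want to rank.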
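To illustrate that ordering, here is a hypothetical helper (rank_content_words is not from the original). It removes the grammatical words from the counts first and only then sorts what remains, so every entry in the result is a candidate for the TOP10. The counts passed in are a toy example, not the song's real totals.

```python
def rank_content_words(counts, stop_words):
    """Drop stop words first, then sort only what is left (count descending)."""
    kept = {w: c for w, c in counts.items() if w not in stop_words}
    return sorted(kept.items(), key=lambda kv: kv[1], reverse=True)


# Toy counts for illustration only
toy_counts = {'the': 3, 'big': 2, 'world': 1, 'a': 4}
print(rank_content_words(toy_counts, {'the', 'a'}))
```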

  • Original article: https://www.cnblogs.com/wxf2/p/8618588.html