  • English word frequency count

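    1. Word frequency preprocessing
    2. Download the lyrics of an English song or an English article
    3. Replace all separators such as , . ? ! ’ : with spaces
    4. Convert all uppercase letters to lowercase
    5. Generate the word list
    6. Generate the word frequency counts
    7. Sort
    8. Exclude grammatical words: pronouns, articles, conjunctions
    9. Output the TOP 10 most frequent words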
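
The separator, lowercase, and word-list steps above can be sketched on one short line before running them on a full text. This is only a minimal sketch, not the post's own code: `SEPARATORS`, `TABLE`, and `to_words` are hypothetical names, and the straight apostrophe is deliberately left out of the separators (an assumption) so that contractions such as "I'm" stay single words.

```python
# Hypothetical helper names; only str.maketrans, str.translate,
# str.lower and str.split from the standard library are used.
SEPARATORS = ",.?!:"  # assumption: apostrophes are kept so "I'm" stays whole

# Map every separator character to a space.
TABLE = str.maketrans({ch: " " for ch in SEPARATORS})


def to_words(text):
    """Replace separators with spaces, lowercase, and split on whitespace."""
    return text.translate(TABLE).lower().split()


words = to_words("It's not a big big thing if you leave me... Why? Why!")
print(words)
```

Because `split()` with no argument collapses runs of whitespace, the three spaces left behind by "..." do not produce empty strings in the word list.
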
    song ='''
    I'm a big big girl
    in a big big world
    It's not a big big thing if you leave me
    but I do do feel 
    that I too too will miss you much
    miss you much...
    I can see the first leaf falling
    it's all yellow and nice
    It's so very cold outside
    like the way I'm feeling inside
    I'm a big big girl
    in a big big world
    It's not a big big thing if you leave me
    but I do do feel 
    that I too too will miss you much
    miss you much...
    Outside it's now raining
    and tears are falling from my eyes
    why did it have to happen
    why did it all have to end
    I'm a big big girl
    in a big big world
    It's not a big big thing if you leave me
    but I do do feel
    that I too too will miss you much
    miss you much...
    I have your arms around me ooooh like fire
    but when I open my eyes
    you're gone...
    I'm a big big girl
    in a big big world
    It's not a big big thing if you leave me
    but I do do feel
    that I too too will miss you much
    miss you much...
    I'm a big big girl
    in a big big world
    It's not a big big thing if you leave me
    but I do feel I will miss you much
    miss you much...'''
    
    
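    # Replace the separators with spaces. The straight apostrophe is left
    # alone so that contractions such as "I'm" stay single words.
    c = song
    for sep in ",.?!:":
        c = c.replace(sep, " ")
    print(c)

    # Convert everything to lowercase.
    d = c.lower()
    print(d)

    # Split on whitespace to get the word list.
    f = d.split()
    print(f)

    # Count each word in a single pass over the list.
    dic = {}
    for word in f:
        dic[word] = dic.get(word, 0) + 1
    print(dic)

    # Drop grammatical words before sorting.
    prep = {'a', 'too', 'but', 'in', 'not', 'if', 'will', 'the', 'and', 'so', 'are'}
    for word in prep:
        dic.pop(word, None)  # no KeyError if a word is absent

    # Sort the (word, count) pairs by count, highest first.
    dic1 = sorted(dic.items(), key=lambda item: item[1], reverse=True)
    print(dic1)

    # Print the 10 most frequent words (fewer if the list is shorter).
    for item in dic1[:10]:
        print(item)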
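
The counting and top-N steps can also be sketched with `collections.Counter` from the standard library. This is an alternative, not the approach used in the post; the `text` sample and the `stop_words` set are stand-ins chosen for illustration, and in practice `text` would be the full `song` string.

```python
from collections import Counter

# Assumption: a short stand-in text; in the post this would be the `song` string.
text = "I'm a big big girl in a big big world miss you much... miss you much..."

# Hypothetical stop-word set for this sample only.
stop_words = {"a", "in"}

words = text.replace("...", " ").lower().split()

# Counter tallies the words; the generator skips stop words up front.
counts = Counter(w for w in words if w not in stop_words)

# most_common(n) returns the n highest counts, highest first.
top3 = counts.most_common(3)
print(top3)
```

`Counter` replaces the manual dictionary loop, and `most_common` replaces the explicit `sorted` call plus slicing.
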

    Final output

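    Takeaways: I looked things up online and learned some new points, such as sorted, and I also strengthened my understanding of dictionaries.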
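
For readers who are also new to `sorted`, here is a minimal sketch of how it orders dictionary items by value. The `sample` dictionary is made-up data for illustration.

```python
# Made-up counts for illustration only.
sample = {"big": 4, "girl": 1, "miss": 2}

# dict.items() yields (key, value) pairs; the key function picks the value
# (index 1), and reverse=True puts the largest count first.
by_count = sorted(sample.items(), key=lambda item: item[1], reverse=True)
print(by_count)  # [('big', 4), ('miss', 2), ('girl', 1)]

# sorted returns a new list and leaves the dictionary itself unchanged.
print(sample)
```
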

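    The order of steps the teacher gave is not necessarily the order to carry them out in: you can exclude the grammatical words first and then sort, which works better.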
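
One possible reading of why the order matters, sketched with made-up counts: if the top-N slice is taken before the grammatical words are removed, those words have already used up some of the N slots. The `counts` and `stop_words` values below are hypothetical.

```python
# Made-up counts and stop words for illustration only.
counts = {"a": 5, "big": 4, "i": 3, "the": 3, "miss": 2, "girl": 1}
stop_words = {"a", "the"}

# Exclude first, then sort: the top-2 slice contains only content words.
kept = {w: n for w, n in counts.items() if w not in stop_words}
top2_exclude_first = sorted(kept.items(), key=lambda item: item[1], reverse=True)[:2]

# Sort and slice first, then exclude: a stop word already took a top-2 slot.
top2_sorted_first = sorted(counts.items(), key=lambda item: item[1], reverse=True)[:2]
top2_sorted_first = [(w, n) for w, n in top2_sorted_first if w not in stop_words]

print(top2_exclude_first)  # [('big', 4), ('i', 3)]
print(top2_sorted_first)   # [('big', 4)]
```

Excluding first also means `sorted` has fewer items to order, although for a text of this size the difference is negligible.
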

  • Original post: https://www.cnblogs.com/wxf2/p/8618588.html