  • English Word Frequency Statistics

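    1. Word-frequency preprocessing
    2. Download the lyrics of an English song or an English article
    3. Replace all separators such as , . ? ! ’ : with spaces
    4. Convert all uppercase letters to lowercase
    5. Generate the word list
    6. Generate the word-frequency counts
    7. Sort
    8. Exclude grammatical words: pronouns, articles, conjunctions
    9. Output the TOP 10 most frequent words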
    song ='''
    I'm a big big girl
    in a big big world
    It's not a big big thing if you leave me
    but I do do feel 
    that I too too will miss you much
    miss you much...
    I can see the first leaf falling
    it's all yellow and nice
    It's so very cold outside
    like the way I'm feeling inside
    I'm a big big girl
    in a big big world
    It's not a big big thing if you leave me
    but I do do feel 
    that I too too will miss you much
    miss you much...
    Outside it's now raining
    and tears are falling from my eyes
    why did it have to happen
    why did it all have to end
    I'm a big big girl
    in a big big world
    It's not a big big thing if you leave me
    but I do do feel
    that I too too will miss you much
    miss you much...
    I have your arms around me ooooh like fire
    but when I open my eyes
    you're gone...
    I'm a big big girl
    in a big big world
    It's not a big big thing if you leave me
    but I do do feel
    that I too too will miss you much
    miss you much...
    I'm a big big girl
    in a big big world
    It's not a big big thing if you leave me
    but I do feel I will miss you much
    miss you much...'''
    
    
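    # Replace separators with spaces. Apostrophes are left in place so that
    # contractions such as "i'm" stay one token (this song only contains "...").
    c = song
    for sep in ',.?!:':
        c = c.replace(sep, ' ')
    print(c)

    # Convert everything to lowercase.
    d = c.lower()
    print(d)

    # Split on whitespace to get the word list.
    f = d.split()
    print(f)

    # Count how often each word occurs.
    dic = {}
    for i in f:
        dic[i] = dic.get(i, 0) + 1
    print(dic)

    # Function words to exclude (articles, conjunctions, prepositions, ...).
    prep = {'a', 'too', 'but', 'in', 'not', 'if', 'will', 'the', 'and', 'so', 'are'}

    for i in prep:
        dic.pop(i, None)  # pop with a default avoids a KeyError if a word is absent

    # Sort by count, highest first.
    dic1 = sorted(dic.items(), key=lambda kv: kv[1], reverse=True)
    print(dic1)

    # Print the top 10 (slicing is safe even with fewer than 10 entries).
    for item in dic1[:10]:
        print(item)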
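The same steps can also be written more compactly with the standard library. The following is only a sketch under a few assumptions: the helper name `top_words` and the constant `STOP_WORDS` are made up for illustration, the stop-word set simply mirrors `prep` from the listing, and tokens are taken to be runs of letters and apostrophes.

```python
import re
from collections import Counter

# Hypothetical names for illustration; the set mirrors `prep` from the post.
STOP_WORDS = {'a', 'too', 'but', 'in', 'not', 'if', 'will', 'the', 'and', 'so', 'are'}

def top_words(text, stop_words=STOP_WORDS, n=10):
    """Return the n most frequent words in text, skipping stop_words."""
    # Lowercase, then keep runs of letters and apostrophes as tokens, so
    # separators such as , . ? ! : never end up inside a word.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    # most_common returns (word, count) pairs, highest count first.
    return counts.most_common(n)

sample = "I'm a big big girl in a big big world... miss you much, miss you much!"
print(top_words(sample, n=3))
```

`Counter` does the counting in a single pass over the word list, and `most_common(n)` covers both the sorting and the top-N steps.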

    Final output: [screenshot]

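    Takeaways: I looked things up online and learned some new techniques, such as sorted, and I also strengthened my understanding of dictionaries.

    The order of steps given by the teacher is not necessarily the order in which to carry them out: you can exclude the grammatical words first and sort afterwards, which works better.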
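To make the point about ordering concrete, here is a small sketch that compares the two orders. The counts are toy values made up for illustration, not the song's real frequencies, and `by_count` is a hypothetical helper name.

```python
# Toy counts, made up for illustration (not the song's real frequencies).
counts = {'the': 5, 'big': 4, 'a': 4, 'miss': 2, 'and': 2, 'you': 2}
stop_words = {'the', 'a', 'and'}

def by_count(items):
    """Sort (word, count) pairs by count, highest first."""
    return sorted(items, key=lambda kv: kv[1], reverse=True)

# Order A: filter, then sort only the words that remain.
filter_first = by_count((w, c) for w, c in counts.items() if w not in stop_words)

# Order B: sort everything, then drop the stop words.
sort_first = [(w, c) for w, c in by_count(counts.items()) if w not in stop_words]

print(filter_first)
print(filter_first == sort_first)
```

Because `sorted` is stable, both orders yield the same ranking. Filtering first means fewer items are sorted, and a later top-10 slice cannot be taken up by stop words.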

  • Original post: https://www.cnblogs.com/wxf2/p/8618588.html