  • Comprehensive exercise: English word frequency count

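    1. Word frequency preprocessing
    2. Download the lyrics of an English song or an English article
    3. Replace all separators such as ,.?!’: with spaces
    4. Convert all uppercase letters to lowercase
    5. Generate the word list
    6. Generate the word frequency counts
    7. Sort
    8. Exclude grammatical words: pronouns, articles, conjunctions
    9. Output the TOP 10 most frequent words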
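
Steps 3 to 5 can be tried on a short string before working with a file. This is only a minimal sketch: the helper name `to_word_list`, the `SEPARATORS` constant, and the sample sentence are made up for illustration, and `str.translate` is used here in place of repeated `replace` calls.

```python
# Hypothetical helper illustrating steps 3-5; names and sample text are assumptions.
SEPARATORS = ",.?!’:"


def to_word_list(text, separators=SEPARATORS):
    """Replace each separator with a space, lowercase the text, split into words."""
    # Map every separator character to a single space.
    table = str.maketrans({ch: " " for ch in separators})
    return text.translate(table).lower().split()


if __name__ == "__main__":
    sample = "Hello, world! Hello again: it’s me."
    print(to_word_list(sample))
```

Because the curly apostrophe is treated as a separator, a contraction such as "it’s" splits into two tokens, which is presumably why a bare `s` appears in the exclusion list of the code in this post.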

    Code:

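    # -*- coding:utf-8 -*-
    
    # Read the whole text. The encoding is given explicitly on the
    # assumption that song.txt is saved as UTF-8.
    with open('song.txt', 'r', encoding='utf-8') as f:
        song = f.read()
    
    # Separators to be replaced with spaces.
    symbol = ''',.?!’:"“”-%$'''
    
    # Grammatical words (pronouns, articles, conjunctions, ...) to leave out.
    exclude = '''
    a an the in on to at and of is was are were i he she you your they us their our it or for be too do no 
    that s so as but it's
    '''
    
    for i in symbol:
        song = song.replace(i, ' ')
    
    # Lowercase, then split on whitespace to get the word list.
    songList = song.lower().split()
    
    excludeSet = set(exclude.split())
    
    # Count every distinct word that is not in the exclusion set.
    songDict = {}
    songSet = set(songList) - excludeSet
    
    for i in songSet:
        songDict[i] = songList.count(i)
    
    # Sort by count in descending order and print at most the top 10.
    # Slicing avoids an IndexError when there are fewer than 10 distinct words.
    dictList = list(songDict.items())
    dictList.sort(key=lambda item: item[1], reverse=True)
    for item in dictList[:10]:
        print(item)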

    Output:

    ('regulatory', 7)
    ('commission', 6)
    ('insurance', 5)
    ('financial', 5)
    ('bank', 5)
    ('banking', 5)
    ('china', 5)
    ('newly', 4)
    ('said', 4)
    ('central', 4)
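
Steps 6 to 9 can also be written with `collections.Counter` from the standard library, which counts all words in a single pass instead of calling `list.count` once per distinct word. The following is a minimal sketch of that alternative, not the code used in this post; the function name `top_words`, its parameters, and the sample sentence are illustrative assumptions.

```python
from collections import Counter


def top_words(words, exclude=(), n=10):
    """Return up to n (word, count) pairs, most frequent first, skipping excluded words."""
    excluded = set(exclude)
    # Counter tallies every remaining word in one pass over the list.
    counts = Counter(w for w in words if w not in excluded)
    # most_common returns fewer than n pairs if there are fewer distinct words.
    return counts.most_common(n)


if __name__ == "__main__":
    # Made-up sample input, already lowercased and split into words.
    words = "the bank said the central bank and the commission agreed".split()
    for pair in top_words(words, exclude={"the", "and"}, n=3):
        print(pair)
```

`Counter.most_common` orders words with equal counts by first occurrence, so ties may appear in a different order than with a sort over a set-derived dictionary.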

  • Original post: https://www.cnblogs.com/171-LAN/p/8619429.html