  • Comprehensive exercise: English word frequency count

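    1. Word-frequency preprocessing
    2. Download the lyrics of an English song, or an English article
    3. Replace all separators such as , . ? ! ’ : with spaces
    4. Convert all uppercase letters to lowercase
    5. Generate the word list
    6. Generate the word-frequency counts
    7. Sort
    8. Exclude function words: pronouns, articles, conjunctions
    9. Output the 10 most frequent words (TOP 10)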

    Code:

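    # -*- coding: utf-8 -*-
    
    # Read the whole text. The file is assumed to be UTF-8 encoded; an
    # explicit encoding matters because the text may contain curly quotes.
    with open('song.txt', 'r', encoding='utf-8') as f:
        song = f.read()
    
    # Separator characters that are replaced with spaces
    symbol = ''',.?!’:"“”-%$'''
    
    # Function words (articles, pronouns, conjunctions, ...) to exclude
    exclude = '''
    a an the in on to at and of is was are were i he she you your they us their our it or for be too do no
    that s so as but it's
    '''
    
    # Replace every separator with a space
    for i in symbol:
        song = song.replace(i, ' ')
    
    # Lowercase the text and split it into a list of words
    songList = song.lower().split()
    
    prep = exclude.split()
    excludeSet = set(prep)
    
    # Count each distinct word that is not in the exclusion set
    songDict = {}
    songSet = set(songList) - excludeSet
    
    for i in songSet:
        songDict[i] = songList.count(i)
    
    # Sort by count, highest first
    dictList = list(songDict.items())
    dictList.sort(key=lambda item: item[1], reverse=True)
    
    # Slicing avoids an IndexError when there are fewer than 10 distinct words
    for item in dictList[:10]:
        print(item)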

    Output:

    ('regulatory', 7)
    ('commission', 6)
    ('insurance', 5)
    ('financial', 5)
    ('bank', 5)
    ('banking', 5)
    ('china', 5)
    ('newly', 4)
    ('said', 4)
    ('central', 4)
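
The same steps can also be written more compactly with the standard library. The following is only a minimal sketch, not the original author's method: `top_words`, `SEPARATORS` and `STOP_WORDS` are hypothetical names introduced here, the stop-word list is a shortened stand-in for the `exclude` string above, and the sample sentence is made up for illustration. It uses `str.translate` to replace all separators in one pass and `collections.Counter` to count and rank the words.

```python
from collections import Counter

# Hypothetical stand-ins for the post's `symbol` and `exclude` values.
SEPARATORS = ',.?!’:"“”-%$'
STOP_WORDS = set("a an the in on to at and of is was are were it or for".split())


def top_words(text, n=10):
    """Return up to n of the most frequent non-stop words as (word, count) pairs."""
    # Build a table that maps every separator character to a space.
    table = str.maketrans(SEPARATORS, ' ' * len(SEPARATORS))
    # Replace separators, lowercase, and split on whitespace.
    words = text.translate(table).lower().split()
    # Count only the words that are not in the stop-word set.
    counts = Counter(w for w in words if w not in STOP_WORDS)
    # most_common() orders by count, highest first, and simply returns
    # fewer pairs when the text has fewer than n distinct words.
    return counts.most_common(n)


# Made-up sample text, used only to show the call.
sample = "The bank said the central bank is a bank."
print(top_words(sample, 3))
```

Compared with calling `list.count` once per distinct word, `Counter` walks the word list a single time, which tends to matter only for longer texts.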

  • Original URL: https://www.cnblogs.com/171-LAN/p/8619429.html