  • English word frequency statistics

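    1. Word-frequency preprocessing
    2. Download the lyrics of an English song or the text of an English article
    3. Replace all separators such as ,.?!’: with spaces
    4. Convert all uppercase letters to lowercase
    5. Generate the word list
    6. Generate the word-frequency counts
    7. Sort
    8. Exclude grammatical words: pronouns, articles, conjunctions
    9. Output the TOP 10 most frequent words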
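
As a rough illustration of how the steps above fit together, here is a minimal sketch that wraps them in one function. It is not the author's code, which follows; it swaps the hand-built dictionary for `collections.Counter` from the standard library. The names `SEPARATORS`, `STOP_WORDS` and `top_words` are made up for this example, while the separator characters and stop words are copied from the post.

```python
from collections import Counter

# Separator characters and stop words taken from the post;
# the constant names themselves are illustrative assumptions.
SEPARATORS = ',.?!’:"“”-%$'
STOP_WORDS = set(
    "a an the in on to at and of is was are were i he she you your they "
    "us their our it or for be too do no that s so as but it's".split()
)


def top_words(text, stop_words=STOP_WORDS, n=10):
    """Return the n most frequent words in text, skipping stop words."""
    # Step 3: replace every separator with a space.
    table = str.maketrans(SEPARATORS, " " * len(SEPARATORS))
    # Steps 4 and 5: lowercase, then split on whitespace into a word list.
    tokens = text.translate(table).lower().split()
    # Steps 6 and 8: count whole tokens, leaving out the stop words.
    counts = Counter(t for t in tokens if t not in stop_words)
    # Steps 7 and 9: most_common sorts by count, highest first.
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "The cat sat on the mat. The cat ran!"
    print(top_words(sample, n=1))
```

Because `most_common(n)` simply returns fewer items when the text has fewer than `n` distinct words, this version does not need a separate length check before printing.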

    Code:

    # -*- coding:utf-8 -*-
    
    song = '''
    Nobody ever knows
    Nobody ever sees
    I left my soul
    Back then no I'm too weak
    Most nights I pray for you to come home
    Praying to the lord
    Praying for my soul
    Now please don't go
    Most nights I hardly sleep when I'm alone
    Now please don't go oh no
    I think of you whenever I'm alone
    So please don't go

    Cause I don't ever wanna know
    Don't ever want to see things change
    Cause when I'm living on my own
    I wanna take it back and start again
    Most nights I pray for you to come home
    I'm praying to the lord
    I'm praying for my soul
    Now please don't go
    Most nights I hardly sleep
    When I'm alone
    Now please don't go oh no
    I think of you whenever I'm alone
    So please don't go
    I sent so many messages
    You don't reply
    Gotta feel around what am I missing babe
    Singing now oh oh oh
    I need you now I need your love oh
    Now please don't go
    I said most nights I hardly sleep
    When I'm alone
    Now please don't go oh no
    I think of you whenever I'm alone
    So please don't go
    So please don't go
    So please don't go
    Oh no
    I think of you whenever I'm alone
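    So please don't go '''

    # Replace every separator character with a space
    symbol = list(''',.?!’:"“”-%$''')
    for i in symbol:
        song = song.replace(i, ' ')

    # Convert to lowercase and split into a word list
    song = song.lower()
    split = song.split()

    # Count each word in the word list; counting in the raw string
    # (song.count) would also match substrings inside other words
    word = {}
    for i in split:
        count = split.count(i)
        word[i] = count

    # Grammatical words to exclude
    words = ''' a an the in on to at and of is was are were i he she you your they us their our it or for be too do no that s so as but it's '''
    prep = words.split()
    for i in prep:
        # Check whether the word is in the dictionary
        if i in word:
            del word[i]

    # Sort by count in descending order and print the top 10
    word = sorted(word.items(), key=lambda item: item[1], reverse=True)
    for item in word[:10]:
        print(item)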
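
One detail of the counting step is worth seeing in isolation: `str.count` counts substring matches in the raw text, while `list.count` counts whole tokens in the word list. The short sketch below uses a made-up sentence (not from the lyrics) to show how far apart the two results can be.

```python
# Made-up example sentence, used only to illustrate the difference.
text = "now i know no one"
tokens = text.split()

# str.count matches substrings: "no" is also found inside "now" and "know".
substring_hits = text.count("no")

# list.count compares whole tokens, so only the standalone word "no" matches.
token_hits = tokens.count("no")

print(substring_hits, token_hits)  # prints: 3 1
```

For short words such as "i", "no" or "so", the substring count can be inflated by every longer word that contains those letters, so word frequencies are safer to compute from the token list.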
  • Original post: https://www.cnblogs.com/verson/p/8629082.html