  • Comprehensive exercise: English word frequency count

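    1. Word-frequency preprocessing
    2. Download the lyrics of an English song or an English article
    3. Replace all separators such as , . ? ! ’ : with spaces
    4. Convert all uppercase letters to lowercase
    5. Generate the word list
    6. Generate the word-frequency counts
    7. Sort
    8. Exclude grammatical words: pronouns, articles, conjunctions
    9. Output the top 10 words by frequency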
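Steps 3 to 5 can be tried on a single line before running them on the whole song. The following is a minimal sketch, not the author's code: the names `SEPARATORS` and `preprocess` are hypothetical, and the separator set is taken literally from step 3. Note that treating ’ as a separator splits a contraction such as you’re into the two tokens `you` and `re`.

```python
# Hypothetical names for illustration; the separator set follows step 3 literally.
SEPARATORS = ",.?!’:"


def preprocess(text):
    """Replace separators with spaces, lowercase the text, and split it into words."""
    # Map every separator character to a single space.
    table = str.maketrans(SEPARATORS, " " * len(SEPARATORS))
    return text.translate(table).lower().split()


sample = "Look at me, what a tiny helpless me"
print(preprocess(sample))
```

Using `str.translate` handles all separators in one pass instead of chaining one `replace` call per character.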
    song = '''
    If you say you’re the firework at the bay
    
    I wish I could be a wave
    
    after the rain, you light up the gray
    
    far away you’re the galaxy from space
    
    with the stars you kiss my face
    
    I’ll go everywhere after your trace
    
    when I’m lonely I will learn to embrace
    
    I’ll follow you along the way
    
    like shadow chasing down the flame
    
    I’ll wait for you right on your way
    
    come and stay with me if you may
    
    I’ll raise my head and look your way
    
    tears dropping down and feeling free
    
    Some love comes by like hurricane
    
    as if I play your losing game
    
    If you’re like firefly in summer haze
    
    Children laugh around your grace
    
    Then I’ll be there, trying to say out your name
    
    Look at me, what a tiny helpless me
    
    Only dream when you smile at me
    
    Maybe you wouldn’t stop just for me
    
    Far behind let me stand there singing
    
    I’ll follow you along the way
    
    like shadow chasing down the flame
    
    I’ll wait for you right on your way
    
    come and stay with me if you may
    
    I’ll raise my head and look your way
    
    tears dropping down and feeling free
    
    Some love comes by like hurricane
    
    but rainbows rise
    
    I’ll follow you along the way
    
    like shadow chasing down the flame
    
    I’ll wait for you right on your way
    
    come and stay with me if you may
    
    I’ll raise my head and look your way
    
    tears dropping down and feeling free
    
    Some love comes by like hurricane
    
    but rainbows rise after the pain
    '''

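    # Replace all separators with spaces, convert to lowercase, split on whitespace.
    # The lyrics use the typographic apostrophe (’); it is normalized to ' here
    # so that contractions such as "you're" match the stop-word list below.
    text = song.replace('’', "'")
    for sep in ',.?!:':
        text = text.replace(sep, ' ')
    s1 = text.lower().split()

    # Count how many times each word occurs
    c = {}
    for i in s1:
        c[i] = c.get(i, 0) + 1

    # Remove words that carry little meaning
    word = '''
    i you you're the by up a but my and would when some i'll i'm with on could
    come from maybe only out me in at for if your down
    '''
    s3 = word.split()
    for i in s3:
        if i in c:
            del c[i]

    # Sort by number of occurrences, highest first
    count = sorted(c.items(), key=lambda item: item[1], reverse=True)

    # Print the 10 most frequent words
    for i in range(min(10, len(count))):
        print(count[i])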
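The counting, filtering, and sorting steps can also be handled by `collections.Counter` from the standard library. This is a minimal sketch of that alternative rather than the approach used in the script above; `top_words`, `STOP_WORDS`, and the sample sentence are hypothetical, and the stop-word set is shortened for illustration.

```python
from collections import Counter

# Hypothetical, shortened stop-word set used only for this illustration.
STOP_WORDS = {"i", "you", "the", "a", "and", "my", "your", "for", "on", "if"}


def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most common words in text, skipping stop words."""
    # Assumption: normalize the typographic apostrophe so contractions stay whole.
    cleaned = text.replace("’", "'")
    for sep in ",.?!:":
        cleaned = cleaned.replace(sep, " ")
    words = [w for w in cleaned.lower().split() if w not in stop_words]
    # most_common returns (word, count) pairs sorted by count, highest first.
    return Counter(words).most_common(n)


sample = "come and stay with me if you may, stay with me"
print(top_words(sample, n=3))
```

`Counter.most_common(n)` replaces the manual dictionary, the `sorted` call, and the slicing of the first ten items, and it does not fail when the text has fewer than `n` distinct words.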
  • Original article: https://www.cnblogs.com/wumeiying/p/8647006.html