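  • A complete English word-frequency counting example using a file

    1. Read in the string to be analyzed

    2. Split the text and extract the words

    3. Build a counting dictionary

    4. Exclude grammatical (function) words

    5. Sort

    6. Output the TOP (20)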

    # Write the lyrics to lyric.txt so they can be read back from a file
    lyric=open('lyric.txt','w')
    lyric.write('''your butt is mine
     
    I Gonna tell you right
     
    Just show your face
     
    In broad daylight
     
    I'm telling you
     
    On how I feel
     
    Gonna Hurt Your Mind
     
    Don't shoot to kill
     
    Shamone
     
    Shamone
     
    Lay it on me
     
    All right
     
    I'm giving you
     
    On count of three
     
    To show your stuff
     
    Or let it be
     
    I'm telling you
     
    Just watch your mouth
    I know your game
     
    What you're about
     
    Well they say the sky's the limit
     
    And to me that's really true
     
    But my friend you have seen nothin'
     
    Just wait till I get through
     
    Because I'm bad,I'm bad
     
    shamone
     
    (Bad,bad,really,really bad)
     
    You know I'm bad,I'm bad
     
    (Bad,bad,really,really bad)
     
    You know it
     
    You know I'm bad,I'm bad
     
    Come on,you know
     
    (Bad,bad,really,really bad)
     
    And the whole world
     
    Has to answer right now
     
    Just to tell you once again
     
    Who's bad
     
    The word is out
     
    You're doin' wrong
     
    Gonna lock you up
     
    Before too long
     
    Your lyin' eyes
     
    Gonna tell you right
     
    So listen up
     
    Don't make a fight
     
    Your talk is cheap
     
    You're not a man
     
    Your throwin' stones
     
    To hide your hands
     
    Well they say the sky's the limit
     
    And to me that's really true
     
    But my friend you have seen nothin'
     
    Just wait till I get through
     
    Because I'm bad,I'm bad
     
    shamone
     
    (Bad,bad,really,really bad)
     
    You know I'm bad,I'm bad
     
    (Bad,bad,really,really bad)
     
    You know it
     
    You know I'm bad,I'm bad
     
    Come on,you know
     
    (Bad,bad,really,really bad)
     
    And the whole world
     
    Has to answer right now
     
    Just to tell you once again
     
    Who's bad
     
    We could change the world tomorrow
     
    This could be a better place
     
    If you don't like what I'm sayin'
     
    Then won't you slap my face
     
    Because I'm bad''')
    lyric.close()
    # Step 1: read the text to analyze back from the file
    with open('lyric.txt','r') as comment:
        bad=comment.read()
    
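    # Step 2: lowercase, replace punctuation and newlines with spaces, then split into words
    bad=bad.lower()
    for i in ",.?!()":
        bad=bad.replace(i,' ')
    bad=bad.replace('\n',' ')
    words=bad.split()  # split() with no argument also drops empty strings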
    s=set(words)
    
    # Step 4: exclude grammatical (function) words
    delete={"the","a","it","to","on","and"}
    for i in delete:
        s.discard(i)  # discard() does not raise if the word is absent
        
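    # Step 3: build the counting dictionary
    dic={}
    for i in s:
        if not i.strip():  # skip empty or whitespace-only tokens
            continue
        dic[i]=words.count(i)
    
    # Step 5: sort by count, highest first
    lis=list(dic.items())
    lis.sort(key=lambda x:x[1],reverse=True)
    # Step 6: print the top 20 (slicing is safe if there are fewer than 20 words)
    for item in lis[:20]:
        print(item)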
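
For comparison, the same steps can be sketched more compactly with `collections.Counter` and `re` from the standard library. This is only an illustrative alternative, not the method used in the post: the helper name `top_words`, its parameters, and the sample string are assumptions made for the example. Note that the regular expression keeps only letters and apostrophes, so it discards digits and every other punctuation mark, whereas the script above replaces just the characters `,.?!()`.

```python
import re
from collections import Counter

# Same grammatical words the post excludes.
STOP_WORDS = frozenset({"the", "a", "it", "to", "on", "and"})


def top_words(text, n=20, stop_words=STOP_WORDS):
    """Hypothetical helper: return the n most frequent words in text.

    The result is a list of (word, count) pairs, highest count first.
    """
    # Lowercase, then treat each run of letters/apostrophes as one word,
    # so contractions such as "i'm" and "don't" stay in one piece.
    words = re.findall(r"[a-z']+", text.lower())
    # Count every word that is not in the exclusion set.
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)


# Small inline sample (an assumption for the demo); for the file-based
# workflow, pass the string returned by reading lyric.txt instead.
sample = "Because I'm bad, I'm bad\nYou know I'm bad, I'm bad\nYou know it"
for word, count in top_words(sample, n=3):
    print(word, count)
```

`Counter` counts all words in a single pass, while calling `words.count(i)` for each distinct word rescans the whole list every time. For a text as short as these lyrics the difference does not matter.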
    

    Run:

  • Original article: https://www.cnblogs.com/mavenlon/p/7595133.html