  • Comprehensive Exercise: Word Frequency Counting

    Comprehensive exercise

    Preprocessing for word frequency counting

    Download the lyrics of an English song, or an English article.

    Replace all separators such as , . ? ! ’ : with spaces.

    str = '''Passion is sweet
    Love makes weak
    You said you cherised freedom so
    You refused to let it go
    Follow your faith 
    Love and hate
    never failed to seize the day
    Don't give yourself away
    Oh when the night falls
    And your all alone
    In your deepest sleep 
    What are you dreeeming of
    My skin's still burning from your touch
    Oh I just can't get enough 
    I said I wouldn't ask for much
    But your eyes are dangerous
    So the tought keeps spinning in my head
    Can we drop this masquerade
    I can't predict where it ends
    If you're the rock I'll crush against
    Trapped in a crowd
    Music's loud
    I said I loved my freedom too
    Now im not so sure i do
    All eyes on you
    Wings so true
    Better quit while your ahead
    Now im not so sure i am
    Oh when the night falls
    And your all alone
    In your deepest sleep
    What are you dreaming of
    My skin's still burning from your touch
    Oh I just can't get enough
    I said I wouldn't ask for much
    But your eyes are dangerous
    So the thought keeps spinning in my head
    Can we drop this masquerade 
    I can't predict where it ends
    If you're the rock I'll crush against
    My soul, my heart
    If your near or if your far
    My life, my love
    You can have it all
    Oh when the night falls
    And your all alone
    In your deepest sleep
    What are you dreaming of
    My skin's still burning from your touch
    Oh I just can't get enough
    I said I wouldn't ask for much
    But your eyes are dangerous 
    So the thought keeps spinning in my head
    Can we drop this masquerade
    I can't predict where it ends
    If you're the rock I'll crush against
    If you're the rock i'll crush against'''
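    # Punctuation characters to treat as word separators
    sep = '''.?,'"!'''
    for i in sep:
        # Replace each separator with a space (note: the name str shadows the built-in type)
        str = str.replace(i, ' ')
    print(str)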

    Result:

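    Passion is sweet
    Love makes weak
    You said you cherised freedom so
    You refused to let it go
    Follow your faith
    Love and hate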
    never failed to seize the day
    Don t give yourself away
    Oh when the night falls
    And your all alone
    In your deepest sleep
    What are you dreeeming of
    My skin s still burning from your touch
    Oh I just can t get enough
    I said I wouldn t ask for much
    But your eyes are dangerous
    So the tought keeps spinning in my head
    Can we drop this masquerade
    I can t predict where it ends
    If you re the rock I ll crush against
    Trapped in a crowd
    Music s loud
    I said I loved my freedom too
    Now im not so sure i do
    All eyes on you
    Wings so true
    Better quit while your ahead
    Now im not so sure i am
    Oh when the night falls
    And your all alone
    In your deepest sleep
    What are you dreaming of
    My skin s still burning from your touch
    Oh I just can t get enough
    I said I wouldn t ask for much
    But your eyes are dangerous
    So the thought keeps spinning in my head
    Can we drop this masquerade
    I can t predict where it ends
    If you re the rock I ll crush against
    My soul my heart
    If your near or if your far
    My life my love
    You can have it all
    Oh when the night falls
    And your all alone
    In your deepest sleep
    What are you dreaming of
    My skin s still burning from your touch
    Oh I just can t get enough
    I said I wouldn t ask for much
    But your eyes are dangerous
    So the thought keeps spinning in my head
    Can we drop this masquerade
    I can t predict where it ends
    If you re the rock I ll crush against
    If you re the rock i ll crush against
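
The replacement loop above can also be written as a single pass with a translation table. The following is a minimal sketch, assuming a short hypothetical sample (`lyrics`) instead of the full song and a separator set that also includes the colon mentioned in the task description.

```python
# Map every separator character to a space in a single pass.
# `lyrics` is a hypothetical two-line sample, not the full song.
lyrics = "Don't give yourself away\nMy soul, my heart"

# Same idea as `sep` in the article, plus the colon (an assumption).
separators = '.?,\'"!:'
table = str.maketrans(separators, ' ' * len(separators))

cleaned = lyrics.translate(table)
print(cleaned)
```

Each separator becomes exactly one space, so a comma followed by a space leaves two spaces behind; this is harmless because `split()` later collapses runs of whitespace.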

    Convert all uppercase letters to lowercase.

    print(str.lower())

    Result:

    passion is sweet
    love makes weak
    you said you cherised freedom so
    you refused to let it go
    follow your faith
    love and hate
    never failed to seize the day
    don t give yourself away
    oh when the night falls
    and your all alone
    in your deepest sleep
    what are you dreeeming of
    my skin s still burning from your touch
    oh i just can t get enough
    i said i wouldn t ask for much
    but your eyes are dangerous
    so the tought keeps spinning in my head
    can we drop this masquerade
    i can t predict where it ends
    if you re the rock i ll crush against
    trapped in a crowd
    music s loud
    i said i loved my freedom too
    now im not so sure i do
    all eyes on you
    wings so true
    better quit while your ahead
    now im not so sure i am
    oh when the night falls
    and your all alone
    in your deepest sleep
    what are you dreaming of
    my skin s still burning from your touch
    oh i just can t get enough
    i said i wouldn t ask for much
    but your eyes are dangerous
    so the thought keeps spinning in my head
    can we drop this masquerade
    i can t predict where it ends
    if you re the rock i ll crush against
    my soul my heart
    if your near or if your far
    my life my love
    you can have it all
    oh when the night falls
    and your all alone
    in your deepest sleep
    what are you dreaming of
    my skin s still burning from your touch
    oh i just can t get enough
    i said i wouldn t ask for much
    but your eyes are dangerous
    so the thought keeps spinning in my head
    can we drop this masquerade
    i can t predict where it ends
    if you re the rock i ll crush against
    if you re the rock i ll crush against

    Generate the word list.

    wordList = str.lower().split()
    print(wordList)

    Result:

    ['passion', 'is', 'sweet', 'love', 'makes', 'weak', 'you', 'said', 'you', 'cherised', 'freedom', 'so', 'you', 'refused', 'to', 'let', 'it', 'go', 'follow', 'your', 'faith', 'love', 'and', 'hate', 'never', 'failed', 'to', 'seize', 'the', 'day', 'don', 't', 'give', 'yourself', 'away', 'oh', 'when', 'the', 'night', 'falls', 'and', 'your', 'all', 'alone', 'in', 'your', 'deepest', 'sleep', 'what', 'are', 'you', 'dreeeming', 'of', 'my', 'skin', 's', 'still', 'burning', 'from', 'your', 'touch', 'oh', 'i', 'just', 'can', 't', 'get', 'enough', 'i', 'said', 'i', 'wouldn', 't', 'ask', 'for', 'much', 'but', 'your', 'eyes', 'are', 'dangerous', 'so', 'the', 'tought', 'keeps', 'spinning', 'in', 'my', 'head', 'can', 'we', 'drop', 'this', 'masquerade', 'i', 'can', 't', 'predict', 'where', 'it', 'ends', 'if', 'you', 're', 'the', 'rock', 'i', 'll', 'crush', 'against', 'trapped', 'in', 'a', 'crowd', 'music', 's', 'loud', 'i', 'said', 'i', 'loved', 'my', 'freedom', 'too', 'now', 'im', 'not', 'so', 'sure', 'i', 'do', 'all', 'eyes', 'on', 'you', 'wings', 'so', 'true', 'better', 'quit', 'while', 'your', 'ahead', 'now', 'im', 'not', 'so', 'sure', 'i', 'am', 'oh', 'when', 'the', 'night', 'falls', 'and', 'your', 'all', 'alone', 'in', 'your', 'deepest', 'sleep', 'what', 'are', 'you', 'dreaming', 'of', 'my', 'skin', 's', 'still', 'burning', 'from', 'your', 'touch', 'oh', 'i', 'just', 'can', 't', 'get', 'enough', 'i', 'said', 'i', 'wouldn', 't', 'ask', 'for', 'much', 'but', 'your', 'eyes', 'are', 'dangerous', 'so', 'the', 'thought', 'keeps', 'spinning', 'in', 'my', 'head', 'can', 'we', 'drop', 'this', 'masquerade', 'i', 'can', 't', 'predict', 'where', 'it', 'ends', 'if', 'you', 're', 'the', 'rock', 'i', 'll', 'crush', 'against', 'my', 'soul', 'my', 'heart', 'if', 'your', 'near', 'or', 'if', 'your', 'far', 'my', 'life', 'my', 'love', 'you', 'can', 'have', 'it', 'all', 'oh', 'when', 'the', 'night', 'falls', 'and', 'your', 'all', 'alone', 'in', 'your', 'deepest', 'sleep', 'what', 'are', 'you', 
'dreaming', 'of', 'my', 'skin', 's', 'still', 'burning', 'from', 'your', 'touch', 'oh', 'i', 'just', 'can', 't', 'get', 'enough', 'i', 'said', 'i', 'wouldn', 't', 'ask', 'for', 'much', 'but', 'your', 'eyes', 'are', 'dangerous', 'so', 'the', 'thought', 'keeps', 'spinning', 'in', 'my', 'head', 'can', 'we', 'drop', 'this', 'masquerade', 'i', 'can', 't', 'predict', 'where', 'it', 'ends', 'if', 'you', 're', 'the', 'rock', 'i', 'll', 'crush', 'against', 'if', 'you', 're', 'the', 'rock', 'i', 'll', 'crush', 'against']
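
Because the apostrophe was replaced with a space, contractions such as "can't" end up as two tokens ('can' and 't') in the list above. If whole contractions are preferred, one possible alternative is to tokenize with a regular expression. This is only a sketch on a hypothetical sample line; the pattern is an assumption about what should count as a word.

```python
import re

# Hypothetical sample line; the pattern keeps an apostrophe inside a word.
line = "Oh I just can't get enough, if you're the rock"

# One or more letters, optionally followed by an apostrophe and more letters.
tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", line.lower())
print(tokens)
```

With this pattern punctuation never needs to be replaced first, since anything that is not a letter or an in-word apostrophe is simply skipped.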

    Generate the word frequency counts.

    wordDict = {}
    wordSet = set(wordList)  # the distinct words
    for i in wordSet:
        wordDict[i] = wordList.count(i)  # number of occurrences of word i
    print(wordDict)

    Result:

    {'thought': 2, 'loved': 1, 'wings': 1, 'or': 1, 'said': 5, 'if': 6, 'quit': 1, 'spinning': 3, 'night': 3, 'll': 4, 'refused': 1, 'get': 3, 'am': 1, 'cherised': 1, 'your': 16, 'oh': 6, 'we': 3, 'let': 1, 'are': 6, 'give': 1, 'ahead': 1, 'falls': 3, 'when': 3, 'burning': 3, 'but': 3, 'trapped': 1, 'while': 1, 'ask': 3, 'alone': 3, 'and': 4, 'seize': 1, 'is': 1, 'against': 4, 'keeps': 3, 'makes': 1, 'loud': 1, 't': 10, 'of': 3, 'head': 3, 'dreaming': 2, 'dangerous': 3, 'enough': 3, 'on': 1, 'for': 3, 'a': 1, 'so': 7, 'heart': 1, 'much': 3, 'ends': 3, 'where': 3, 'now': 2, 'weak': 1, 'rock': 4, 'life': 1, 'just': 3, 's': 4, 'crowd': 1, 'music': 1, 'true': 1, 'far': 1, 'in': 7, 'you': 12, 'away': 1, 'do': 1, 'to': 2, 'failed': 1, 'this': 3, 'better': 1, 'it': 5, 'sweet': 1, 'im': 2, 'from': 3, 'all': 5, 'eyes': 4, 'can': 10, 'dreeeming': 1, 'the': 11, 'sleep': 3, 'go': 1, 'faith': 1, 'touch': 3, 'hate': 1, 'predict': 3, 'i': 20, 'day': 1, 'tought': 1, 're': 4, 'still': 3, 'what': 3, 'masquerade': 3, 'drop': 3, 'deepest': 3, 'freedom': 2, 'passion': 1, 'too': 1, 'don': 1, 'yourself': 1, 'not': 2, 'have': 1, 'never': 1, 'crush': 4, 'near': 1, 'love': 3, 'wouldn': 3, 'sure': 2, 'my': 11, 'follow': 1, 'skin': 3, 'soul': 1}
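
The loop above calls `wordList.count` once per distinct word, so the list is rescanned each time. The standard library's `collections.Counter` produces the same kind of mapping in one pass. A minimal sketch, using a hypothetical short list (`words`) in place of `wordList`:

```python
from collections import Counter

# Hypothetical short word list standing in for `wordList`.
words = ['if', 'you', 're', 'the', 'rock', 'if', 'you', 're', 'the', 'rock', 'i']

counts = Counter(words)   # counts every word in a single pass over the list
print(counts['rock'])
print(counts['missing'])  # absent words count as zero instead of raising KeyError
```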

    Sort

    # Turn the dict into (word, count) pairs and sort by count, highest first
    dictList = list(wordDict.items())
    dictList.sort(key=lambda x: x[1], reverse=True)
    for i in dictList:
        print(i)

    Result:

    ('i', 20)
    ('your', 16)
    ('you', 12)
    ('my', 11)
    ('the', 11)
    ('t', 10)
    ('can', 10)
    ('so', 7)
    ('in', 7)
    ('if', 6)
    ('oh', 6)
    ('are', 6)
    ('all', 5)
    ('said', 5)
    ('it', 5)
    ('crush', 4)
    ('against', 4)
    ('s', 4)
    ('re', 4)
    ('and', 4)
    ('ll', 4)
    ('eyes', 4)
    ('rock', 4)
    ('for', 3)
    ('alone', 3)
    ('ask', 3)
    ('night', 3)
    ('but', 3)
    ('spinning', 3)
    ('this', 3)
    ('predict', 3)
    ('from', 3)
    ('wouldn', 3)
    ('we', 3)
    ('touch', 3)
    ('when', 3)
    ('enough', 3)
    ('skin', 3)
    ('falls', 3)
    ('deepest', 3)
    ('what', 3)
    ('much', 3)
    ('sleep', 3)
    ('masquerade', 3)
    ('head', 3)
    ('just', 3)
    ('ends', 3)
    ('still', 3)
    ('where', 3)
    ('of', 3)
    ('drop', 3)
    ('get', 3)
    ('love', 3)
    ('keeps', 3)
    ('burning', 3)
    ('dangerous', 3)
    ('to', 2)
    ('dreaming', 2)
    ('not', 2)
    ('sure', 2)
    ('thought', 2)
    ('im', 2)
    ('now', 2)
    ('freedom', 2)
    ('let', 1)
    ('cherised', 1)
    ('have', 1)
    ('dreeeming', 1)
    ('give', 1)
    ('trapped', 1)
    ('music', 1)
    ('far', 1)
    ('follow', 1)
    ('day', 1)
    ('is', 1)
    ('crowd', 1)
    ('loud', 1)
    ('failed', 1)
    ('better', 1)
    ('passion', 1)
    ('sweet', 1)
    ('soul', 1)
    ('or', 1)
    ('never', 1)
    ('seize', 1)
    ('near', 1)
    ('hate', 1)
    ('a', 1)
    ('heart', 1)
    ('do', 1)
    ('yourself', 1)
    ('ahead', 1)
    ('am', 1)
    ('loved', 1)
    ('tought', 1)
    ('weak', 1)
    ('on', 1)
    ('quit', 1)
    ('while', 1)
    ('wings', 1)
    ('away', 1)
    ('go', 1)
    ('life', 1)
    ('too', 1)
    ('faith', 1)
    ('makes', 1)
    ('refused', 1)
    ('don', 1)
    ('true', 1)
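
Words with equal counts appear in no particular order in the listing above, most likely because the dict was filled by iterating over a set, whose order for strings can differ between runs. If a reproducible listing is wanted, a secondary sort key can break ties. A sketch with hypothetical counts (`word_counts`) standing in for `wordDict`:

```python
# Hypothetical counts; the real input would be `wordDict`.
word_counts = {'so': 7, 'in': 7, 'oh': 6, 'if': 6, 'are': 6, 'your': 16}

# Sort by descending count, then alphabetically to break ties.
ranked = sorted(word_counts.items(), key=lambda kv: (-kv[1], kv[0]))
for word, count in ranked:
    print(word, count)
```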

    Exclude grammatical words: pronouns, articles, and conjunctions.

    exclude = {'the', 'i', 'you', 'is', 'and', 'my', 'or'}
    for i in exclude:
        wordDict.pop(i)  # remove the excluded word from the counts
    print(wordDict)

    Result:

    {'s': 4, 'on': 1, 'don': 1, 'too': 1, 'better': 1, 'day': 1, 'wouldn': 3, 'deepest': 3, 'refused': 1, 't': 10, 'away': 1, 'spinning': 3, 'ends': 3, 'where': 3, 'follow': 1, 'drop': 3, 'loud': 1, 'freedom': 2, 'near': 1, 'while': 1, 'do': 1, 'it': 5, 'sleep': 3, 'failed': 1, 'said': 5, 'but': 3, 'true': 1, 'far': 1, 'this': 3, 'can': 10, 'for': 3, 'burning': 3, 'from': 3, 'love': 3, 'all': 5, 'to': 2, 'loved': 1, 'music': 1, 'soul': 1, 'so': 7, 'skin': 3, 'crush': 4, 'touch': 3, 'cherised': 1, 'in': 7, 'quit': 1, 'enough': 3, 'oh': 6, 'am': 1, 'weak': 1, 'we': 3, 'heart': 1, 'eyes': 4, 'not': 2, 'yourself': 1, 'now': 2, 'seize': 1, 'when': 3, 'never': 1, 'ask': 3, 'head': 3, 'a': 1, 'get': 3, 'if': 6, 'night': 3, 'faith': 1, 'rock': 4, 'predict': 3, 'll': 4, 'hate': 1, 'masquerade': 3, 'passion': 1, 'of': 3, 'have': 1, 'go': 1, 'trapped': 1, 're': 4, 'what': 3, 'dreaming': 2, 'dreeeming': 1, 'alone': 3, 'dangerous': 3, 'sweet': 1, 'against': 4, 'tought': 1, 'are': 6, 'thought': 2, 'your': 16, 'falls': 3, 'sure': 2, 'life': 1, 'makes': 1, 'ahead': 1, 'still': 3, 'give': 1, 'wings': 1, 'let': 1, 'keeps': 3, 'much': 3, 'crowd': 1, 'just': 3, 'im': 2}
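
`wordDict.pop(i)` raises `KeyError` if an excluded word never occurs in the text, and it modifies the dict in place. One alternative is to build a filtered copy with a dict comprehension. A minimal sketch; `word_counts` and `stop_words` are hypothetical names and sample values:

```python
# Hypothetical counts and stop-word set for illustration.
word_counts = {'i': 20, 'your': 16, 'the': 11, 'can': 10, 'rock': 4}
stop_words = {'the', 'i', 'you', 'is', 'and', 'my', 'or', 'a'}

# Keep only the words that are not in the stop-word set.
# Stop words missing from the counts ('you', 'a', ...) cause no error.
filtered = {w: c for w, c in word_counts.items() if w not in stop_words}
print(filtered)
```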

    Output the 20 most frequent words.

    for i in range(20):
        print(dictList[i])

    Result:

    ('i', 20)
    ('your', 16)
    ('you', 12)
    ('the', 11)
    ('my', 11)
    ('t', 10)
    ('can', 10)
    ('in', 7)
    ('so', 7)
    ('are', 6)
    ('oh', 6)
    ('if', 6)
    ('it', 5)
    ('said', 5)
    ('all', 5)
    ('s', 4)
    ('crush', 4)
    ('re', 4)
    ('against', 4)
    ('ll', 4)
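
The listing above still contains 'i', 'you', 'the' and 'my', because `dictList` holds (word, count) pairs rather than a view of `wordDict`, so popping the excluded words from the dict does not change it. To rank only the remaining words, the sorted list has to be rebuilt from the filtered dict. A sketch using `Counter.most_common`, with hypothetical sample counts and a smaller N:

```python
from collections import Counter

# Hypothetical counts; in the article this would be the filtered `wordDict`.
filtered_counts = {'your': 16, 't': 10, 'can': 10, 'so': 7, 'rock': 4, 'day': 1}

top_n = 3  # the article uses 20
top_words = Counter(filtered_counts).most_common(top_n)
for word, count in top_words:
    print(word, count)
```

`most_common(n)` also copes with texts that have fewer than n distinct words, whereas indexing `dictList[i]` for `i in range(20)` would raise `IndexError` in that case.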

    Save the text to be analyzed as a UTF-8 encoded file, and obtain the content for word frequency analysis by reading that file.

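    # Read the lyrics from a UTF-8 encoded file; the with block closes the file automatically
    with open('music.txt', 'r', encoding='utf-8') as fo:
        file = fo.read()
    print(file)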
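
Putting the steps together, the whole exercise can be sketched as one function that reads a UTF-8 file and returns the word counts. The function name `word_frequencies`, the separator set, and the sample file content are assumptions made for illustration; the sketch writes its own temporary `music.txt` so that it runs without any existing file.

```python
import os
import tempfile
from collections import Counter

def word_frequencies(path):
    """Read a UTF-8 text file and return a Counter of its lowercase words."""
    with open(path, 'r', encoding='utf-8') as f:
        text = f.read()
    # Replace separators with spaces, as in the preprocessing step.
    for ch in '.?,\'"!:':
        text = text.replace(ch, ' ')
    return Counter(text.lower().split())

# Write a small sample file so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, 'music.txt')
    with open(sample_path, 'w', encoding='utf-8') as f:
        f.write("Passion is sweet\nLove makes weak\nLove and hate\n")
    freq = word_frequencies(sample_path)

print(freq.most_common(1))
```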
  • Original article: https://www.cnblogs.com/a305810827/p/8649839.html