  • Comprehensive exercise: word frequency counting

    1. Word frequency preprocessing

    There's a fire starting in my heart,
    Reaching a fever pitch and it's bringing me out the dark
    Finally,
    I can see you crystal clear.
    Go ahead and sell me out and I'll lay your ship bare.
    See how I leave, with every piece of you
    Don't underestimate the things that I will do.
    There's a fire starting in my heart,
    Reaching a fever pitch and it's bringing me out the dark
    The scars of your love, remind me of us.
    They keep me thinking that we almost had it all
    The scars of your love,
    they leave me breathless
    I can't help feeling...
    We could have had it all..
    (you're gonna wish you, never had met me)...
    Rolling in the Deep (Tears are gonna fall, rolling in the deep)
    Your had my heart... (you're gonna wish you)...
    Inside of your hand (Never had met me)
    And you played it... (Tears are gonna fall)...
    To the beat (Rolling in the deep)
    Baby I have no story to be told,
    But I've heard one of you and I'm gonna make your head burn.
    Think of me in the depths of your despair.
    Making a home down there,
    as mine sure won't be shared.
    The scars of your love,
    remind you of us.
    They keep me thinking that we almost had it all
    The scars of your love,
    they leave me breathless
    I can't help feeling...
    “We could have had it all...
    (you're gonna wish you never had met me)...
    Rolling in the Deep
    (Tears are gonna fall,
    rolling in the deep)
    Your had my heart...
    (you're gonna wish you)...
    inside of your hand
    (Never had met me)
    And you played it...
    (Tears are gonna fall)...
    To the beat
    (Rolling in the deep)
    Could have had it all
    Rolling in the deep.
    You had my heart inside of your hand,
    But you played it with your beating”
    Throw yourself through ever open door (Whoa)
    Count your blessings to find what look for (Whoa-uh)
    Turn my sorrow into treasured gold (Whoa)
    And pay me back in kind- You reap just what you sow.
    “(You're gonna wish you... Never had met me)
    We could have had it all
    (Tears are gonna fall... Rolling in the deep)
    We could have had it all yeah
    ( you're gonna wish you... never had met me)
    It all.
    (Tears are gonna fall)
    It all
    It all
    (Rolling in the deep)
    We could have had it all
    (you're gonna wish you, never had met me)
    Rolling in the deep
    (Tears are gonna fall rolling in the deep)
    You had my heart inside...
    (you're gonna wish you)... of your hand (Never had met me)
    And you played it...
    (Tears are gonna fall)...
    to the beat (Rolling in the deep)
    We could have had it all
    ( you're wish you never had met me)
    Rolling in the deep
    (tears are gonna fall, rolling in the deep)
    You had my heart...
    ( you're gonna wish you)...
    Inside of your hand (Never had met me)”
    But you played it
    You played it.
    You played it.
    You played it to the beat.

    Generate the word frequency counts and sort them

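    # Read the lyrics saved in rolling.txt (a UTF-8 encoded file)
    with open("rolling.txt","r",encoding="utf-8") as f:
        news=f.read()
    # Replace every separator character with a space
    sep=""",.''!"?:"""
    for c in sep:
        news=news.replace(c," ")
    # Lower-case the text and split it into words once, after the loop
    wordList=news.lower().split()
    # Count how often each distinct word occurs
    wordDict={}
    wordSet=set(wordList)
    for w in wordSet:
        wordDict[w]=wordList.count(w)
    # Print the words from most to least frequent
    for w in sorted(wordDict,key=wordDict.get,reverse=True):
        print(w,wordDict[w])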
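
The counting step calls `wordList.count(w)` once per distinct word, so the list is rescanned for every word. A minimal sketch of a single-pass alternative follows, assuming the text is already in memory. The function name `count_words` and the sample sentence are illustrative and do not come from the original post.

```python
def count_words(text, sep=""",.''!"?:"""):
    """Return a dict mapping each lower-cased word to its number of occurrences."""
    # Replace each separator character with a space, as in the post's code
    for c in sep:
        text = text.replace(c, " ")
    counts = {}
    # One pass over the words: increment the running count for each word
    for w in text.lower().split():
        counts[w] = counts.get(w, 0) + 1
    return counts


# Hypothetical sample text, used only for illustration
sample = "The cat saw the dog. The dog ran!"
counts = count_words(sample)
# Print words from most to least frequent
for w in sorted(counts, key=counts.get, reverse=True):
    print(w, counts[w])
```

For a text the size of one song the difference is negligible; the single pass mainly matters for much longer inputs.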
    

    Exclude grammatical words: pronouns, articles, conjunctions

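    # Read the lyrics saved in rolling.txt (a UTF-8 encoded file)
    with open('rolling.txt','r',encoding='utf-8') as f:
        news=f.read()
    sep=''',.'!"?:'''
    # Words to leave out of the statistics
    exclude={'be','i','so','over','hearing'}
    for c in sep:
        news=news.replace(c,' ')
    wordList=news.lower().split()
    wordDict={}
    # Count only the distinct words that are not in the exclusion set
    wordSet=set(wordList)-exclude
    for w in wordSet:
        wordDict[w]=wordList.count(w)
    for w in wordDict:
        print(w,wordDict[w])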
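
The exclusion works through set difference: words in the exclusion set never become dictionary keys. Because the word list is lower-cased first, the entries in the exclusion set must be lower-case too. The sketch below shows the idea on a tiny input; `count_without`, the word list, and the stop-word set are hypothetical names and data chosen for illustration.

```python
def count_without(word_list, exclude):
    """Count the words in word_list, skipping any word found in exclude."""
    counts = {}
    # Set difference drops the excluded words before counting
    for w in set(word_list) - set(exclude):
        counts[w] = word_list.count(w)
    return counts


# Hypothetical word list and stop-word set, for illustration only
words = "i want to be so far over the hill".split()
stop_words = {"be", "i", "so", "over", "hearing"}
result = count_without(words, stop_words)
print(sorted(result.items()))
```

An entry such as `hearing` that never occurs in the text is harmless: set difference simply ignores it.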
    

    Output the 20 most frequent words (TOP 20)

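    # Read the lyrics saved in rolling.txt (a UTF-8 encoded file)
    with open('rolling.txt','r',encoding='utf-8') as f:
        news=f.read()
    sep=''',.'!"?:'''
    exclude={'be','i','so','over','hearing'}
    for c in sep:
        news=news.replace(c,' ')
    wordList=news.lower().split()
    wordDict={}
    wordSet=set(wordList)-exclude
    for w in wordSet:
        wordDict[w]=wordList.count(w)

    # Sort the (word, count) pairs by count, highest first
    dic=sorted(wordDict.items(),key=lambda d:d[1],reverse=True)
    print(dic)
    # Slicing avoids an IndexError when there are fewer than 20 distinct words
    for item in dic[:20]:
        print(item)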
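
The standard library's `collections.Counter` can produce the same kind of ranking through its `most_common` method. The following is a rough sketch of that alternative, not the method used in the original post; `top_words` and the sample list are made-up names for illustration.

```python
from collections import Counter


def top_words(word_list, exclude=(), n=20):
    """Return up to n (word, count) pairs, most frequent first."""
    # Count every word that is not in the exclusion collection
    counter = Counter(w for w in word_list if w not in exclude)
    # most_common(n) returns at most n pairs, so short inputs are safe
    return counter.most_common(n)


# Hypothetical sample, for illustration only
words = "a b a c b a".split()
print(top_words(words, n=2))
```

With the default `n=20`, a text containing fewer than 20 distinct words simply yields a shorter list.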
    

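    Save the text to be analysed as a UTF-8 encoded file, then read it back from the file to get the content for the word frequency analysis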

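    # Open the file as UTF-8 so the result does not depend on the platform's default encoding
    with open('rolling.txt','r',encoding='utf-8') as f:
        text=f.read()
    print(text)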
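
The saving step itself is not shown in the post. A small sketch of a UTF-8 round trip follows, assuming a temporary directory rather than the real `rolling.txt`; `save_text`, `load_text`, the file name, and the sample string are hypothetical.

```python
import os
import tempfile


def save_text(path, text):
    """Write text to path using UTF-8 encoding."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def load_text(path):
    """Read the whole file at path, decoding it as UTF-8."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Hypothetical content containing non-ASCII characters (accent, curly quotes)
sample = "Caf\u00e9 \u201cquoted\u201d text\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    save_text(path, sample)
    text = load_text(path)

print(len(text), "characters read")
```

Passing `encoding="utf-8"` on both the write and the read keeps the two sides consistent regardless of the operating system's default.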

    Replace all separators such as , . ? ! ’ : with spaces

    Convert all upper-case letters to lower case

    Generate the word list

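    with open("rolling.txt","r",encoding="utf-8") as f:
        news=f.read()
    # Separators to replace; note that ’ is the curly apostrophe, not the straight '
    sep=',.?!’:'
    for c in sep:
        news=news.replace(c," ")
    # Lower-case and split once, after all separators are replaced
    wordList=news.lower().split()

    for w in wordList:
        print(w)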
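
The three preprocessing steps can also be done in one pass with `str.maketrans` and `str.translate` in place of repeated `replace` calls. This is a sketch under the assumption that every separator is a single character; `tokenize` and the sample sentence are illustrative names, not part of the original post.

```python
def tokenize(text, sep=",.?!\u2019:"):
    """Lower-case text and split it into words, treating each char in sep as a space."""
    # Build a translation table mapping every separator character to a space
    # (\u2019 is the curly apostrophe that appears in the post's sep string)
    table = str.maketrans(sep, " " * len(sep))
    return text.translate(table).lower().split()


# Hypothetical sample sentence, for illustration only
sample = "Hello, World! It\u2019s fine: really?"
print(tokenize(sample))
```

As with the loop version, an apostrophe listed in `sep` splits contractions, so `It’s` becomes the two tokens `it` and `s`.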
    
  • Original article: https://www.cnblogs.com/tyx123/p/8657914.html