  • Comprehensive exercise: word frequency count

    1. Word frequency preprocessing

    There's a fire starting in my heart,
    Reaching a fever pitch and it's bringing me out the dark
    Finally,
    I can see you crystal clear.
    Go ahead and sell me out and I'll lay your ship bare.
    See how I leave, with every piece of you
    Don't underestimate the things that I will do.
    There's a fire starting in my heart,
    Reaching a fever pitch and it's bringing me out the dark
    The scars of your love, remind me of us.
    They keep me thinking that we almost had it all
    The scars of your love,
    they leave me breathless
    I can't help feeling...
    We could have had it all..
    (you're gonna wish you, never had met me)...
    Rolling in the Deep (Tears are gonna fall, rolling in the deep)
    Your had my heart... (you're gonna wish you)...
    Inside of your hand (Never had met me)
    And you played it... (Tears are gonna fall)...
    To the beat (Rolling in the deep)
    Baby I have no story to be told,
    But I've heard one of you and I'm gonna make your head burn.
    Think of me in the depths of your despair.
    Making a home down there,
    as mine sure won't be shared.
    The scars of your love,
    remind you of us.
    They keep me thinking that we almost had it all
    The scars of your love,
    they leave me breathless
    I can't help feeling...
    “We could have had it all...
    (you're gonna wish you never had met me)...
    Rolling in the Deep
    (Tears are gonna fall,
    rolling in the deep)
    Your had my heart...
    (you're gonna wish you)...
    inside of your hand
    (Never had met me)
    And you played it...
    (Tears are gonna fall)...
    To the beat
    (Rolling in the deep)
    Could have had it all
    Rolling in the deep.
    You had my heart inside of your hand,
    But you played it with your beating”
    Throw yourself through ever open door (Whoa)
    Count your blessings to find what look for (Whoa-uh)
    Turn my sorrow into treasured gold (Whoa)
    And pay me back in kind- You reap just what you sow.
    “(You're gonna wish you... Never had met me)
    We could have had it all
    (Tears are gonna fall... Rolling in the deep)
    We could have had it all yeah
    ( you're gonna wish you... never had met me)
    It all.
    (Tears are gonna fall)
    It all
    It all
    (Rolling in the deep)
    We could have had it all
    (you're gonna wish you, never had met me)
    Rolling in the deep
    (Tears are gonna fall rolling in the deep)
    You had my heart inside...
    (you're gonna wish you)... of your hand (Never had met me)
    And you played it...
    (Tears are gonna fall)...
    to the beat (Rolling in the deep)
    We could have had it all
    ( you're wish you never had met me)
    Rolling in the deep
    (tears are gonna fall, rolling in the deep)
    You had my heart...
    ( you're gonna wish you)...
    Inside of your hand (Never had met me)”
    But you played it
    You played it.
    You played it.
    You played it to the beat.

    Generate the word frequency counts and sort them

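    # Read the lyrics from the UTF-8 text file
    f = open("rolling.txt", "r", encoding="utf-8")
    news = f.read()
    f.close()

    # Replace every separator character with a space
    sep = ",.'!\"?:"
    for c in sep:
        news = news.replace(c, " ")

    # Lowercase and split once, after all separators are gone
    wordList = news.lower().split()

    # Count how often each distinct word occurs
    wordDict = {}
    wordSet = set(wordList)
    for w in wordSet:
        wordDict[w] = wordList.count(w)

    for w in wordDict:
        print(w, wordDict[w])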
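
The heading above mentions sorting, so here is a minimal sketch of one way to count and sort in a single pass, using `collections.Counter` from the standard library. The helper name `count_words`, the constant `SEPARATORS`, and the sample sentence are assumptions made for illustration; they are not part of the original exercise.

```python
from collections import Counter

# Same separator characters as `sep` in the exercise (assumed constant name)
SEPARATORS = ",.'!\"?:"


def count_words(text, separators=SEPARATORS):
    """Return a Counter mapping each lowercase word to its frequency."""
    for ch in separators:
        text = text.replace(ch, " ")
    return Counter(text.lower().split())


# Small inline sample so the sketch runs without rolling.txt
sample = "You played it. You played it to the beat."
counts = count_words(sample)

# most_common() yields (word, count) pairs, highest count first
for word, n in counts.most_common():
    print(word, n)
```

`Counter` walks the word list once, whereas calling `wordList.count(w)` for every distinct word rescans the whole list each time. For a song lyric the difference is negligible, but it matters on longer texts.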
    

    Exclude grammatical words: pronouns, articles, and conjunctions

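    # Read the lyrics from the UTF-8 text file
    f = open('rolling.txt', 'r', encoding='utf-8')
    news = f.read()
    f.close()

    sep = ",.'!\"?:"
    # Words to leave out of the statistics
    exclude = {'be', 'i', 'so', 'over', 'hearing'}

    # Replace every separator character with a space
    for c in sep:
        news = news.replace(c, ' ')
    wordList = news.lower().split()

    # Count every distinct word except the excluded ones
    wordDict = {}
    wordSet = set(wordList) - exclude
    for w in wordSet:
        wordDict[w] = wordList.count(w)

    for w in wordDict:
        print(w, wordDict[w])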
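
As a sketch of what a set covering the three categories named in the heading might look like, the block below filters words while counting. The contents of `STOP_WORDS` and the helper name `count_content_words` are illustrative assumptions, not the author's list; extend or trim the set to suit the text being analyzed.

```python
# Illustrative stop-word set (an assumption, not the author's `exclude` set)
STOP_WORDS = {
    "i", "you", "me", "my", "your", "we", "it", "they",  # pronouns
    "a", "an", "the",                                    # articles
    "and", "but", "or", "so",                            # conjunctions
}


def count_content_words(words, stop_words=STOP_WORDS):
    """Count each word in `words`, skipping anything in `stop_words`."""
    counts = {}
    for w in words:
        if w in stop_words:
            continue
        counts[w] = counts.get(w, 0) + 1
    return counts


words = "but you played it to the beat".split()
print(count_content_words(words))
```

Filtering inside the counting loop gives the same result as subtracting the set from `wordSet` first. It simply avoids building the intermediate set.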
    

    Output the 20 most frequent words

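    # Read the lyrics from the UTF-8 text file
    f = open('rolling.txt', 'r', encoding='utf-8')
    news = f.read()
    f.close()

    sep = ",.'!\"?:"
    exclude = {'be', 'i', 'so', 'over', 'hearing'}

    # Replace every separator character with a space
    for c in sep:
        news = news.replace(c, ' ')
    wordList = news.lower().split()

    wordDict = {}
    wordSet = set(wordList) - exclude
    for w in wordSet:
        wordDict[w] = wordList.count(w)

    # Sort (word, count) pairs by count, largest first
    dic = sorted(wordDict.items(), key=lambda d: d[1], reverse=True)
    print(dic)
    # Slicing avoids an IndexError when there are fewer than 20 distinct words
    for item in dic[:20]:
        print(item)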
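
Words with equal counts come out in arbitrary order when sorting on the count alone, because they start from a set. A possible refinement, sketched below, breaks ties alphabetically so the ranking is reproducible between runs. The function name `top_n` and the `demo` dictionary are hypothetical.

```python
def top_n(word_counts, n=20):
    """Return up to n (word, count) pairs, highest count first, ties A to Z."""
    ranked = sorted(word_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


# Hypothetical counts, used only to show the ordering
demo = {"it": 3, "all": 3, "deep": 2, "beat": 1}
for word, count in top_n(demo, n=3):
    print(f"{word:<10}{count}")
```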
    

    Save the text to analyze as a UTF-8 encoded file, then obtain the content for the word frequency analysis by reading that file

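    # Open with an explicit encoding so the result does not depend on the platform default
    f = open('rolling.txt', 'r', encoding='utf-8')
    text = f.read()
    f.close()
    print(text)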
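
A common alternative to pairing `open` with `close` is a `with` block, which closes the file even if reading raises an exception. The sketch below assumes a helper named `read_text`, and it writes a throwaway demo file so that it runs without `rolling.txt`; both are illustrative choices.

```python
import os
import tempfile


def read_text(path):
    """Read a whole UTF-8 text file; the with-block closes it automatically."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Demo with a temporary file so the sketch is self-contained
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("Rolling in the deep\n")
    text = read_text(demo_path)

print(text)
```

In the exercise itself, the call would be `read_text("rolling.txt")`.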

    Replace all separators such as , . ? ! ’ : with spaces

    Convert all uppercase letters to lowercase

    Generate the word list

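    # Read the lyrics from the UTF-8 text file
    f = open("rolling.txt", "r", encoding="utf-8")
    news = f.read()
    f.close()

    # Replace every separator character with a space
    sep = ",.?!’:"
    for c in sep:
        news = news.replace(c, " ")

    # Lowercase and split once, after all separators are gone
    wordList = news.lower().split()

    for w in wordList:
        print(w)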
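
The three steps above (replace separators, lowercase, split) can also be written as one expression with `str.maketrans` and `str.translate`. This is a sketch rather than the author's method. The name `tokenize` is hypothetical, and `SEP` is assumed to hold both the straight and the typographic apostrophe, since the lyrics as shown use the straight one.

```python
# Separator characters; both apostrophe forms are included (an assumption)
SEP = ",.?!'’:"


def tokenize(text, sep=SEP):
    """Replace separators with spaces, lowercase, and split into words."""
    table = str.maketrans(sep, " " * len(sep))
    return text.translate(table).lower().split()


print(tokenize("Finally, I can see you crystal clear."))
print(tokenize("I can't help feeling..."))
```

Treating the apostrophe as a separator splits contractions, so `can't` becomes `can` and `t`. Removing it from `SEP` keeps contractions whole.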
    
  • Original article: https://www.cnblogs.com/tyx123/p/8657914.html