  • Comprehensive exercise: word frequency counting

    I. Preprocessing for word frequency counting

    There's a fire starting in my heart,
    Reaching a fever pitch and it's bringing me out the dark
    Finally,
    I can see you crystal clear.
    Go ahead and sell me out and I'll lay your ship bare.
    See how I leave, with every piece of you
    Don't underestimate the things that I will do.
    There's a fire starting in my heart,
    Reaching a fever pitch and it's bringing me out the dark
    The scars of your love, remind me of us.
    They keep me thinking that we almost had it all
    The scars of your love,
    they leave me breathless
    I can't help feeling...
    We could have had it all..
    (you're gonna wish you, never had met me)...
    Rolling in the Deep (Tears are gonna fall, rolling in the deep)
    Your had my heart... (you're gonna wish you)...
    Inside of your hand (Never had met me)
    And you played it... (Tears are gonna fall)...
    To the beat (Rolling in the deep)
    Baby I have no story to be told,
    But I've heard one of you and I'm gonna make your head burn.
    Think of me in the depths of your despair.
    Making a home down there,
    as mine sure won't be shared.
    The scars of your love,
    remind you of us.
    They keep me thinking that we almost had it all
    The scars of your love,
    they leave me breathless
    I can't help feeling...
    “We could have had it all...
    (you're gonna wish you never had met me)...
    Rolling in the Deep
    (Tears are gonna fall,
    rolling in the deep)
    Your had my heart...
    (you're gonna wish you)...
    inside of your hand
    (Never had met me)
    And you played it...
    (Tears are gonna fall)...
    To the beat
    (Rolling in the deep)
    Could have had it all
    Rolling in the deep.
    You had my heart inside of your hand,
    But you played it with your beating”
    Throw yourself through ever open door (Whoa)
    Count your blessings to find what look for (Whoa-uh)
    Turn my sorrow into treasured gold (Whoa)
    And pay me back in kind- You reap just what you sow.
    “(You're gonna wish you... Never had met me)
    We could have had it all
    (Tears are gonna fall... Rolling in the deep)
    We could have had it all yeah
    ( you're gonna wish you... never had met me)
    It all.
    (Tears are gonna fall)
    It all
    It all
    (Rolling in the deep)
    We could have had it all
    (you're gonna wish you, never had met me)
    Rolling in the deep
    (Tears are gonna fall rolling in the deep)
    You had my heart inside...
    (you're gonna wish you)... of your hand (Never had met me)
    And you played it...
    (Tears are gonna fall)...
    to the beat (Rolling in the deep)
    We could have had it all
    ( you're wish you never had met me)
    Rolling in the deep
    (tears are gonna fall, rolling in the deep)
    You had my heart...
    ( you're gonna wish you)...
    Inside of your hand (Never had met me)”
    But you played it
    You played it.
    You played it.
    You played it to the beat.

    Generate the word frequency counts and sort them

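    # Read the lyrics; the file is saved as UTF-8
    with open("rolling.txt", "r", encoding="utf-8") as f:
        news = f.read()

    # Replace each separator character with a space
    sep = """,.''!"?:"""
    for c in sep:
        news = news.replace(c, " ")

    # Lowercase and split once, after all separators are replaced
    wordList = news.lower().split()

    # Count how often each distinct word occurs
    wordDict = {}
    wordSet = set(wordList)
    for w in wordSet:
        wordDict[w] = wordList.count(w)

    # Print the words from most to least frequent
    for w, n in sorted(wordDict.items(), key=lambda d: d[1], reverse=True):
        print(w, n)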
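
    The loop above calls wordList.count(w) once per distinct word, which rescans the whole list every time. As a minimal sketch of the same counting and sorting done in a single pass, the version below uses a plain dictionary; the sample string and the helper name count_words are illustrative assumptions, not part of the original exercise.

```python
def count_words(text, sep=""",.''!"?:"""):
    """Count words in text after replacing separator characters with spaces."""
    for c in sep:
        text = text.replace(c, " ")
    counts = {}
    for w in text.lower().split():
        # dict.get supplies 0 the first time a word is seen
        counts[w] = counts.get(w, 0) + 1
    return counts


# Hypothetical sample input; the exercise itself reads rolling.txt
sample = "You played it. You played it to the beat."
counts = count_words(sample)

# Sort (word, count) pairs by count, highest first
ranked = sorted(counts.items(), key=lambda d: d[1], reverse=True)
for word, n in ranked:
    print(word, n)
```

    For a text the size of one song either approach is fast enough; the single pass mainly matters for longer inputs.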
    

      Exclude grammatical words: pronouns, articles, conjunctions

    with open('rolling.txt', 'r', encoding='utf-8') as f:
        news = f.read()

    sep = ''',.'!"?:'''
    # Words to leave out of the counts
    exclude = {'be', 'i', 'so', 'over', 'hearing'}
    for c in sep:
        news = news.replace(c, ' ')
    wordList = news.lower().split()

    wordDict = {}
    # Distinct words, minus the excluded ones
    wordSet = set(wordList) - exclude
    for w in wordSet:
        wordDict[w] = wordList.count(w)
    for w in wordDict:
        print(w, wordDict[w])
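
    Subtracting exclude from wordSet keeps those words out of wordDict, although they remain in wordList. A sketch of an alternative that filters the list itself before counting is shown below; the sample sentence and the contents of stop_words are assumptions chosen for illustration, and a real list of pronouns, articles, and conjunctions would be longer.

```python
# Hypothetical stop-word set: a few pronouns, articles, and conjunctions
stop_words = {"i", "you", "it", "the", "a", "an", "and", "but", "or"}

# Hypothetical sample input; the exercise itself reads rolling.txt
text = "But you played it and you played it to the beat"

# Drop stop words first so every later step only sees content words
words = [w for w in text.lower().split() if w not in stop_words]

word_counts = {}
for w in words:
    word_counts[w] = word_counts.get(w, 0) + 1

for w, n in word_counts.items():
    print(w, n)
```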
    

      Output the 20 most frequent words

    with open('rolling.txt', 'r', encoding='utf-8') as f:
        news = f.read()

    sep = ''',.'!"?:'''
    exclude = {'be', 'i', 'so', 'over', 'hearing'}
    for c in sep:
        news = news.replace(c, ' ')
    wordList = news.lower().split()

    wordDict = {}
    wordSet = set(wordList) - exclude
    for w in wordSet:
        wordDict[w] = wordList.count(w)

    # Sort (word, count) pairs by count, highest first
    dic = sorted(wordDict.items(), key=lambda d: d[1], reverse=True)
    print(dic)
    # Slicing avoids an IndexError when there are fewer than 20 distinct words
    for item in dic[:20]:
        print(item)
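
    The standard library's collections.Counter offers the same ranking through most_common(n), which returns at most n pairs and so does not fail when the text has fewer than n distinct words. A brief sketch, with a made-up sample string standing in for the file contents:

```python
from collections import Counter

# Hypothetical sample input; the exercise itself reads rolling.txt
text = "we could have had it all it all it all"

counter = Counter(text.lower().split())

# most_common(20) returns up to 20 (word, count) pairs, highest count first
top20 = counter.most_common(20)
for word, n in top20:
    print(word, n)
```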
    

      Save the text to be analyzed as a UTF-8 encoded file, and obtain the content for word frequency analysis by reading that file

    with open('rolling.txt', 'r', encoding='utf-8') as f:
        text = f.read()
    print(text)
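
    Because the file is saved as UTF-8, passing the encoding explicitly avoids depending on the platform default, which differs between systems. The sketch below writes a small sample file into a temporary directory and reads it back; the file name sample.txt and its contents are placeholders.

```python
import os
import tempfile

# Placeholder contents, including non-ASCII punctuation
sample = "We could have had it all\u2026 \u201cRolling in the deep\u201d\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # placeholder file name

    # Write and read with the same explicit encoding
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

# True when the text survived the round trip unchanged
print(text == sample)
```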

    Replace all separators such as , . ? ! ’ : with spaces

    Convert all uppercase letters to lowercase

    Generate the word list

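    with open("rolling.txt", "r", encoding="utf-8") as f:
        news = f.read()

    # ’ here is the typographic (curly) apostrophe;
    # add ' as well if the text uses straight apostrophes
    sep = ',.?!’:'
    for c in sep:
        news = news.replace(c, " ")

    # Lowercase and split once, after the loop
    wordList = news.lower().split()

    for w in wordList:
        print(w)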
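
    Replacing a fixed list of separators leaves other punctuation attached to words; in these lyrics, for instance, parentheses are not in sep, so "(rolling" and "rolling" would be listed as different words. One possible alternative, shown only as a sketch, extracts words with a regular expression instead. The pattern below, which keeps apostrophes inside words such as "you're", and the sample line are assumptions rather than part of the original exercise.

```python
import re

# Hypothetical sample line in the style of the lyrics
line = "To the beat (Rolling in the deep), you're gonna wish you..."

# A word is a run of letters, optionally followed by apostrophe + letters
word_list = re.findall(r"[a-z]+(?:'[a-z]+)*", line.lower())

for w in word_list:
    print(w)
```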
    
  • Original article: https://www.cnblogs.com/tyx123/p/8657914.html