  • Comprehensive exercise: word frequency counting

    1. Word frequency preprocessing

    There's a fire starting in my heart,
    Reaching a fever pitch and it's bringing me out the dark
    Finally,
    I can see you crystal clear.
    Go ahead and sell me out and I'll lay your ship bare.
    See how I leave, with every piece of you
    Don't underestimate the things that I will do.
    There's a fire starting in my heart,
    Reaching a fever pitch and it's bringing me out the dark
    The scars of your love, remind me of us.
    They keep me thinking that we almost had it all
    The scars of your love,
    they leave me breathless
    I can't help feeling...
    We could have had it all..
    (you're gonna wish you, never had met me)...
    Rolling in the Deep (Tears are gonna fall, rolling in the deep)
    Your had my heart... (you're gonna wish you)...
    Inside of your hand (Never had met me)
    And you played it... (Tears are gonna fall)...
    To the beat (Rolling in the deep)
    Baby I have no story to be told,
    But I've heard one of you and I'm gonna make your head burn.
    Think of me in the depths of your despair.
    Making a home down there,
    as mine sure won't be shared.
    The scars of your love,
    remind you of us.
    They keep me thinking that we almost had it all
    The scars of your love,
    they leave me breathless
    I can't help feeling...
    “We could have had it all...
    (you're gonna wish you never had met me)...
    Rolling in the Deep
    (Tears are gonna fall,
    rolling in the deep)
    Your had my heart...
    (you're gonna wish you)...
    inside of your hand
    (Never had met me)
    And you played it...
    (Tears are gonna fall)...
    To the beat
    (Rolling in the deep)
    Could have had it all
    Rolling in the deep.
    You had my heart inside of your hand,
    But you played it with your beating”
    Throw yourself through ever open door (Whoa)
    Count your blessings to find what look for (Whoa-uh)
    Turn my sorrow into treasured gold (Whoa)
    And pay me back in kind- You reap just what you sow.
    “(You're gonna wish you... Never had met me)
    We could have had it all
    (Tears are gonna fall... Rolling in the deep)
    We could have had it all yeah
    ( you're gonna wish you... never had met me)
    It all.
    (Tears are gonna fall)
    It all
    It all
    (Rolling in the deep)
    We could have had it all
    (you're gonna wish you, never had met me)
    Rolling in the deep
    (Tears are gonna fall rolling in the deep)
    You had my heart inside...
    (you're gonna wish you)... of your hand (Never had met me)
    And you played it...
    (Tears are gonna fall)...
    to the beat (Rolling in the deep)
    We could have had it all
    ( you're wish you never had met me)
    Rolling in the deep
    (tears are gonna fall, rolling in the deep)
    You had my heart...
    ( you're gonna wish you)...
    Inside of your hand (Never had met me)”
    But you played it
    You played it.
    You played it.
    You played it to the beat.

    Generate the word frequency counts and sort

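    # Read the lyrics file.
    f = open("rolling.txt", "r")
    news = f.read()
    f.close()

    # Replace each separator character with a space.
    sep = """,.''!"?:"""
    for c in sep:
        news = news.replace(c, " ")
    # Lowercase and split once, after all separators are replaced.
    wordList = news.lower().split()

    # Count each distinct word.
    wordDict = {}
    wordSet = set(wordList)
    for w in wordSet:
        wordDict[w] = wordList.count(w)
    for w in wordDict:
        print(w, wordDict[w])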
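
The counts above are printed in arbitrary set order. A minimal sketch of counting and sorting in one step, assuming the standard-library `collections.Counter` is acceptable here; the function name `count_words` and the sample string are illustrative and not part of the original exercise:

```python
from collections import Counter

def count_words(text, separators=",.'!\"?:"):
    """Return (word, count) pairs sorted by descending count, then alphabetically."""
    # Replace each separator character with a space, as the loop above does.
    for ch in separators:
        text = text.replace(ch, " ")
    counts = Counter(text.lower().split())
    # Sort by count (largest first); ties are broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Illustrative input taken from the last lines of the lyrics.
sample = "You played it. You played it to the beat."
for word, n in count_words(sample):
    print(word, n)
```

`Counter` counts every word in a single pass, whereas calling `wordList.count(w)` for each distinct word rescans the whole list each time.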
    

      Exclude grammatical words: pronouns, articles, conjunctions

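    f = open('rolling.txt', 'r')
    news = f.read()
    f.close()

    sep = ''',.'!"?:'''
    # Words to leave out of the statistics.
    exclude = {'be', 'i', 'so', 'over', 'hearing'}
    for c in sep:
        news = news.replace(c, ' ')
    # Lowercase and split once, after all separators are replaced.
    wordList = news.lower().split()

    wordDict = {}
    # Set difference drops the excluded words before counting.
    wordSet = set(wordList) - exclude
    for w in wordSet:
        wordDict[w] = wordList.count(w)
    for w in wordDict:
        print(w, wordDict[w])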
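
Because the apostrophe is one of the separators, contractions split into fragments ("I'll" becomes "i" and "ll"), and those fragments are counted as words. A rough sketch of filtering them out together with the grammatical words; `STOP_WORDS`, `FRAGMENTS` and `count_content_words` are hypothetical names, and the word lists are examples only, not a complete stop-word list:

```python
from collections import Counter

# Hypothetical stop-word set for illustration; extend it to suit the text.
STOP_WORDS = {"be", "i", "so", "over", "hearing", "the", "a", "and", "of", "to", "it", "you"}
# Fragments left over when the apostrophe is treated as a separator.
FRAGMENTS = {"s", "t", "ll", "re", "ve", "d", "m"}

def count_content_words(words, exclude=STOP_WORDS | FRAGMENTS):
    """Count the words in `words`, skipping anything in `exclude`."""
    return Counter(w for w in words if w not in exclude)

# Already lowercased and split, as wordList is in the code above.
words = "i ll lay your ship bare and the scars of your love".split()
print(count_content_words(words))
```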
    

      Output the 20 most frequent words

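    f = open('rolling.txt', 'r')
    news = f.read()
    f.close()

    sep = ''',.'!"?:'''
    exclude = {'be', 'i', 'so', 'over', 'hearing'}
    for c in sep:
        news = news.replace(c, ' ')
    # Lowercase and split once, after all separators are replaced.
    wordList = news.lower().split()

    wordDict = {}
    wordSet = set(wordList) - exclude
    for w in wordSet:
        wordDict[w] = wordList.count(w)

    # Sort (word, count) pairs by count, largest first.
    dic = sorted(wordDict.items(), key=lambda d: d[1], reverse=True)
    print(dic)
    # Slicing avoids an IndexError when there are fewer than 20 distinct words.
    for item in dic[:20]:
        print(item)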
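
The sort-then-take-the-top step can also be sketched with `Counter.most_common` from the standard library, which returns at most `n` pairs ordered by count. The helper name `top_words` and the sample words are illustrative:

```python
from collections import Counter

def top_words(words, n=20):
    """Return up to n (word, count) pairs, most frequent first."""
    return Counter(words).most_common(n)

# Illustrative input; in the exercise this would be wordList.
words = "it all it all it all we could have had it all".split()
for word, count in top_words(words, 3):
    print(word, count)
```

If the text has fewer than `n` distinct words, `most_common` simply returns all of them.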
    

      Save the text to be analyzed as a UTF-8 encoded file, and obtain the content for analysis by reading that file

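    # Open the file explicitly as UTF-8 so the result does not depend on the platform default.
    f = open('rolling.txt', 'r', encoding='utf-8')
    text = f.read()
    f.close()
    print(text)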
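
An equivalent way to read the file is a `with` statement, which closes the file even if reading fails. This is a sketch only: `read_text` is a hypothetical helper, and the demo writes a throwaway file so that it runs without `rolling.txt` being present:

```python
import os
import tempfile

def read_text(path, encoding="utf-8"):
    """Read a whole text file, decoding it with the given encoding."""
    # The with statement closes the file even if read() raises.
    with open(path, "r", encoding=encoding) as f:
        return f.read()

# Demo on a temporary file; in the exercise the path would be "rolling.txt".
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("Rolling in the deep\n")
    print(read_text(demo_path))
```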

    Replace all separators such as , . ? ! ’ : with spaces

    Convert all uppercase letters to lowercase

    Generate the word list

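    f = open("rolling.txt", "r")
    news = f.read()
    f.close()

    sep = ',.?!’:'
    # Replace each separator character with a space.
    for c in sep:
        news = news.replace(c, " ")
    # Lowercase and split once, after all separators are replaced.
    wordList = news.lower().split()

    for w in wordList:
        print(w)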
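
The three steps (replace separators, lowercase, split) can be sketched as one function using `str.maketrans` and `str.translate`, which replace all separator characters in a single pass instead of one `replace` call per character. The name `tokenize` is illustrative:

```python
def tokenize(text, separators=",.?!’:"):
    """Lowercase text and split it into words, treating separators as spaces."""
    # Map every separator character to a space in a single pass.
    table = str.maketrans(separators, " " * len(separators))
    return text.translate(table).lower().split()

# Illustrative input taken from the lyrics.
print(tokenize("Finally, I can see you crystal clear."))
```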
    
  • Original article: https://www.cnblogs.com/tyx123/p/8657914.html