  • Comprehensive exercise: word frequency statistics

    I. Preprocessing for word frequency statistics

    There's a fire starting in my heart,
    Reaching a fever pitch and it's bringing me out the dark
    Finally,
    I can see you crystal clear.
    Go ahead and sell me out and I'll lay your ship bare.
    See how I leave, with every piece of you
    Don't underestimate the things that I will do.
    There's a fire starting in my heart,
    Reaching a fever pitch and it's bringing me out the dark
    The scars of your love, remind me of us.
    They keep me thinking that we almost had it all
    The scars of your love,
    they leave me breathless
    I can't help feeling...
    We could have had it all..
    (you're gonna wish you, never had met me)...
    Rolling in the Deep (Tears are gonna fall, rolling in the deep)
    Your had my heart... (you're gonna wish you)...
    Inside of your hand (Never had met me)
    And you played it... (Tears are gonna fall)...
    To the beat (Rolling in the deep)
    Baby I have no story to be told,
    But I've heard one of you and I'm gonna make your head burn.
    Think of me in the depths of your despair.
    Making a home down there,
    as mine sure won't be shared.
    The scars of your love,
    remind you of us.
    They keep me thinking that we almost had it all
    The scars of your love,
    they leave me breathless
    I can't help feeling...
    “We could have had it all...
    (you're gonna wish you never had met me)...
    Rolling in the Deep
    (Tears are gonna fall,
    rolling in the deep)
    Your had my heart...
    (you're gonna wish you)...
    inside of your hand
    (Never had met me)
    And you played it...
    (Tears are gonna fall)...
    To the beat
    (Rolling in the deep)
    Could have had it all
    Rolling in the deep.
    You had my heart inside of your hand,
    But you played it with your beating”
    Throw yourself through ever open door (Whoa)
    Count your blessings to find what look for (Whoa-uh)
    Turn my sorrow into treasured gold (Whoa)
    And pay me back in kind- You reap just what you sow.
    “(You're gonna wish you... Never had met me)
    We could have had it all
    (Tears are gonna fall... Rolling in the deep)
    We could have had it all yeah
    ( you're gonna wish you... never had met me)
    It all.
    (Tears are gonna fall)
    It all
    It all
    (Rolling in the deep)
    We could have had it all
    (you're gonna wish you, never had met me)
    Rolling in the deep
    (Tears are gonna fall rolling in the deep)
    You had my heart inside...
    (you're gonna wish you)... of your hand (Never had met me)
    And you played it...
    (Tears are gonna fall)...
    to the beat (Rolling in the deep)
    We could have had it all
    ( you're wish you never had met me)
    Rolling in the deep
    (tears are gonna fall, rolling in the deep)
    You had my heart...
    ( you're gonna wish you)...
    Inside of your hand (Never had met me)”
    But you played it
    You played it.
    You played it.
    You played it to the beat.

    Generate the word frequency statistics and sort them

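    # Read the lyrics (saved as UTF-8) into one string
    with open("rolling.txt", "r", encoding="utf-8") as f:
        news = f.read()

    # Replace punctuation with spaces (also the curly quotes and parentheses in the lyrics)
    sep = ",.'!\"?:“”()"
    for c in sep:
        news = news.replace(c, " ")

    # Lowercase and split once, after every separator has been replaced
    wordList = news.lower().split()

    # Count each distinct word
    wordDict = {}
    for w in set(wordList):
        wordDict[w] = wordList.count(w)

    # Sort by frequency, highest first, and print each word once
    for w, n in sorted(wordDict.items(), key=lambda d: d[1], reverse=True):
        print(w, n)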
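    The loop above calls `wordList.count(w)` once per distinct word, so it rescans the whole list for every word. A minimal alternative sketch, assuming the same replace-then-split preprocessing, uses `collections.Counter` from the standard library to count everything in a single pass; the helper name `word_freq` is hypothetical.

```python
from collections import Counter


def word_freq(text, sep=",.'!\"?:“”()"):
    """Count words in one pass (hypothetical helper, same preprocessing as above)."""
    for c in sep:
        text = text.replace(c, " ")
    return Counter(text.lower().split())


if __name__ == "__main__":
    freq = word_freq("Rolling in the deep. Rolling, rolling!")
    # most_common() returns (word, count) pairs sorted by count, highest first
    for word, n in freq.most_common():
        print(word, n)
```

    `Counter` is a `dict` subclass, so it can stand in for `wordDict` in the later steps.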
    

      Exclude grammatical words: pronouns, articles and conjunctions

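    with open("rolling.txt", "r", encoding="utf-8") as f:
        news = f.read()

    sep = ",.'!\"?:“”()"
    # Words to leave out of the count; extend with more pronouns,
    # articles and conjunctions as needed
    exclude = {"be", "i", "so", "over", "hearing"}
    for c in sep:
        news = news.replace(c, " ")
    wordList = news.lower().split()

    # Set difference drops the excluded words before counting
    wordDict = {}
    for w in set(wordList) - exclude:
        wordDict[w] = wordList.count(w)
    for w in wordDict:
        print(w, wordDict[w])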
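    The heading asks for pronouns, articles and conjunctions to be excluded, but the `exclude` set above covers only a handful of words. A sketch of a broader exclusion set grouped by those categories follows; the word lists are illustrative and far from complete, and `STOPWORDS` and `count_content_words` are hypothetical names.

```python
from collections import Counter

# Illustrative (not exhaustive) grammatical word lists
PRONOUNS = {"i", "me", "my", "mine", "you", "your", "yours", "we", "us",
            "it", "they", "them", "he", "she", "him", "her"}
ARTICLES = {"a", "an", "the"}
CONJUNCTIONS = {"and", "but", "or", "so", "as", "if"}
STOPWORDS = PRONOUNS | ARTICLES | CONJUNCTIONS


def count_content_words(words, exclude=STOPWORDS):
    """Count only the words that are not in the exclusion set."""
    return Counter(w for w in words if w not in exclude)


if __name__ == "__main__":
    words = "you had my heart inside of your hand and you played it".split()
    print(count_content_words(words))
```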
    

      Output the TOP 20 most frequent words

    with open("rolling.txt", "r", encoding="utf-8") as f:
        news = f.read()

    sep = ",.'!\"?:“”()"
    exclude = {"be", "i", "so", "over", "hearing"}
    for c in sep:
        news = news.replace(c, " ")
    wordList = news.lower().split()

    wordDict = {}
    for w in set(wordList) - exclude:
        wordDict[w] = wordList.count(w)

    # Sort (word, count) pairs by count, highest first
    dic = sorted(wordDict.items(), key=lambda d: d[1], reverse=True)
    print(dic)
    # Slicing avoids an IndexError if there are fewer than 20 distinct words
    for item in dic[:20]:
        print(item)
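    Sorting on the count alone leaves words with equal counts in whatever order they came out of the set, and string hashing is randomized per interpreter run, so the TOP 20 can differ between runs when there are ties. A sketch that sorts by count descending and then alphabetically makes the result reproducible; `top_n` is a hypothetical helper.

```python
def top_n(freq, n=20):
    """Return the n most frequent (word, count) pairs.

    Ties are broken alphabetically so the result does not depend on
    set/dict iteration order. Slicing also copes with fewer than n words.
    """
    return sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    counts = {"deep": 4, "all": 4, "rolling": 6, "beat": 2}
    for word, count in top_n(counts, 3):
        print(word, count)  # rolling 6, then all 4, then deep 4
```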
    

      Save the text to be analyzed as a UTF-8 encoded file, then read the file to get the content for word frequency analysis

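    # Name the encoding explicitly so the UTF-8 file (including its curly quotes) decodes correctly
    with open("rolling.txt", "r", encoding="utf-8") as f:
        text = f.read()
    print(text)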
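    Passing `encoding="utf-8"` matters here: without it, `open()` falls back to a platform-dependent default (for example GBK on a Chinese-locale Windows system), which can raise `UnicodeDecodeError` or garble non-ASCII characters such as the curly quotes in the lyrics. A small sketch that writes and reads a file with an explicit encoding, using a temporary directory so it does not touch `rolling.txt`; `save_and_load` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def save_and_load(text):
    """Write text as UTF-8 and read it back, naming the encoding both times."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "rolling.txt"
        path.write_text(text, encoding="utf-8")
        return path.read_text(encoding="utf-8")


if __name__ == "__main__":
    sample = "“We could have had it all...”"
    print(save_and_load(sample) == sample)
```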

    Replace all separators such as , . ? ! ’ : with spaces

    Convert all uppercase letters to lowercase

    Generate the word list

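    with open("rolling.txt", "r", encoding="utf-8") as f:
        news = f.read()

    # Replace every separator with a space
    sep = ",.?!’:"
    for c in sep:
        news = news.replace(c, " ")

    # Convert to lowercase and split into a word list (once, after the loop)
    wordList = news.lower().split()

    for w in wordList:
        print(w)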
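    The three steps above (replace separators with spaces, lowercase, split) can also be written with `str.maketrans` and `str.translate`, which map every separator in a single pass instead of one `replace` call per character. A sketch using the same separator set; `tokenize` is a hypothetical name.

```python
def tokenize(text, sep=",.?!’:"):
    """Replace each separator with a space, lowercase, and split into words."""
    table = str.maketrans({c: " " for c in sep})
    return text.translate(table).lower().split()


if __name__ == "__main__":
    print(tokenize("Finally, I can see you crystal clear."))
```

    As with the `replace` loop, treating the apostrophe as a separator splits contractions, so "It’s" becomes "it" and "s".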
    
  • Original post: https://www.cnblogs.com/tyx123/p/8657914.html