  • A complete English word-frequency counting example using a text file

    1. Read in the text to be analyzed

    Code:

    # Write the lyrics to text.txt; leaving the with-block closes (and flushes) the file
    with open('text.txt', 'w', encoding='utf-8') as fo:
        fo.write('''Well I wonder could it be When I was dreaming about you baby You were dreaming of me Call me crazy Call me blind To still be suffering is stupid after all of this time Did I lose my love to someone better And does she love you like I do I do, you know I really really do Well hey So much I need to say Been lonely since the day The day you went away So sad but true For me there's only you Been crying since the day I remember date and time September twenty second Sunday twenty five after nine In the doorway with your case No longer shouting at each other There were tears on our faces And we were letting go of something special Something we'll never have again I know, I guess I really really know Why do we never know what we've got till it's gone How could I carry on Cause I've been missing you so much I have to say''')
    # Read the whole file back into a single string
    with open('text.txt', 'r', encoding='utf-8') as fo:
        day = fo.read()
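    A minimal alternative sketch of this step using `pathlib`, assuming the same file name `text.txt` and UTF-8 encoding (the encoding is an assumption; the original relies on the platform default). `Path.write_text` and `Path.read_text` open and close the file themselves, so the data is flushed to disk before it is read back. The short `sample` string is a hypothetical stand-in for the full lyrics.

```python
from pathlib import Path

# Hypothetical short sample; the article writes the full song lyrics instead.
sample = "Well I wonder could it be When I was dreaming about you baby"

path = Path("text.txt")
path.write_text(sample, encoding="utf-8")   # opens, writes, and closes the file
day = path.read_text(encoding="utf-8")      # reads the whole file back as one string
print(day == sample)  # True
```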

    2. Split the text into words

    Code:

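    day = day.lower()

    # Replace punctuation with spaces so it does not stick to the words
    for i in ',."?':
        day = day.replace(i, ' ')

    # split() with no argument splits on runs of whitespace, so no empty strings appear
    words = day.split()
    #print(words)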
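    Splitting on a single space produces empty strings wherever two separators are adjacent (for example `do, you` becomes `do  you` after the comma is replaced). A sketch of a regular-expression tokenizer that avoids this, assuming a word is a run of lowercase letters and apostrophes, so that contractions such as `there's` and `we'll` stay whole; `tokenize` is a hypothetical helper name.

```python
import re

def tokenize(text):
    # Lowercase, then keep runs of letters/apostrophes; punctuation and
    # whitespace only act as separators, so no empty strings are produced.
    return re.findall(r"[a-z']+", text.lower())

line = "I do, you know I really really do"
# The article's approach: replace punctuation with spaces, then split on ' '.
manual = line.lower().replace(',', ' ').split(' ')
print(manual)          # ['i', 'do', '', 'you', 'know', 'i', 'really', 'really', 'do']
print(tokenize(line))  # ['i', 'do', 'you', 'know', 'i', 'really', 'really', 'do']
```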

    3. Build the count dictionary

    Code:

    dict = {}
    keys = set(words)             # each distinct word exactly once
    print(keys)
    for i in keys:
        dict[i] = words.count(i)  # number of times word i occurs
    print(dict)
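    Calling `words.count(i)` once per distinct word scans the whole list each time, so the loop does roughly (number of words) × (number of distinct words) comparisons. For a single song that is negligible, but the standard library's `collections.Counter` builds the same word-to-count mapping in one pass. A sketch on a short, hypothetical word list:

```python
from collections import Counter

words = ['i', 'do', 'you', 'know', 'i', 'really', 'really', 'do']

# One pass over the list: each word's count is incremented as it is seen.
counts = Counter(words)
print(counts['really'])   # 2
print(counts['know'])     # 1
print(counts['missing'])  # 0 (absent keys give 0 instead of raising KeyError)

# Same result as the set-and-count loop used in the article:
loop_counts = {w: words.count(w) for w in set(words)}
print(dict(counts) == loop_counts)  # True
```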

    4. Exclude grammatical (function) words

    Code:

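    # Words to leave out: mostly pronouns, articles and prepositions;
    # '' catches empty strings left over from splitting on single spaces
    exc = {'i', 'you', 'to', 'me', 'the', 'been', 'of', 'so', 'and', 'were', '', 'on', 'really'}

    dict = {}
    keys = set(words)
    keys = keys - exc             # drop the excluded words before counting
    print(keys)
    for i in keys:
        dict[i] = words.count(i)
    print(dict)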
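    Subtracting the exclusion set from `keys` before counting works; an equivalent sketch filters after counting with a dictionary comprehension. The word list and the stop-word set below are hypothetical, shortened stand-ins for `words` and `exc`.

```python
from collections import Counter

words = ['i', 'do', 'you', 'know', 'i', 'really', 'really', 'do', 'love', 'you']
exc = {'i', 'you', 'really'}  # hypothetical, shortened stop-word set

counts = Counter(words)
# Keep only the words that are not in the exclusion set.
filtered = {w: c for w, c in counts.items() if w not in exc}
print(filtered)  # {'do': 2, 'know': 1, 'love': 1}
```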

    5. Sort

    Code:

    wc=list(dict.items())
    wc.sort(key=lambda x:x[1],reverse=True)
    print(wc)
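    Because `keys` is a set, the order in which words enter `dict` can change from one run to the next (string hashing is randomized per process by default), and `sort` is stable, so words with equal counts may come out in a different order each run. A sketch of one way to make the order reproducible, assuming alphabetical order is acceptable for ties: sort by descending count, then by the word itself. The counts are hypothetical.

```python
counts = {'know': 3, 'day': 3, 'love': 2, 'dreaming': 2, 'baby': 1}  # hypothetical counts

# Negating the count sorts it descending; the word then breaks ties alphabetically.
wc = sorted(counts.items(), key=lambda x: (-x[1], x[0]))
print(wc)
# [('day', 3), ('know', 3), ('dreaming', 2), ('love', 2), ('baby', 1)]
```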

    6. Print the top 20

    Code:

    # Slicing stops at the end of the list, so this also works with fewer than 20 words
    for item in wc[:20]:
        print(item)
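    Indexing with `range(20)` assumes the list has at least 20 entries; with a shorter text `wc[i]` raises `IndexError`, whereas a slice never goes out of range. As another option, `heapq.nlargest` picks the top N items without sorting the whole list. A sketch with hypothetical data and N = 3:

```python
import heapq

wc = [('day', 3), ('know', 3), ('dreaming', 2), ('love', 2), ('baby', 1)]  # hypothetical, already sorted

# Slicing returns at most 20 items, even though this list is shorter than 20.
for item in wc[:20]:
    print(item)

counts = dict(wc)
# nlargest returns the 3 items with the largest counts, largest first.
top3 = heapq.nlargest(3, counts.items(), key=lambda x: x[1])
print(top3)  # [('day', 3), ('know', 3), ('dreaming', 2)]
```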

    Complete code:


    # 1. Write the lyrics to text.txt, then read them back
    with open('text.txt', 'w', encoding='utf-8') as fo:
        fo.write('''Well I wonder could it be When I was dreaming about you baby You were dreaming of me Call me crazy Call me blind To still be suffering is stupid after all of this time Did I lose my love to someone better And does she love you like I do I do, you know I really really do Well hey So much I need to say Been lonely since the day The day you went away So sad but true For me there's only you Been crying since the day I remember date and time September twenty second Sunday twenty five after nine In the doorway with your case No longer shouting at each other There were tears on our faces And we were letting go of something special Something we'll never have again I know, I guess I really really know Why do we never know what we've got till it's gone How could I carry on Cause I've been missing you so much I have to say''')
    with open('text.txt', 'r', encoding='utf-8') as fo:
        day = fo.read()

    # 2. Lowercase, turn punctuation into spaces, split into words
    day = day.lower()
    for i in ',."?':
        day = day.replace(i, ' ')
    words = day.split()   # no argument: splits on runs of whitespace, no empty strings
    #print(words)

    # 3-4. Count every distinct word except the excluded function words
    exc = {'i', 'you', 'to', 'me', 'the', 'been', 'of', 'so', 'and', 'were', '', 'on', 'really'}

    dict = {}
    keys = set(words)
    keys = keys - exc
    #print(keys)
    for i in keys:
        dict[i] = words.count(i)
    #print(dict)

    # 5. Sort (word, count) pairs by count, highest first
    wc = list(dict.items())
    wc.sort(key=lambda x: x[1], reverse=True)
    print(wc)

    # 6. Print the top 20 (slicing avoids IndexError if there are fewer words)
    for item in wc[:20]:
        print(item)

  • Original post: https://www.cnblogs.com/decadeyu/p/7595174.html