  • Comprehensive exercise: English word frequency count

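    Word frequency preprocessing
    Download the lyrics of an English song or an English article
    Replace all separators such as , . ? ! ’ : with spaces
    Convert all uppercase letters to lowercase
    Generate the word list
    Generate the word frequency counts
    Sort
    Exclude grammatical words: pronouns, articles, conjunctions
    Output the top 10 most frequent words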

    article = '''The capital has opened 33 roads of a total length of 105 kilometers for autonomous car testing outside the Fifth Ring Road and away from densely-populated areas on the outskirts.
    
    According to regulations for managing road testing for self-driving vehicles, autonomous vehicles are eligible for public road testing only after they have completed 5,000 kilometers of daily driving in designated closed test fields and passed assessments.
    
    The test vehicles must be equipped with monitoring devices that can monitor driving behavior, collect vehicle location information and monitor whether a vehicle is in self-driving mode.
    
    Test drivers must have received no less than 50 hours of self-driving training.
    
    Beijing has built its first closed test fields in Haidian District, covering about 13 hectares.
    
    The licenses for road testing are valid for 30 days and license holders can apply for renewal after self-driving cars pass assessments.
    
    Baidu is developing high-resolution maps for self-driving cars. The first will be based on the 33 roads.'''
    
    
    
    
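    # Replace every separator with a space so adjacent words are not glued together.
    sign = ['.', ',', '?', '!', '’', ':']
    for i in sign:
        article = article.replace(i, ' ')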
    
    article = article.lower()
    
    article = article.split()
    
    # Map each word to the number of times it occurs.
    dic = {}
    
    for i in article:
        dic[i] = dic.get(i, 0) + 1
    
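    # Grammatical words (articles, prepositions, conjunctions, ...) to exclude.
    delwords = ["the", "has", "of", "a", "for", "and", "away", "from", "on", "to", "are", "only", "have", "in", "after"]
    
    for i in delwords:
        dic.pop(i, None)  # pop avoids a KeyError when a word is not in the text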
    
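    # Sort (word, count) pairs by count, highest first.
    newarticle = sorted(dic.items(), key=lambda s: s[1], reverse=True)
    
    # Slicing is safe even when there are fewer than 10 distinct words.
    for i in newarticle[:10]:
        print(i)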
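
    The same steps can also be sketched more compactly with the standard library. The block below is an illustrative alternative, not the code used in this exercise: the function name top_words and the sample sentence are made up for the example, and the stop-word set simply copies the delwords list. It assumes that a word is any run of ASCII letters, so digits, punctuation and hyphens all act as separators (unlike the code above, "self-driving" is counted as "self" and "driving").

```python
import re
from collections import Counter

# Assumed stop-word set, copied from the delwords list in the exercise.
STOP_WORDS = {"the", "has", "of", "a", "for", "and", "away", "from",
              "on", "to", "are", "only", "have", "in", "after"}


def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most frequent non-stop-words as (word, count) pairs."""
    # Lowercase first, then keep only runs of letters; everything else
    # (punctuation, digits, whitespace, hyphens) separates words.
    words = re.findall(r"[a-z]+", text.lower())
    # Counter tallies the words; stop words are filtered out beforehand.
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)


# Hypothetical sample text, only for demonstration.
sample = "The cars pass the test. Cars, cars and roads!"
print(top_words(sample, n=1))
```

    Calling top_words(article) on the original article string (before it is split into a list) would give a comparable top 10, with small differences caused by the different handling of hyphens and digits.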

    Screenshot of the run result:

  • Original article: https://www.cnblogs.com/xuyizhu/p/8618512.html