  • Comprehensive exercise: English word frequency count

    # -*- coding:UTF-8 -*-
    # -*- author:deng -*-
    news = '''
    The only problem unconsciously assumed by all Chinese philosophers to be of
    any importance is: How shall we enjoy life, and who can best enjoy life? No
    perfectionism, no straining after the unattainable, no postulating of the
    unknowable; but taking poor, mortal human nature as it is, how shall we
    organize our life so that we can work peacefully, endure nobly and live happily?
    Who are we? That is the first question. It is a question almost impossible to answer.
    But we all agree with the busy self occupied in our daily activities is not quite
    the real self. We are quite sure we have lost something in the mere pursuit of living.
    When we watch a person running about looking for something in a field, the wise man can
    set a puzzle for all the spectator to solve: what has that person lost? Some one thinks
    is a watch; another thinks it is a diamond brooch; and others will essay other guesses.
    After all the guesses have failed, the wise man who really doesn't know what the person
    is seeking after, tells the company:" I'll tell you. He has lost some breath." And no one
    can deny that he is right. So we often forget our true self in the pursuit of living, like
    a bird forgetting its own danger in pursuit of a mantis which again forgets its own danger
    in pursuit of another.
    '''
    # Replace punctuation marks with spaces (';' and '"' also occur in the text)
    symbol = [",", ".", "!", "?", ":", ";", "'", '"']
    for s in symbol:
        news = news.replace(s, " ")
    print(news)
    # Convert all uppercase letters to lowercase
    news = news.lower()
    print(news)

    # Build the word list
    wordList = news.split()
    print(wordList)


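    # Count word frequencies (named wordFreq to avoid shadowing the built-in dict)

    wordFreq = {}
    for w in wordList:
        wordFreq[w] = wordFreq.get(w, 0) + 1
    print(wordFreq)

    # Exclude function words (articles, prepositions, conjunctions, etc.)
    stopWords = ['the', 'by', 'to', 'be', 'of', 'and', 'with', 'not']
    for w in stopWords:
        wordFreq.pop(w, None)  # pop() with a default avoids KeyError if a word is absent
    print(wordFreq)

    # Print the 10 most frequent words (slicing is safe even with fewer than 10)
    topWords = sorted(wordFreq.items(), key=lambda x: x[1], reverse=True)
    for item in topWords[:10]:
        print(item)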
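For comparison, here is a minimal sketch of the same pipeline using the standard library's `re.findall` and `collections.Counter`. It is not the author's code. The function name `top_words` and the sample sentence are hypothetical. The sketch assumes a word is a run of ASCII letters a-z after lowercasing, so `doesn't` splits into `doesn` and `t`, which is the same result as replacing `'` with a space.

```python
import re
from collections import Counter


def top_words(text, stop_words, n=10):
    """Return the n most frequent (word, count) pairs in text, skipping stop_words.

    Hypothetical helper. Assumption: a word is a run of ASCII letters a-z
    after lowercasing, so punctuation and digits act as separators.
    """
    # Lowercase first, then extract letter runs; no punctuation list is needed
    words = re.findall(r"[a-z]+", text.lower())
    # Count only the words that are not stop words
    counts = Counter(w for w in words if w not in stop_words)
    # most_common(n) sorts by count, descending
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "The cat and the dog. The cat sat; the dog ran!"
    print(top_words(sample, {"the", "and"}, 3))
```

A set of stop words gives constant-time membership checks. Unlike `del`, filtering never raises `KeyError` for a stop word that does not occur in the text. `most_common(n)` orders words with equal counts by when they were first encountered.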
  • Original post: https://www.cnblogs.com/dfq621/p/8655087.html