  • A complete English word-frequency counting example that reads its text from a file

    You can download a long English novel and analyze its word frequencies.

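    1. Read in the string to be analyzed

    2. Split it into individual words

    3. Build a count dictionary

    4. Exclude grammatical (function) words

    5. Sort

    6. Output the top 20

    7. Briefly explain the output.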
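Steps 1 to 6 above map naturally onto small functions. The following is a minimal sketch, assuming the file is UTF-8 encoded; the names `read_text`, `split_words`, `count_words` and `top_n` are hypothetical and not part of the original script. It also counts in a single pass with a dictionary instead of calling `list.count` once per distinct word.

```python
# Hypothetical helper names; each function covers one or two of the steps above.

def read_text(path):
    """Step 1: read the whole file (assumed UTF-8) and lowercase it."""
    with open(path, encoding='utf-8') as f:
        return f.read().lower()


def split_words(text, punctuation=',.'):
    """Step 2: turn punctuation into spaces, then split on any whitespace."""
    for ch in punctuation:
        text = text.replace(ch, ' ')
    return text.split()


def count_words(words, stopwords):
    """Steps 3-4: count every word that is not a stop word, in one pass."""
    counts = {}
    for w in words:
        if w not in stopwords:
            counts[w] = counts.get(w, 0) + 1
    return counts


def top_n(counts, n=20):
    """Steps 5-6: sort by count (highest first) and keep the first n pairs."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

For example, `top_n(count_words(split_words(read_text('test.txt')), stopwords))` returns up to 20 `(word, count)` pairs, where `stopwords` is a set of excluded words such as `exp` in the script.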

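    # Read in the string to be analyzed
    with open('C:/Uscer/ben/test.txt', 'r', encoding='utf-8') as fo:
        text = fo.read()
    # Convert all uppercase letters to lowercase
    text = text.lower()
    # Replace commas and periods with spaces
    for ch in ',.':
        text = text.replace(ch, ' ')
    # Split into words; split() with no argument splits on any whitespace, including newlines
    words = text.split()
    # Exclude grammatical (function) words
    exp = {'', 'the', 'and', 'to', 'on', 'of', 's', 'a', 'is', 'u', 'as', 'also'}
    keys = set(words) - exp
    # Build the count dictionary
    dic = {}
    for w in keys:
        dic[w] = words.count(w)
    # Sort by count, highest first
    wc = list(dic.items())
    wc.sort(key=lambda x: x[1], reverse=True)
    # Print the top 20; slicing avoids an IndexError when there are fewer than 20 distinct words
    for item in wc[:20]:
        print(item)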
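The script calls `words.count(w)` once per distinct word, and each call rescans the whole word list, so for a full-length novel the work grows with (number of words) × (number of distinct words). A minimal alternative sketch uses the standard library's `collections.Counter`, which counts in one pass, and `re.findall` for tokenizing. The word pattern is an assumption: it keeps runs of ASCII letters with an optional apostrophe suffix such as `group's`, and drops digits and all other punctuation. `word_frequencies` and `WORD_RE` are hypothetical names.

```python
import re
from collections import Counter

# Stop words copied from the script above; extend as needed.
STOPWORDS = {'the', 'and', 'to', 'on', 'of', 's', 'a', 'is', 'u', 'as', 'also'}

# Assumed definition of a "word": ASCII letters, optionally followed by an
# apostrophe part such as "group's". Digits and other symbols are dropped.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def word_frequencies(text, stopwords=STOPWORDS, n=20):
    """Return the n most common non-stop words as (word, count) pairs."""
    words = WORD_RE.findall(text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)
```

Calling `word_frequencies(text)` on the string read from test.txt returns up to 20 `(word, count)` pairs. `Counter.most_common` lists words with equal counts in the order they were first encountered, so the result does not depend on set iteration order.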

      test.txt:

    Canadian Prime Minister Justin Trudeau (central) and Jack Ma (right), executive chairman and founder of the Alibaba Group, attend the Alibaba Group's Gateway'17 Canada conference in Toronto on Sept 25, 2017. [Photo/Xinhua]

    The Alibaba Group's Gateway'17 Canada conference opened Monday in Canada's largest city Toronto.
    Jack Ma, executive chairman and founder of the Alibaba Group, and Canadian Prime Minister Justin Trudeau delivered key speeches at the conference, which was attended by more than 3,600 people, at the Toronto Exhibition Place.

    The event, along with a trade show, attracted a variety of organizations and businesses, covering such sectors as manufacturing, retail, professional services, agribusiness, and travel and tourism.

    Data showed 68 percent of the participants were from small businesses with fewer than 50 employees.

    The one-day conference featured presentations and breakout sessions aimed at educating enterprises about what and how to sell to China, especially through e-commerce platforms.

    For example, people learned about how Alibaba's online travel marketplace and payment solutions can help Canadian businesses serve the rapidly expanding outbound Chinese travel and tourism market.

    From the output, we can tell that this article is about travel and business in Canada.
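One caveat when reading the top-20 list: the script fills `dic` by iterating over a set of strings, and because Python randomizes string hashing per process by default, words with equal counts can appear in a different order on different runs (a tie at position 20 can even decide which word makes the cut). A minimal sketch of a deterministic sort key that breaks ties alphabetically; `sort_counts` is a hypothetical helper name and the sample dictionary is made-up illustration data.

```python
def sort_counts(counts):
    """Sort (word, count) pairs by count, highest first.

    Ties are broken alphabetically, so the order no longer depends on
    set or dict iteration order.
    """
    # Negating the count sorts it descending while the word sorts ascending.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up example: 'apple' and 'banana' tie, so they come out alphabetically.
print(sort_counts({'banana': 2, 'apple': 2, 'cherry': 3})[:20])
# prints [('cherry', 3), ('apple', 2), ('banana', 2)]
```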

  • Original article: https://www.cnblogs.com/0055sun/p/7602223.html