
A Complete English Word-Frequency Count Example Using File Input

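Task: download a full-length English novel and analyze its word frequencies.

1. Read in the text to be analyzed

2. Split the text and extract the words

3. Build a dictionary of counts

4. Exclude function words (stop words)

5. Sort

6. Output the TOP(20)

7. Briefly explain the output.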

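fo=open('article.txt','r')      # open for reading; 'w' would truncate the file
# Read in the text to be analyzed
news=fo.read()

fo.close()
# String preprocessing: lowercase, then replace punctuation with spaces
news=news.lower()               # str methods return a new string, so reassign
for i in '.,:;?!-_':
    news=news.replace(i,' ')
# Split the text and extract the words
news=news.split()               # split on any whitespace, dropping empty strings
# Exclude function words (stop words)
exp={'the','of','and','to','a','in','at','for','with','an','has','that','will','should','is','its','he','have','on','each','during','as'}
word=set(news)-exp
# Build the dictionary of counts
dic={}
for i in word:
    dic[i]=news.count(i)
news=list(dic.items())
# Sort by count, highest first
news.sort(key=lambda x:x[1],reverse=True)
# Print the top 20 (fewer if the text has under 20 distinct words)
for i in news[:20]:
    print(i)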
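
The same steps can also be written more compactly with the standard library. The sketch below is one possible variant, not the original author's method: the function name `top_words`, the constant `STOP_WORDS`, and the regular expression used to pick out words are all assumptions made for illustration. It counts every word in a single pass with `collections.Counter`, rather than calling `list.count` once per distinct word.

```python
import re
from collections import Counter

# Stop words copied from the listing above.
STOP_WORDS = {'the', 'of', 'and', 'to', 'a', 'in', 'at', 'for', 'with', 'an',
              'has', 'that', 'will', 'should', 'is', 'its', 'he', 'have', 'on',
              'each', 'during', 'as'}

def top_words(text, stop_words=STOP_WORDS, n=20):
    """Return up to n (word, count) pairs, most frequent first."""
    # Lowercase the text, then extract runs of letters and digits,
    # allowing one apostrophe suffix so that "don't" stays a single word.
    words = re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())
    # Count every word that is not a stop word in one pass.
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)
```

To run it on the same input, read `article.txt` into a string and loop over `top_words(text)`, printing each pair. Extracting words with a regular expression also handles punctuation that the fixed list `'.,:;?!-_'` does not cover, such as quotation marks and parentheses.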

The results show that the text is about the G20 summit: a meeting between countries, held to better promote exchange among them.


Original article: https://www.cnblogs.com/1244581939cls/p/7602457.html