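- Word frequency statistics with preprocessing
- Download the lyrics of an English song, or an English article
- Replace all separators such as , . ? ! ’ : with spaces
- Convert all uppercase letters to lowercase
- Generate a word list
- Count word frequencies
- Sort by frequency
- Exclude grammatical function words such as pronouns, articles and conjunctions
- Output the 10 most frequent words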
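The separator-replacement, lowercasing and splitting steps can also be done with a single translation table instead of one `str.replace` call per separator. The following is a minimal sketch, assuming the same separator set as the script below; `tokenize`, `SEPARATORS` and `TABLE` are hypothetical names used only for illustration. Note that the curly apostrophe ’ becomes a space, so "It’s" splits into "it" and "s", which is why `s` appears in the exclusion list.

```python
# Hypothetical helper for illustration; not part of the original script.
# Same separators as the original `symbol` string.
SEPARATORS = ''',.?!’:"“”-%$'''

# Translation table mapping every separator character to a space.
TABLE = str.maketrans(SEPARATORS, ' ' * len(SEPARATORS))


def tokenize(text):
    """Replace separators with spaces, lowercase, and split into a word list."""
    return text.translate(TABLE).lower().split()


print(tokenize("Hello, World! It’s the bank’s day."))
# ['hello', 'world', 'it', 's', 'the', 'bank', 's', 'day']
```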
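Code:

```python
# -*- coding:utf-8 -*-
# Read the downloaded lyrics or article (UTF-8, since it contains ’ “ ” etc.)
with open('song.txt', 'r', encoding='utf-8') as f:
    song = f.read()

# Separators to replace with spaces, and function words to exclude
symbol = ''',.?!’:"“”-%$'''
exclude = '''
a an the in on to at and of is was are were i he she you your they us their our it
or for be too do no that s so as but it's
'''

# Replace every separator with a space
for i in symbol:
    song = song.replace(i, ' ')

# Convert to lowercase and split into a word list
songList = song.lower().split()
excludeSet = set(exclude.split())

# Count each distinct word that is not excluded
songDict = {}
songSet = set(songList) - excludeSet
for i in songSet:
    songDict[i] = songList.count(i)

# Sort by frequency, highest first
dictList = list(songDict.items())
dictList.sort(key=lambda item: item[1], reverse=True)

# Print the top 10 (slicing avoids an IndexError when there are fewer than 10 words)
for item in dictList[:10]:
    print(item)
```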
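The script calls `songList.count(i)` once per distinct word, and each call rescans the whole word list, so the work grows with (number of words) × (number of distinct words). A possible alternative, sketched here under the assumption that the input is already a list of lowercase words, is `collections.Counter`, which counts in a single pass; `most_common(n)` then returns the `n` highest counts in descending order. `top_words` and `EXCLUDE` are hypothetical names for illustration.

```python
from collections import Counter

# Function words to drop (same list as the original script).
EXCLUDE = set('''
a an the in on to at and of is was are were i he she you your they us their our it
or for be too do no that s so as but it's
'''.split())


def top_words(words, n=10):
    """Count non-excluded words in one pass and return the n most frequent."""
    counts = Counter(w for w in words if w not in EXCLUDE)
    return counts.most_common(n)


sample = "the bank said the bank and the commission said bank".split()
print(top_words(sample, 3))
# [('bank', 3), ('said', 2), ('commission', 1)]
```

`most_common(n)` simply returns fewer items when there are fewer than `n` distinct words, so no extra bounds check is needed.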
Output:

```
('regulatory', 7)
('commission', 6)
('insurance', 5)
('financial', 5)
('bank', 5)
('banking', 5)
('china', 5)
('newly', 4)
('said', 4)
('central', 4)
```
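Several words in this output share the same count. Because the script sorts only by count and builds `songDict` by iterating over a set (and Python randomizes string hashing per process by default), the order of tied words, and which of them make the top 10, may differ between runs. One way to make the result repeatable, sketched here as an assumption rather than the original method, is to sort by count descending and then alphabetically. `top_words_stable` is a hypothetical name, and the sample dictionary reuses a few counts from the output above.

```python
def top_words_stable(counts, n=10):
    """Sort by count (descending), then by word (ascending), so ties have a fixed order."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


# A few counts taken from the output above.
counts = {'bank': 5, 'china': 5, 'banking': 5, 'said': 4, 'newly': 4, 'regulatory': 7}
print(top_words_stable(counts, 4))
# [('regulatory', 7), ('bank', 5), ('banking', 5), ('china', 5)]
```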