  • Python: word-frequency counting with the jieba library

    # Count how often each character appears in the Three Kingdoms text (threekingdoms.txt)

    import jieba

    # Read the whole text; the with-block closes the file afterwards
    with open('threekingdoms.txt', 'r', encoding='utf-8') as f:
        text = f.read()
    # Frequent words that are not character names
    excludes = {'将军','却说','二人','不能','如此','荆州','不可','商议','如何','军士','左右','主公','引兵','次日','大喜','军马',
                '天下','东吴','于是'}
    # lcut returns the segmentation result as a list
    words = jieba.lcut(text)
    # Count occurrences with a dict, merging aliases into one canonical name
    counts = {}
    for word in words:
        if len(word) == 1:
            # Skip single-character tokens (punctuation, particles)
            continue
        elif word == '孔明曰' or word == '孔明':
            rword = '诸葛亮'
        elif word == '关公' or word == '云长':
            rword = '关羽'
        elif word == '玄德' or word == '玄德曰':
            rword = '刘备'
        elif word == '孟德' or word == '丞相':
            rword = '曹操'
        else:
            rword = word
        counts[rword] = counts.get(rword, 0) + 1
    for word in excludes:
        # pop with a default avoids a KeyError when an excluded word never occurred
        counts.pop(word, None)
    items = list(counts.items())
    # Sort by count, largest first
    items.sort(key=lambda x: x[1], reverse=True)
    # Slicing is safe even if fewer than five entries remain
    for word, count in items[:5]:
        print('{0:<10}{1:>5}'.format(word, count))
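
The counting step above can also be sketched with collections.Counter and an alias table instead of the if/elif chain. This is only a minimal sketch, not the original author's method: the names ALIASES and top_names are hypothetical, and the input is a small hand-made token list standing in for the output of jieba.lcut, so the example runs without jieba or the text file.

```python
from collections import Counter

# Hypothetical alias table (an assumption for illustration): it maps each
# alternate name to the canonical name used in the listing above.
ALIASES = {
    '孔明': '诸葛亮', '孔明曰': '诸葛亮',
    '关公': '关羽', '云长': '关羽',
    '玄德': '刘备', '玄德曰': '刘备',
    '孟德': '曹操', '丞相': '曹操',
}


def top_names(words, aliases, excludes, n=5):
    """Return the n most frequent multi-character tokens after alias merging."""
    # Skip single-character tokens, then map each alias to its canonical name.
    counts = Counter(aliases.get(w, w) for w in words if len(w) > 1)
    # Drop excluded words; pop with a default ignores words that never occurred.
    for w in excludes:
        counts.pop(w, None)
    return counts.most_common(n)


# Hand-made tokens standing in for jieba.lcut(text).
sample = ['孔明', '曰', '玄德', '将军', '孔明曰', '曹操', '丞相', '玄德曰', '诸葛亮']
result = top_names(sample, ALIASES, {'将军'}, n=3)
for word, count in result:
    print('{0:<10}{1:>5}'.format(word, count))
```

Keeping the aliases in a dict means a new alias is one more entry rather than one more elif branch, and most_common replaces the manual sort. With the real text, words would come from jieba.lcut(text) as in the listing above.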
  • Original post: https://www.cnblogs.com/sneike/p/9302218.html