  • 【Python】Word-frequency count for Romance of the Three Kingdoms (三国演义)

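    import jieba

    # Convert the .txt file to UTF-8 encoding beforehand
    with open('C:/Users/eternal/Desktop/threekingdoms.txt', 'r', encoding='UTF-8') as f:
        txt = f.read()

    # High-frequency words that are not character names
    excludes = {'将军', '却说', '荆州', '二人', '不可', '不能', '如此'}

    words = jieba.lcut(txt)  # segment the text into a list of words
    counts = {}
    for word in words:
        if len(word) == 1:  # skip single characters (particles, punctuation)
            continue
        # Merge the different names of the same character into one key
        elif word == '诸葛亮' or word == '孔明曰':
            rword = '孔明'
        elif word == '关公' or word == '云长':
            rword = '关羽'
        elif word == '玄德' or word == '玄德曰':
            rword = '刘备'
        elif word == '孟德' or word == '丞相':
            rword = '曹操'
        else:
            rword = word
        counts[rword] = counts.get(rword, 0) + 1

    # Drop the non-name words; pop() avoids a KeyError if one never occurred
    for word in excludes:
        counts.pop(word, None)

    items = list(counts.items())
    items.sort(key=lambda x: x[1], reverse=True)  # sort by count, descending

    # Print the 10 most frequent names (slicing is safe if there are fewer)
    for word, count in items[:10]:
        print('{0:<10}{1:>5}'.format(word, count))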
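The same counting logic can be sketched more compactly with the standard library, assuming the text has already been segmented into a list of words (for example by jieba.lcut). This sketch swaps the manual dict and if/elif chain for collections.Counter plus an alias table; the names ALIASES, EXCLUDES, top_names and format_row are hypothetical, and the full-width space fill (chr(12288)) is an optional tweak for aligning Chinese names, not part of the original script.

```python
from collections import Counter

# Hypothetical alias table: each variant name maps to one canonical name
# (same mappings as the if/elif chain in the script).
ALIASES = {
    '诸葛亮': '孔明', '孔明曰': '孔明',
    '关公': '关羽', '云长': '关羽',
    '玄德': '刘备', '玄德曰': '刘备',
    '孟德': '曹操', '丞相': '曹操',
}
# High-frequency words that are not character names
EXCLUDES = {'将军', '却说', '荆州', '二人', '不可', '不能', '如此'}


def top_names(words, n=10):
    """Count multi-character words, merge aliases, drop excluded words."""
    counts = Counter(
        ALIASES.get(w, w)  # words without an alias map to themselves
        for w in words
        if len(w) > 1      # skip single characters
    )
    for w in EXCLUDES:
        counts.pop(w, None)  # pop() with a default never raises KeyError
    return counts.most_common(n)  # (word, count) pairs, highest count first


def format_row(word, count):
    # chr(12288) is the full-width (ideographic) space; using it as the fill
    # character pads CJK names to a consistent display width.
    return '{0:{2}<10}{1:>5}'.format(word, count, chr(12288))


if __name__ == '__main__':
    # A hand-made word list standing in for jieba.lcut(txt)
    sample = ['玄德', '曰', '孔明', '诸葛亮', '云长', '关公', '却说', '孟德', '玄德曰', '孔明曰']
    for word, count in top_names(sample, 3):
        print(format_row(word, count))
```

For the sample list, top_names(sample, 3) returns [('孔明', 3), ('刘备', 2), ('关羽', 2)]: '曰' is skipped as a single character, '却说' is removed as an excluded word, and ties keep the order in which the names first appeared.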
  • Original article: https://www.cnblogs.com/naraka/p/8985134.html