  • [Python] Word frequency count for Romance of the Three Kingdoms

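    import jieba

    # The text file must be saved as UTF-8 beforehand.
    with open('C:/Users/eternal/Desktop/threekingdoms.txt', 'r', encoding='UTF-8') as f:
        txt = f.read()

    # Frequent words that are not character names
    excludes = {'将军', '却说', '荆州', '二人', '不可', '不能', '如此'}

    words = jieba.lcut(txt)  # segment the text into a list of words
    # print(words)  # uncomment to inspect the full token list

    counts = {}
    for word in words:
        if len(word) == 1:
            # skip single-character tokens (punctuation, particles)
            continue
        elif word == '诸葛亮' or word == '孔明曰':
            rword = '孔明'
        elif word == '关公' or word == '云长':
            rword = '关羽'
        elif word == '玄德' or word == '玄德曰':
            rword = '刘备'
        elif word == '孟德' or word == '丞相':
            rword = '曹操'
        else:
            rword = word
        counts[rword] = counts.get(rword, 0) + 1

    # Drop the excluded words; pop() avoids a KeyError if one never occurred
    for word in excludes:
        counts.pop(word, None)

    # Sort by count, highest first
    items = list(counts.items())
    items.sort(key=lambda x: x[1], reverse=True)
    # print(items)  # uncomment to inspect the full sorted list

    # Print the ten most frequent entries
    for i in range(min(10, len(items))):
        word, count = items[i]
        print('{0:<10}{1:>5}'.format(word, count))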
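
The alias merging and counting in the script above could also be written with a lookup table and `collections.Counter`. The following is only a minimal sketch, not the original author's method: `ALIASES` and `count_names` are hypothetical names introduced here, and a short hand-written token list stands in for the output of `jieba.lcut` so that the example runs without the novel's text file.

```python
from collections import Counter

# Hypothetical alias table (name assumed for this sketch); it holds the
# same alias-to-name mappings as the if/elif chain in the script above.
ALIASES = {
    '诸葛亮': '孔明', '孔明曰': '孔明',
    '关公': '关羽', '云长': '关羽',
    '玄德': '刘备', '玄德曰': '刘备',
    '孟德': '曹操', '丞相': '曹操',
}


def count_names(words, excludes=frozenset(), top_n=10):
    """Count multi-character tokens, merging aliases and dropping excluded words."""
    counter = Counter(
        ALIASES.get(w, w)      # map an alias to its canonical name
        for w in words
        if len(w) > 1          # skip single-character tokens
    )
    for w in excludes:
        counter.pop(w, None)   # ignore excluded words that never occurred
    return counter.most_common(top_n)


# Stand-in for jieba.lcut(txt): a small hand-written token list.
sample = ['玄德', '曰', '孔明', '诸葛亮', '将军', '云长', '关公', '孔明曰']
result = count_names(sample, excludes={'将军'})
for word, count in result:
    print('{0:<10}{1:>5}'.format(word, count))
```

With the real text, `count_names(jieba.lcut(txt), excludes)` would take the place of the loop, the deletion step, and the sort. Adding another alias then means adding one entry to the table rather than another `elif` branch.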
  • Original article: https://www.cnblogs.com/naraka/p/8985134.html