  • Word segmentation of Liaozhai (聊斋): the 20 most frequent words

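    import jieba

    # Read the whole text; the with-block closes the file afterwards
    with open("聊斋志异白话简写版.txt", "r", encoding="utf-8") as f:
        txt = f.read()
    words = jieba.lcut(txt)     # segment the text in jieba's precise mode
    counts = {}     # maps each word to its number of occurrences

    for word in words:
        if len(word) == 1:      # skip single-character tokens
            continue
        # Merge aliases of the same character under one canonical name
        elif word == "小倩" or word == "鬼妻":
            rword = "聂小倩"
        elif word == "采臣":
            rword = "宁采臣"
        elif word == "黑山" or word == "万妖群魔之首":
            rword = "黑山老妖"
        elif word == "十四娘":
            rword = "辛十四娘"
        elif word == "子楚":
            rword = "孙子楚"
        elif word == "赵阿宝":
            rword = "阿宝"
        else:
            rword = word
        counts[rword] = counts.get(rword, 0) + 1

    items = list(counts.items())    # convert the key-value pairs to a list
    items.sort(key=lambda x: x[1], reverse=True)    # sort by count, highest first

    # Slicing avoids an IndexError when there are fewer than 20 distinct words
    for word, count in items[:20]:
        print("{0:<10}{1:>5}".format(word, count))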
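
The same counting step can also be sketched with a lookup table and `collections.Counter` instead of a long if/elif chain. This is only one possible variant, not the original author's code: the names `ALIASES` and `top_words` are made up for illustration, the function takes an already segmented word list (for example the result of `jieba.lcut`) so it runs without jieba, and mapping 采臣 to 宁采臣 is an assumption about the intended canonical name.

```python
from collections import Counter

# Hypothetical alias table: each alias maps to one canonical character name.
# The 采臣 -> 宁采臣 entry is an assumption about the intended name.
ALIASES = {
    "小倩": "聂小倩",
    "鬼妻": "聂小倩",
    "采臣": "宁采臣",
    "黑山": "黑山老妖",
    "万妖群魔之首": "黑山老妖",
    "十四娘": "辛十四娘",
    "子楚": "孙子楚",
    "赵阿宝": "阿宝",
}


def top_words(words, n=20, aliases=ALIASES):
    """Return the n most frequent words as (word, count) pairs.

    Single-character tokens are skipped and aliases are merged
    under their canonical name before counting.
    """
    counts = Counter(aliases.get(w, w) for w in words if len(w) > 1)
    return counts.most_common(n)


if __name__ == "__main__":
    # Small hand-made token list standing in for jieba.lcut(txt)
    sample = ["小倩", "鬼妻", "聂小倩", "采臣", "的", "书生", "书生"]
    for word, count in top_words(sample, n=3):
        print("{0:<10}{1:>5}".format(word, count))
```

Keeping the aliases in a dictionary means a new alias is one extra entry rather than another elif branch, and `most_common(n)` simply returns fewer pairs when the text has fewer than n distinct words.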

  • Original article: https://www.cnblogs.com/sonder22/p/13975495.html