  • 6. Chinese word frequency count

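    import jieba
    from collections import Counter


    # Read the full text of Romance of the Three Kingdoms (三国演义).
    with open('sanguoyanyi.txt', 'r', encoding='utf-8') as f:
        text = f.read()

    # Register character names so jieba keeps each one as a single token.
    jieba.add_word('曹操')
    jieba.add_word('诸葛亮')
    jieba.add_word('孔明')

    # Full-width (Chinese) and ASCII punctuation to strip before segmentation.
    punctuation = '''，。‘’“”：；（）！？、,.:;()!? '''
    # Stopwords: function words and frequent single characters to ignore.
    a = {'的', ' ', '\u3000', '曰', '之', '不', '人', '军', '操', '一', '将',
         '大', '马', '来', '德', '有', '于', '下', '兵', '此',
         '玄', '公', '见', '为', '何', '中', '而', '可', '吾',
         '出', '也', '以', '与', '上', '后', '今', '其', '去',
         '日', '明', '言'}
    for ch in punctuation:
        text = text.replace(ch, '')

    # Segment once and reuse the token list.
    tempwords = list(jieba.cut(text))

    # Count whole tokens (text.count would also count substrings, e.g. 明
    # inside 孔明), skipping stopwords and whitespace-only tokens.
    count = Counter(w for w in tempwords if w.strip() and w not in a)

    # Sort by frequency, highest first.
    countList = list(count.items())
    countList.sort(key=lambda x: x[1], reverse=True)
    print(countList[:20])

    # Write the top 20 as "word:count" lines ('w' overwrites old results).
    with open('zzzCount.txt', 'w', encoding='utf-8') as f:
        for word, n in countList[:20]:
            f.write(word + ':' + str(n) + '\n')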
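    Counting a word with `text.count(word)` on the raw text counts substring matches, not tokens. A short word such as 明 is therefore also counted inside every 孔明. The minimal sketch below shows the difference. It uses a hand-made token list, an assumption standing in for `jieba.cut` output, so it runs without jieba installed.

```python
from collections import Counter

# Hand-made tokens standing in for list(jieba.cut(text)) (an assumption,
# so the example runs without jieba installed).
tokens = ["孔明", "曰", "孔明", "之", "计", "诸葛亮", "孔明", "明", "日"]
text = "".join(tokens)

# Substring count: "明" is also matched inside each of the three "孔明".
substring_count = text.count("明")

# Token count: only the standalone "明" token is counted.
token_count = Counter(tokens)["明"]

print(substring_count, token_count)  # → 4 1
```

    Counting over the segmented token list, for example with `Counter`, avoids this inflation. The same reasoning applies when counting over the full novel.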

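    Deleting punctuation from the whole text before segmentation joins the end of one sentence to the start of the next. This may change how jieba splits words at that boundary. A possible alternative is to segment first, then drop punctuation and stopwords from each token before ranking. The sketch below takes that approach. It assumes the token list already exists (a hypothetical list stands in for `jieba.cut` output). The helper name `top_words` and the punctuation set are illustrative assumptions, not part of the original post.

```python
from collections import Counter

# Assumed punctuation set: full-width and ASCII marks, spaces, U+3000, newlines.
PUNCTUATION = "，。‘’“”：；（）！？、,.:;()!? \u3000\n"
# Translation table that deletes every character in PUNCTUATION.
STRIP_TABLE = str.maketrans("", "", PUNCTUATION)


def top_words(tokens, stopwords, n=20):
    """Hypothetical helper: the n most frequent tokens, ignoring stopwords
    and tokens that consist only of punctuation or whitespace."""
    counts = Counter()
    for tok in tokens:
        tok = tok.translate(STRIP_TABLE)
        if tok and tok not in stopwords:
            counts[tok] += 1
    # most_common keeps first-seen order among equal counts.
    return counts.most_common(n)


# Hand-made token list standing in for list(jieba.cut(text)).
tokens = ["孔明", "曰", "：", "孔明", "之", "计", "，", "曹操", "大", "怒", "。", "曹操"]
print(top_words(tokens, stopwords={"曰", "之"}, n=2))  # → [('孔明', 2), ('曹操', 2)]
```

    `Counter.most_common` returns the pairs already sorted by count, so no separate sort step is needed.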
  • Original article: https://www.cnblogs.com/zhu573514187/p/8664235.html