  • Word frequency count

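    # coding: utf-8
    # Open the txt file containing the English speech
    # Stop words that should not be counted
    s_words = {'own', 'the', 'and', 'that', 'this', 'it', 'my', 'when', 'but', 'so', 'where', 'an', 'a'}
    # Punctuation characters treated as word separators
    sep = ''',.?!:”'“;][ ’'''
    # encoding="utf-8" assumes the file is UTF-8 encoded
    with open("dream.txt", encoding="utf-8") as fd:
        words = fd.read().lower()  # read the whole file as one lowercase string
    for i in sep:
        # print(i)
        words = words.replace(i, ' ')  # replace each separator with a space
    tr_words = words.split()  # split on whitespace into a list of words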
    # Count how often each word occurs, skipping the stop words in s_words
    worddit = {}
    for j in tr_words:
        if j not in s_words:
            worddit[j] = worddit.get(j, 0) + 1
    diclist = list(worddit.items())
    # The lambda defines an anonymous function; the sort uses only x[1] (the count) as its key
    diclist.sort(key=lambda x: x[1], reverse=True)
    # Print the 10 most frequent words as (word, count) tuples
    for d in diclist[0:10]:
        print(d)
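For comparison, the same count can be written more compactly with the standard library's `collections.Counter`. This is a minimal sketch, not the original author's method: the helper name `top_words` and the sample sentence are assumptions made for illustration. It also tokenizes with the regular expression `[a-z]+` instead of the character-by-character `sep` substitution, so digits are dropped and apostrophes split words.

```python
import re
from collections import Counter

# Stop words copied from the script above
STOP_WORDS = {'own', 'the', 'and', 'that', 'this', 'it', 'my', 'when',
              'but', 'so', 'where', 'an', 'a'}


def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most common words in text as (word, count) tuples.

    top_words is a hypothetical helper name, not part of the original script.
    """
    # Treat every run of ASCII letters as one word; anything else is a separator.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Counter tallies each remaining word in a single pass over the tokens.
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts.most_common(n)


if __name__ == "__main__":
    # Illustrative sample text; replace with the contents of dream.txt.
    sample = "I have a dream that one day this nation will rise up. I have a dream today!"
    for word, count in top_words(sample, n=3):
        print(word, count)
```

To run it on the speech file, read the text with `open("dream.txt", encoding="utf-8").read()` (assuming the file is UTF-8 encoded) and pass the result to `top_words`. `Counter` makes a single pass over the words, whereas calling `list.count` for every word rescans the whole list each time.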
    

      

  • Original article: https://www.cnblogs.com/miranda-76/p/8653947.html