  • Word frequency count

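    # -*- coding: utf-8 -*-
    # Open the txt file of an English speech and count how often each word occurs
    s_words = {'own', 'the', 'and', 'that', 'this', 'it', 'my', 'when', 'but', 'so', 'where', 'an', 'a'}  # stop words
    sep = ''',.?!:”'“;][ ’'''  # characters treated as separators
    with open("dream.txt", encoding="utf-8") as fd:
        # read() returns one string; readlines() plus str() would leave list brackets and '\n' in the text
        words = fd.read().lower()
    # the with block has already closed the file, so no fd.close() is needed
    for i in sep:
        words = words.replace(i, ' ')
    tr_words = words.split()

    # Count every word, stop words included, in a single pass
    worddit = {}
    for j in tr_words:
        worddit[j] = worddit.get(j, 0) + 1
    diclist = list(worddit.items())
    diclist.sort(key=lambda x: x[1], reverse=True)
    for d in diclist[0:10]:
        print(d)

    # Remove the stop words, then rebuild the list from the remaining words
    wordset = set(tr_words) - s_words
    diclist = [(w, worddit[w]) for w in wordset]
    diclist.sort(key=lambda x: x[1])  # lambda defines an anonymous function; sort by x[1] (the count) only, ascending
    # for d in diclist[0:10]:
    #     print(d)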
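
For comparison, here is a minimal sketch of the same idea built on `collections.Counter` from the standard library. It is not the original author's method: the helper name `top_words`, its parameters, and the sample sentence are all hypothetical, and it swaps the separator-replacement step for a regular expression that keeps only runs of the letters a to z (so digits and apostrophes are dropped).

```python
import re
from collections import Counter


# Hypothetical helper for illustration; not part of the original script.
def top_words(text, stop_words=frozenset(), n=10):
    """Return the n most frequent words in text, skipping stop_words."""
    # Lowercase the text, then treat every run of letters a-z as one word.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Counter tallies each remaining word in a single pass.
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts.most_common(n)


# Made-up sample text; for the script above you would pass the contents of dream.txt.
sample = "I have a dream. A dream that one day the dream comes true!"
print(top_words(sample, stop_words={"a", "the", "that"}, n=3))
```

Filtering the stop words before counting means they never show up in the result, and `most_common` takes care of sorting by count in descending order.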
    

      

  • Original article: https://www.cnblogs.com/miranda-76/p/8653947.html