  • python sorted() count() set(list) for deduplication -- search + match

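    2. Use Python to count how often each word occurs in an English article, and return the 10 most frequent words with their counts. Then answer the following questions. (Punctuation may be ignored.)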

    (1) After creating a file object f, explain the difference between f's readlines and xreadlines methods.

    (2) Additional requirement: text inside quotation marks must be counted as a single word. How can this be implemented?
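
    For question (1): in Python 2, readlines() reads the whole file and returns a list of lines, while xreadlines() returns an iterator that yields one line at a time (it is equivalent to iterating over the file object itself). xreadlines() was removed in Python 3. The sketch below is a Python 3 illustration, not part of the original script; the temporary file and its contents are made up for the example.

```python
import os
import tempfile

# Hypothetical sample file so the sketch does not depend on /root/text.txt
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('hello world\nhello kitty\n')
    path = tmp.name

# readlines() loads every line into memory at once and returns a list
with open(path) as f:
    all_lines = f.readlines()

# Iterating over the file object reads lazily, one line at a time.
# This is the Python 3 replacement for Python 2's f.xreadlines().
with open(path) as f:
    line_count = sum(1 for _ in f)

os.remove(path)

print(type(all_lines).__name__, len(all_lines))
print(line_count)
```

    For a large file, the lazy form avoids holding every line in memory at the same time.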

    cat /root/text.txt

    hello world 2018 xiaowei,good luck
    hello kitty 2017 wangleai,ha he
    hello kitty ,hasd he
    hello kitty ,hasaad hedsfds

    # My script

    #!/usr/bin/python
    #get ['a','b','c']
    import re
    with open('/root/text.txt') as f:
      openfile = f.read()

    def get_list_dict():
      # Split on runs of digits and non-word characters (\W)
      word_list = re.split(r'[0-9\W]+',openfile)
      list_no_repeat = set(word_list)
      dict_word = {}
      for each_word in list_no_repeat:
        dict_word[each_word] = word_list.count(each_word)
      # Drop the empty string produced by a leading or trailing separator, if any
      dict_word.pop('', None)
      return dict_word

    #{'a':2,'c':5,'b':1} => {'c':5,'a':2,'b':1}
    def sort_dict_get_ten(dict_word):
      list_after_sorted = sorted(dict_word.items(),key=lambda x:x[1],reverse=True)
      print(list_after_sorted)
      # The task asks for the top 10; 3 are printed here to match the sample output below
      for word, count in list_after_sorted[:3]:
        print('%s %d' % (word, count))

    def main():
      dict_word = get_list_dict()
      sort_dict_get_ten(dict_word)

    if __name__ == '__main__':
      main()

    [('hello', 4), ('kitty', 3), ('he', 2), ('good', 1), ('hasd', 1), ('wangleai', 1), ('hasaad', 1), ('xiaowei', 1), ('hedsfds', 1), ('luck', 1), ('world', 1), ('ha', 1)]
    hello 4
    kitty 3
    he 2
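
    For question (2), one possible approach is to tokenize with a regular expression that tries a quoted run first and falls back to a plain word. The sketch below is Python 3 and is not the original author's code: the names tokenize and top_words are hypothetical, it assumes ASCII double quotes, and it treats a word as a run of letters only. It also swaps the set() plus list.count() loop for collections.Counter, which counts in a single pass.

```python
import re
from collections import Counter

# Assumption: quotes are ASCII double quotes and a word is a run of letters.
# The quoted alternative is listed first so it wins over the plain-word match.
TOKEN_PATTERN = re.compile(r'"([^"]+)"|([A-Za-z]+)')

def tokenize(text):
    """Return tokens, keeping each double-quoted run as a single token."""
    tokens = []
    for quoted, word in TOKEN_PATTERN.findall(text):
        # Exactly one of the two groups is non-empty for each match
        tokens.append(quoted if quoted else word)
    return tokens

def top_words(text, n=10):
    """Return the n most frequent tokens with their counts."""
    return Counter(tokenize(text)).most_common(n)

# Made-up sample text for illustration
sample = 'hello "good luck" hello kitty 2017 "good luck" ha'
print(tokenize(sample))
print(top_words(sample, 2))
```

    Counter.most_common(n) replaces both the manual sort and the slice, so the same function covers the "top 10" requirement by leaving n at its default.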

  • Original article: https://www.cnblogs.com/hixiaowei/p/9122280.html