  • How do you count the frequency of elements in a sequence?

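    Requirements:
    1. Given a random sequence such as [12,5,6,4,6,8,9,5], find the three elements that occur most often and how many times each occurs.
    2. Run a word-frequency count on an English article and find the ten most frequent words and how many times each occurs.
    Approach:
    1. Convert the sequence into a dictionary of the form {element: frequency}, then sort by the values.
    2. Use the Counter class from the collections module.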

    Code:

    # Method 1:
    In [16]: from random import randint
    
    In [17]: data = [ randint(0,20) for _ in range(30)]
    
    In [18]: data
    Out[18]: 
    [10,
     17,
     1,
     17,
     6,
     17,
     12,
     19,
     13,
     5,
     18,
     19,
     14,
     8,
     12,
     19,
     9,
     9,
     11,
     1,
     0,
     4,
     6,
     0,
     4,
     9,
     17,
     2,
     8,
     4]
    
    In [19]: d = dict.fromkeys(data,0)
    
    In [20]: d
    Out[20]: 
    {10: 0,
     17: 0,
     1: 0,
     6: 0,
     12: 0,
     19: 0,
     13: 0,
     5: 0,
     18: 0,
     14: 0,
     8: 0,
     9: 0,
     11: 0,
     0: 0,
     4: 0,
     2: 0}
    
    In [21]: for x in data:
        ...:     d[x] += 1
        ...: 
    
    In [22]: d
    Out[22]: 
    {10: 1,
     17: 4,
     1: 2,
     6: 2,
     12: 2,
     19: 3,
     13: 1,
     5: 1,
     18: 1,
     14: 1,
     8: 2,
     9: 3,
     11: 1,
     0: 2,
     4: 3,
     2: 1}
    
    In [23]: sorted([(v,k) for k,v in d.items()],reverse=True)
    Out[23]: 
    [(4, 17),
     (3, 19),
     (3, 9),
     (3, 4),
     (2, 12),
     (2, 8),
     (2, 6),
     (2, 1),
     (2, 0),
     (1, 18),
     (1, 14),
     (1, 13),
     (1, 11),
     (1, 10),
     (1, 5),
     (1, 2)]
    
    In [24]: sorted([(v,k) for k,v in d.items()],reverse=True)[:3]
    Out[24]: [(4, 17), (3, 19), (3, 9)]
    
    In [25]: sorted(((v,k) for k,v in d.items()),reverse=True)[:3] # a generator expression avoids building the intermediate list
    Out[25]: [(4, 17), (3, 19), (3, 9)]
    
    In [26]: import heapq
    
    In [27]: heapq.nlargest(3,((v,k) for k,v in d.items()))
    Out[27]: [(4, 17), (3, 19), (3, 9)]
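
The steps of Method 1 can be collected into one small function. This is only a sketch: the function name `top_n_by_dict` and the variable `sample` are made up for illustration, and the input is the sequence from requirement 1 rather than the random `data` above. It uses `dict.get` in place of `dict.fromkeys`, so the dictionary does not need to be initialised first.

```python
import heapq


def top_n_by_dict(seq, n):
    """Return the n most frequent elements of seq as (count, element) pairs."""
    counts = {}
    for x in seq:
        # dict.get supplies 0 the first time an element is seen
        counts[x] = counts.get(x, 0) + 1
    # (count, element) tuples compare by count first, then by element
    return heapq.nlargest(n, ((v, k) for k, v in counts.items()))


sample = [12, 5, 6, 4, 6, 8, 9, 5]  # sequence from requirement 1
top3 = top_n_by_dict(sample, 3)
print(top3)  # [(2, 6), (2, 5), (1, 12)]
```

Because the tuples are compared as a whole, elements with the same count are ordered by their own value, which is why 6 comes before 5 here.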
    
    # Method 2:
    In [28]: from collections import Counter
    
    In [29]: data
    Out[29]: 
    [10,
     17,
     1,
     17,
     6,
     17,
     12,
     19,
     13,
     5,
     18,
     19,
     14,
     8,
     12,
     19,
     9,
     9,
     11,
     1,
     0,
     4,
     6,
     0,
     4,
     9,
     17,
     2,
     8,
     4]
    
    In [30]: Counter(data)
    Out[30]: 
    Counter({10: 1,
             17: 4,
             1: 2,
             6: 2,
             12: 2,
             19: 3,
             13: 1,
             5: 1,
             18: 1,
             14: 1,
             8: 2,
             9: 3,
             11: 1,
             0: 2,
             4: 3,
             2: 1})
    
    In [31]: c = Counter(data)
    
    In [32]: c.most_common(3)
    Out[32]: [(17, 4), (19, 3), (9, 3)]
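
For comparison, here is a minimal sketch of Method 2 applied to the sequence from requirement 1 (the variable names `sample` and `c_sample` are assumptions for illustration). It also shows that a `Counter` behaves like a dictionary, except that looking up a missing element returns 0 instead of raising `KeyError`.

```python
from collections import Counter

sample = [12, 5, 6, 4, 6, 8, 9, 5]  # sequence from requirement 1
c_sample = Counter(sample)

# most_common returns (element, count) pairs, the reverse of Method 1's tuples
top3 = c_sample.most_common(3)
print(top3)

# Counter is a dict subclass; missing keys count as zero
print(c_sample[5])    # 2
print(c_sample[100])  # 0
```

On Python 3.7 and later, elements with equal counts are returned in the order they were first encountered, so ties are not broken by element value as they are in Method 1.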
    
    # Word-frequency count:
    >>> txt = open('./example.txt').read()
    
    >>> import re
    >>> word_list = re.split(r'\W+',txt)  # split on runs of one or more non-word characters
    
    >>> from collections import Counter
    
    >>> c2 = Counter(word_list)
    
    >>> c2.most_common(10)
    [('the', 18),
     ('to', 17),
     ('a', 16),
     ('it', 12),
     ('of', 12),
     ('and', 11),
     ('is', 11),
     ('that', 8),
     ('we', 8),
     ('enough', 7)]
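
The word count can also be written as a reusable function. The sketch below makes a few assumptions that are not in the original session: the name `top_words` is hypothetical, the text is an inline sample string rather than `example.txt`, words are lower-cased so that "The" and "the" are counted together, and `re.findall(r'\w+', ...)` is used in place of `re.split`, since splitting can leave an empty string at the start or end of the list when the text begins or ends with punctuation.

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most common words in text, ignoring case."""
    # \w+ matches each run of word characters, so no empty strings appear
    words = re.findall(r"\w+", text.lower())
    return Counter(words).most_common(n)


sample_text = "The cat and the hat. The hat is on the cat!"
result = top_words(sample_text, 3)
print(result)  # [('the', 4), ('cat', 2), ('hat', 2)]
```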
    
  • Original article: https://www.cnblogs.com/Richardo-M-Q/p/13901484.html