  • The Python collections module

    I. Overview

    The collections source file opens with the following docstring and export list:

    '''This module implements specialized container datatypes providing
    alternatives to Python's general purpose built-in containers, dict,
    list, set, and tuple.
    
    * namedtuple   factory function for creating tuple subclasses with named fields
    * deque        list-like container with fast appends and pops on either end
    * ChainMap     dict-like class for creating a single view of multiple mappings
    * Counter      dict subclass for counting hashable objects
    * OrderedDict  dict subclass that remembers the order entries were added
    * defaultdict  dict subclass that calls a factory function to supply missing values
    * UserDict     wrapper around dictionary objects for easier dict subclassing
    * UserList     wrapper around list objects for easier list subclassing
    * UserString   wrapper around string objects for easier string subclassing
    
    '''
    
    __all__ = ['deque', 'defaultdict', 'namedtuple', 'UserDict', 'UserList',
                'UserString', 'Counter', 'OrderedDict', 'ChainMap']

    In other words, the collections module provides the following names:

    • deque
    • defaultdict
    • namedtuple
    • UserDict
    • UserList
    • UserString
    • Counter
    • OrderedDict
    • ChainMap

    II. namedtuple

    (1) tuple

    namedtuple creates subclasses of tuple, so everything a tuple can do, a namedtuple can do as well. What are those tuple features?

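    1. Immutable type

    A tuple is an immutable data type:

    >>> user_tuple = ("zhangsan",30) # create a tuple object

    Once created it cannot be changed. An item assignment such as the following:

    >>> user_tuple[1] = 32

    raises an error: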

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'tuple' object does not support item assignment

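    However, a tuple's immutability is shallow. The tuple stores references to its items, and those references cannot be rebound, but an item that is itself mutable (a list, for example) can still be modified in place: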

    >>> user_tuple = ("zhangsan",30,["reading","movies"])
    >>> user_tuple[2].append("animals")
    >>> user_tuple
    ('zhangsan', 30, ['reading', 'movies', 'animals'])
    >>>

    2. Iterable

    Like a list, a tuple is iterable, so it can be looped over; it is also a sequence, so it supports indexing and slicing:

    # iterate with a for loop
    user_tuple = ("zhangsan",30)
    for el in user_tuple:
        print(el)

    3. Unpacking

    number_Tuple = (1,2,3,4)
    
    first,*others = number_Tuple
    print(first,others) #1 [2, 3, 4]
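
    The starred target does not have to come last, and unpacking patterns can be nested. A small sketch of both, using made-up values:

```python
number_tuple = (1, 2, 3, 4)

# A starred name may sit anywhere in the pattern; it always collects a list.
first, *middle, last = number_tuple
print(first, middle, last)  # 1 [2, 3] 4

# A nested pattern unpacks a nested tuple in one statement.
name, (age, hobbies) = ("zhangsan", (30, ["reading", "movies"]))
print(name, age, hobbies)  # zhangsan 30 ['reading', 'movies']
```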

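    4. Use as a dictionary key

    Dictionary keys must be hashable. A tuple is hashable as long as every item in it is hashable, so it can serve as a key: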

    condition = ("a","b")
    filter_dict = {}
    filter_dict[condition] = "result"
    print(filter_dict) #{('a', 'b'): 'result'}
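
    Hashability depends on the items, not only on the tuple itself. The sketch below (the dictionary name and values are illustrative) shows that a tuple holding a list is rejected as a key:

```python
# A tuple of hashable items works as a dict key.
point_labels = {(0, 0): "origin", (1, 2): "p"}
print(point_labels[(0, 0)])  # origin

# A tuple that contains a list is not hashable, so it cannot be a key.
try:
    point_labels[("a", ["b"])] = "bad"
    rejected = False
except TypeError:
    rejected = True

print(rejected)           # True
print(len(point_labels))  # 2
```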

    (2) namedtuple

    1. Creating a class

    A class is normally defined like this:

    class User:
    
        def __init__(self,username,password):
            self.username = username
            self.password = password
    
    user = User("zhangsan",123456)
    print(user.username,user.password) #zhangsan 123456

    With namedtuple the same kind of class can be created more concisely:

    from collections import namedtuple

    User = namedtuple("User",["username","password"])
    user = User("zhangsan",123456)
    print(user.username,user.password) #zhangsan 123456

    Arguments are passed the same way as to an ordinary class, so *args and **kwargs unpacking both work.

    # passing values from a tuple
    User = namedtuple("User",["username","password"])
    args = ("zhangsan",123456)
    user = User(*args)
    print(user.username,user.password) #zhangsan 123456
    
    # passing values from a dict
    User = namedtuple("User",["username","password"])
    kwargs = {"username":"zhangsan","password":123456}
    user = User(**kwargs)
    print(user.username,user.password) #zhangsan 123456
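
    Because the generated class is a tuple subclass, the tuple behaviour described earlier still applies to its instances. A minimal sketch using the same User class:

```python
from collections import namedtuple

User = namedtuple("User", ["username", "password"])
user = User("zhangsan", 123456)

print(isinstance(user, tuple))  # True
print(user[0])                  # zhangsan
username, password = user       # unpacking works as for any tuple
print(User._fields)             # ('username', 'password')

# Fields are read-only, just like tuple items.
try:
    user.password = 654321
    assignment_failed = False
except AttributeError:
    assignment_failed = True
print(assignment_failed)  # True

# _replace returns a new instance and leaves the original untouched.
updated = user._replace(password=654321)
print(updated)  # User(username='zhangsan', password=654321)
print(user)     # User(username='zhangsan', password=123456)
```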

    2. _make

    The examples above passed values with *args or **kwargs. The _make class method is a more direct way to build an instance from any iterable:

    from collections import namedtuple

    # define the class
    User = namedtuple("User",["username","password"])
    
    # define the parameters
    parameters_list = ["zhangsan",123456]
    parameters_tuple = ("zhangsan",123456)
    parameters_dict = {"username":"zhangsan","password":123456}
    
    # create the objects
    user = User._make(parameters_list)
    user1 = User._make(parameters_tuple)
    # iterating over a dict yields its keys, so pass the values explicitly
    user2 = User._make(parameters_dict.values())
    
    # output
    print(user.username,user.password) #zhangsan 123456
    print(user1.username,user1.password) #zhangsan 123456
    print(user2.username,user2.password) #zhangsan 123456

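    As shown, _make only needs an iterable that yields one value per field. Its source is shown here as it appears in older CPython versions, where it is a code template and {typename} and {num_fields:d} are filled in when the class is generated: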

    @classmethod
    def _make(cls, iterable, new=tuple.__new__, len=len):
        'Make a new {typename} object from a sequence or iterable'
        result = new(cls, iterable)
        if len(result) != {num_fields:d}:
            raise TypeError('Expected {num_fields:d} arguments, got %d' % len(result))
        return result
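
    The length check in that source can be observed directly. A small sketch (the input string is made up) that feeds _make a generator and then an iterable of the wrong length:

```python
from collections import namedtuple

User = namedtuple("User", ["username", "password"])

# Any iterable works, including a generator expression.
user = User._make(part.strip() for part in "zhangsan, 123456".split(","))
print(user)  # User(username='zhangsan', password='123456')

# An iterable with the wrong number of items raises TypeError.
try:
    User._make(["only-one-field"])
    length_checked = False
except TypeError:
    length_checked = True
print(length_checked)  # True
```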

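    3. _asdict

    This method returns a mapping from field names to values, in field order (the values are not sorted). It returns an OrderedDict on Python 3.1 through 3.7 and a regular dict from Python 3.8 on.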

    from collections import namedtuple
    
    User = namedtuple("User",["username","password"])
    kwargs = {"username":"zhangsan"}
    user = User(**kwargs,password=123456)
    print(user) #User(username='zhangsan', password=123456)
    
    user_dict = user._asdict()
    print(user_dict) #{'username': 'zhangsan', 'password': 123456} on Python 3.8+; OrderedDict([('username', 'zhangsan'), ('password', 123456)]) on 3.1-3.7
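
    One common use of _asdict is serialization. The sketch below assumes the result is passed to json.dumps, which is not part of the original example; it also shows that the returned mapping is independent of the tuple:

```python
from collections import namedtuple
import json

User = namedtuple("User", ["username", "password"])
user = User("zhangsan", 123456)

user_dict = user._asdict()
print(list(user_dict))  # ['username', 'password']

encoded = json.dumps(user_dict)
print(encoded)  # {"username": "zhangsan", "password": 123456}

# The mapping is a new object; changing it does not touch the tuple.
user_dict["password"] = 0
print(user.password)  # 123456
```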

    III. defaultdict

    defaultdict is a subclass of the built-in dict, so it has every dict feature. Its source stub reads:

    class defaultdict(dict):
    
        def __init__(self, default_factory=None, **kwargs): # known case of _collections.defaultdict.__init__
            """
            defaultdict(default_factory[, ...]) --> dict with default factory
            
            The default factory is called without arguments to produce
            a new value when a key is not present, in __getitem__ only.
            A defaultdict compares equal to a dict with the same items.
            All remaining arguments are treated the same as if they were
            passed to the dict constructor, including keyword arguments.
            
            # (copied from class doc)
            """
            pass

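    This shows that defaultdict takes a default_factory callable, which is called to supply a default value when a key is missing from the dict. Suppose we have the following list: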

    s = ['yellow', 'blue','yellow', 'blue','red']

    To count how many times each element appears in s, the usual approach with a plain dict looks like this:

    from collections import defaultdict
    
    s = ['yellow', 'blue','yellow', 'blue','red']
    
    count_dict = {}
    for i in s:
        if i not in count_dict:
            count_dict[i] = 1
        else:
            count_dict[i] += 1
    
    print(count_dict) #{'yellow': 2, 'blue': 2, 'red': 1}

    With defaultdict the same counting is simpler:

    from collections import defaultdict
    
    s = ['yellow', 'blue','yellow', 'blue','red']
    
    d = defaultdict(int) # a missing key gets int() as its default, which is 0; the entry is created on first access
    for i in s:
        d[i] += 1
    print(d) #defaultdict(<class 'int'>, {'yellow': 2, 'blue': 2, 'red': 1})
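
    The same pattern works with other factories. As a sketch, using list as the factory groups values by key; the pairs below are made-up data:

```python
from collections import defaultdict

pairs = [("yellow", 1), ("blue", 2), ("yellow", 3), ("blue", 4), ("red", 1)]

# A missing key gets list() as its default, i.e. a new empty list.
groups = defaultdict(list)
for color, number in pairs:
    groups[color].append(number)

print(dict(groups))  # {'yellow': [1, 3], 'blue': [2, 4], 'red': [1]}
```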

    It can also be used to build more complex data structures:

    from collections import defaultdict
    
    def gen_default():
        return {
            "username":"",
            "age":0
        }
    
    d = defaultdict(gen_default)
    d["g1"]  # the key g1 does not exist, so the default structure is created: d == {"g1": {"username": "", "age": 0}}
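
    The docstring quoted earlier says the factory is used "in __getitem__ only". A short sketch of what that means in practice: d[key] creates the entry, while get and the in operator do not, and a defaultdict without a factory behaves like a plain dict:

```python
from collections import defaultdict

d = defaultdict(int)

# get() and "in" do not call the factory and do not insert anything.
print(d.get("missing"))  # None
print("missing" in d)    # False
print(len(d))            # 0

# Subscript access calls the factory and stores the result.
print(d["missing"])      # 0
print(len(d))            # 1

# With default_factory=None a missing key raises KeyError.
plain = defaultdict()
try:
    plain["x"]
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```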

    IV. deque

    (1) Initializing a deque

    First, the source stub:

    class deque(object):
    
        def __init__(self, iterable=(), maxlen=None): # known case of _collections.deque.__init__
            """
            deque([iterable[, maxlen]]) --> deque object
            
            A list-like sequence optimized for data accesses near its endpoints.
            # (copied from class doc)
            """
            pass

    A double-ended queue is usually initialized from an iterable; both iterable and maxlen are optional.

    from collections import deque
    d = deque(["a","b","c"])
    print(d) #deque(['a', 'b', 'c'])

    A tuple or a dict can be passed as well (a dict contributes its keys).
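
    The maxlen parameter from the signature above bounds the deque. A minimal sketch of a "keep the last three items" buffer, plus the dict case just mentioned:

```python
from collections import deque

# A bounded deque discards items from the opposite end once it is full.
recent = deque(maxlen=3)
for n in range(5):
    recent.append(n)
print(recent)         # deque([2, 3, 4], maxlen=3)
print(recent.maxlen)  # 3

recent.appendleft(9)  # pushes 4 off the right end
print(recent)         # deque([9, 2, 3], maxlen=3)

# Initializing from a dict keeps only the keys.
keys = deque({"a": 1, "b": 2})
print(keys)  # deque(['a', 'b'])
```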

    (2) deque methods

    deque has many methods:

    class deque(object):
        """
        deque([iterable[, maxlen]]) --> deque object
        
        A list-like sequence optimized for data accesses near its endpoints.
        """
        def append(self, *args, **kwargs): # real signature unknown
            """ Add an element to the right side of the deque. """
            pass
    
        def appendleft(self, *args, **kwargs): # real signature unknown
            """ Add an element to the left side of the deque. """
            pass
    
        def clear(self, *args, **kwargs): # real signature unknown
            """ Remove all elements from the deque. """
            pass
    
        def copy(self, *args, **kwargs): # real signature unknown
            """ Return a shallow copy of a deque. """
            pass
    
        def count(self, value): # real signature unknown; restored from __doc__
            """ D.count(value) -> integer -- return number of occurrences of value """
            return 0
    
        def extend(self, *args, **kwargs): # real signature unknown
            """ Extend the right side of the deque with elements from the iterable """
            pass
    
        def extendleft(self, *args, **kwargs): # real signature unknown
            """ Extend the left side of the deque with elements from the iterable """
            pass
    
        def index(self, value, start=None, stop=None): # real signature unknown; restored from __doc__
            """
            D.index(value, [start, [stop]]) -> integer -- return first index of value.
            Raises ValueError if the value is not present.
            """
            return 0
    
        def insert(self, index, p_object): # real signature unknown; restored from __doc__
            """ D.insert(index, object) -- insert object before index """
            pass
    
        def pop(self, *args, **kwargs): # real signature unknown
            """ Remove and return the rightmost element. """
            pass
    
        def popleft(self, *args, **kwargs): # real signature unknown
            """ Remove and return the leftmost element. """
            pass
    
        def remove(self, value): # real signature unknown; restored from __doc__
            """ D.remove(value) -- remove first occurrence of value. """
            pass
    
        def reverse(self): # real signature unknown; restored from __doc__
            """ D.reverse() -- reverse *IN PLACE* """
            pass
    
        def rotate(self, *args, **kwargs): # real signature unknown
            """ Rotate the deque n steps to the right (default n=1).  If n is negative, rotates left. """
            pass

    1. pop

    from collections import deque
    d = deque(["a","b","c"])
    print(d.pop()) #c
    print(d) #deque(['a', 'b'])

    2. popleft

    from collections import deque
    d = deque(["a","b","c"])
    print(d.popleft()) #a
    print(d) #deque(['b', 'c'])

    3. append

    from collections import deque
    d = deque(["a","b","c"])
    d.append("d") 
    print(d) #deque(['a', 'b', 'c', 'd'])

    4. appendleft

    from collections import deque
    d = deque(["a","b","c"])
    d.appendleft("d")
    print(d) #deque(['d', 'a', 'b', 'c'])

    5. extend

    from collections import deque
    d1 = deque(["a","b","c"])
    d2 = deque(["d","e","f"])
    
    d1.extend(d2)
    print(d1) #deque(['a', 'b', 'c', 'd', 'e', 'f'])

    Note: extend returns None; calling d1.extend(d2) extends d1 in place.

    6. insert

    from collections import deque
    d = deque(["a","b","c"])
    d.insert(1,"d")
    print(d) #deque(['a', 'd', 'b', 'c'])

    7. reverse

    from collections import deque
    d = deque(["a","b","c"])
    d.reverse()
    print(d) #deque(['c', 'b', 'a'])

    8. copy

    from collections import deque
    d1 = deque(["a","b","c"])
    d2 = d1.copy()
    
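    # different ids show that d1 and d2 are different objects (the values vary from run to run)
    print(id(d1)) #173766656
    print(id(d2)) #173766864
    
    # after the copy, changing d1 does not affect d2
    d1.insert(2,"d")
    print(d1) #deque(['a', 'b', 'd', 'c'])
    print(d2) #deque(['a', 'b', 'c'])
    
    # what if the deque contains a mutable element?
    d3 = deque(["a","b",["c","d"]])
    d4 = d3.copy()
    print(id(d3)) #173570256
    print(id(d4)) #173570360
    # copy() is shallow: d3 and d4 are different deques, but they hold references to the same elements, so mutating the shared list in place is visible through both
    d3[2].append("e")
    print(d3) #deque(['a', 'b', ['c', 'd', 'e']])
    print(d4) #deque(['a', 'b', ['c', 'd', 'e']])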

    There are more methods; see the source stub above for the rest.
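
    Two of the remaining methods from the stub, rotate and extendleft, behave in ways that are easy to misread from the docstrings alone. A brief sketch:

```python
from collections import deque

d = deque(["a", "b", "c", "d"])

# rotate(n) moves n items from the right end to the left end.
d.rotate(1)
print(d)  # deque(['d', 'a', 'b', 'c'])

# A negative n rotates to the left instead.
d.rotate(-2)
print(d)  # deque(['b', 'c', 'd', 'a'])

# extendleft appends one item at a time on the left,
# so the items end up in reverse order.
e = deque(["x"])
e.extendleft(["1", "2", "3"])
print(e)  # deque(['3', '2', '1', 'x'])
```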

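    V. Counter

      Counter is a subclass of the built-in dict, so it has every dict feature; it is mainly used for tallying data. It is a collection in which elements are stored as dictionary keys and their counts as dictionary values. Counts may be any integer, including zero and negative values.

    (1) Counting occurrences

    Counter accepts an iterable such as a string or a list:

    1. Counting characters in a string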

    from collections import Counter
    
    counter1 = Counter("ABCDAD")
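    print(counter1) #Counter({'A': 2, 'D': 2, 'B': 1, 'C': 1}) on Python 3.7+, where ties keep insertion order
    """
    Counter is a dict subclass, so the dict methods are available.
    Note that Counter.update adds counts instead of replacing them.
    """
    counter2 = Counter("DEFABC")
    counter1.update(counter2)
    print(counter1) #Counter({'A': 3, 'D': 3, 'B': 2, 'C': 2, 'E': 1, 'F': 1})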

    2. Counting items in a list

    from collections import Counter
    
    counter1 = Counter(["A", "B", "C", "D", "A", "D"])
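    print(counter1) #Counter({'A': 2, 'D': 2, 'B': 1, 'C': 1}) on Python 3.7+, where ties keep insertion order
    """
    Counter is a dict subclass, so the dict methods are available.
    Note that Counter.update adds counts instead of replacing them.
    """
    counter2 = Counter(["D", "E", "F", "A", "B", "C"])
    counter1.update(counter2)
    print(counter1) #Counter({'A': 3, 'D': 3, 'B': 2, 'C': 2, 'E': 1, 'F': 1})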
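
    Although Counter inherits from dict, a few operations differ from a plain dict. A short sketch of two of them, missing-key lookup and update:

```python
from collections import Counter

c = Counter("ABCDAD")

# A missing element has a count of 0; looking it up does not add it.
print(c["A"])    # 2
print(c["Z"])    # 0
print("Z" in c)  # False

# dict.update replaces values, Counter.update adds to them.
plain = dict(c)
plain.update({"A": 10})
c.update({"A": 10})
print(plain["A"])  # 10
print(c["A"])      # 12
```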

    (2) Top-N problems

    Counter has a most_common method that returns the n most frequent elements together with their counts.

    from collections import Counter
    
    top3 = Counter('abcdeabcdabcaba').most_common(3)
    print(top3) #[('a', 5), ('b', 4), ('c', 3)]

    Source:

    class Counter(dict):
    
        def most_common(self, n=None):
            '''List the n most common elements and their counts from the most
            common to the least.  If n is None, then list all element counts.
    
            >>> Counter('abcdeabcdabcaba').most_common(3)
            [('a', 5), ('b', 4), ('c', 3)]
    
            '''
            # Emulate Bag.sortedByCount from Smalltalk
            if n is None:
                return sorted(self.items(), key=_itemgetter(1), reverse=True)
            return _heapq.nlargest(n, self.items(), key=_itemgetter(1))
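
    As the source shows, calling most_common without n returns every element sorted by count. A small sketch of that case, and of slicing the result to get the least common elements:

```python
from collections import Counter

c = Counter('abcdeabcdabcaba')

# With no argument, every element is listed from most to least common.
everything = c.most_common()
print(everything)  # [('a', 5), ('b', 4), ('c', 3), ('d', 2), ('e', 1)]

print(c.most_common(1))  # [('a', 5)]

# Slicing from the end gives the least common elements first.
bottom2 = c.most_common()[:-3:-1]
print(bottom2)  # [('e', 1), ('d', 2)]
```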

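    (3) Other methods

    1. elements

    from collections import Counter

    # elements() returns an iterator that repeats each element as many times as its count
    c = Counter("ABCDAD")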
    print(sorted(c.elements())) #['A', 'A', 'B', 'C', 'D', 'D']

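    2. subtract

    Subtracts counts taken from an iterable or from another mapping (or Counter). Unlike the - operator, it keeps zero and negative counts.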

    from collections import Counter
    
    c = Counter(a=4, b=2, c=0, d=-2)
    d = Counter(a=1, b=2, c=3, d=4)
    c.subtract(d)
    print(c) #Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})
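
    For comparison, here is a sketch of how the same data behaves with the - operator and with unary plus, both of which drop counts that are zero or negative, and with subtract applied to a plain iterable:

```python
from collections import Counter

c = Counter(a=4, b=2, c=0, d=-2)
d = Counter(a=1, b=2, c=3, d=4)

# The - operator returns a new Counter holding only positive counts.
difference = c - d
print(difference)  # Counter({'a': 3})

# subtract works in place and keeps zero and negative counts.
c.subtract(d)
print(c["b"], c["d"])  # 0 -6

# Unary plus strips the non-positive entries afterwards.
positive_only = +c
print(positive_only)  # Counter({'a': 3})

# subtract also accepts a plain iterable of elements.
c.subtract("aab")
print(c["a"], c["b"])  # 1 -1
```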

    VI. OrderedDict

    OrderedDict is a subclass of dict with all of dict's features, and it remembers the order in which keys were inserted.

    from collections import OrderedDict
    
    d = OrderedDict() # create an empty ordered dict
    d["a"] = 1
    d["b"] = 2
    d["c"] = 3
    print(d) #OrderedDict([('a', 1), ('b', 2), ('c', 3)])

    The output is in insertion order: the entries appear in the order in which they were added to the dictionary.
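
    Since Python 3.7 a plain dict also preserves insertion order, so one remaining difference worth knowing is equality. A short sketch: two OrderedDicts compare equal only when their order matches, while comparison with a plain dict ignores order:

```python
from collections import OrderedDict

a = OrderedDict([("x", 1), ("y", 2)])
b = OrderedDict([("y", 2), ("x", 1)])

# OrderedDict vs OrderedDict: order matters.
print(a == b)              # False

# dict vs dict, or OrderedDict vs dict: order is ignored.
print(dict(a) == dict(b))  # True
print(a == dict(b))        # True
```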

    OrderedDict provides many methods, for example:

    1. popitem

    Removes and returns the most recently added item.

    from collections import OrderedDict
    
    d = OrderedDict() # create an empty ordered dict
    d["a"] = 1
    d["b"] = 2
    d["c"] = 3
    print(d) #OrderedDict([('a', 1), ('b', 2), ('c', 3)])
    print(d.popitem()) #('c', 3)
    print(d) #OrderedDict([('a', 1), ('b', 2)])

    2. move_to_end

    Moves an existing key to the end of the OrderedDict.

    from collections import OrderedDict
    
    d = OrderedDict() # create an empty ordered dict
    d["a"] = 1
    d["b"] = 2
    d["c"] = 3
    print(d) #OrderedDict([('a', 1), ('b', 2), ('c', 3)])
    d.move_to_end("b")
    print(d) #OrderedDict([('a', 1), ('c', 3), ('b', 2)])
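
    Both move_to_end and popitem accept a last argument; with last=False they act on the front of the dict instead. Together they are enough for a small least-recently-used cache. The sketch below is illustrative only: the LRUCache class and its method names are hypothetical, not part of collections.

```python
from collections import OrderedDict

d = OrderedDict(a=1, b=2, c=3)
d.move_to_end("c", last=False)  # move "c" to the front
print(list(d))                  # ['c', 'a', 'b']
print(d.popitem(last=False))    # ('c', 3), removed from the front


class LRUCache:
    """Hypothetical helper: keeps at most `capacity` recently used keys."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the oldest entry


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now newer than "b"
cache.put("c", 3)  # evicts "b"
print(list(cache.data))  # ['a', 'c']
```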

    3. pop

    Removes the item with the specified key and returns its value.

    from collections import OrderedDict
    
    d = OrderedDict() # create an empty ordered dict
    d["a"] = 1
    d["b"] = 2
    d["c"] = 3
    print(d) #OrderedDict([('a', 1), ('b', 2), ('c', 3)])
    print(d.pop("b")) #2
    print(d) #OrderedDict([('a', 1), ('c', 3)])

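    VII. ChainMap

     ChainMap groups multiple dicts or other mappings together to create a single, updateable view. Consider the following case:

    d1 = {"a":1,"b":2}
    d2 = {"c":1,"d":2}
    
    # loop over and print each of the dicts above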
    for k,v in d1.items():
        print(k,v)
    
    for k,v in d2.items():
        print(k,v)

    Here the two dicts are printed with two separate for loops. With ChainMap it can be done in one:

    from collections import ChainMap
    
    d1 = {"a":1,"b":2}
    d2 = {"c":1,"d":2}
    
    d3 = ChainMap(d1,d2)
    print(d3) #ChainMap({'a': 1, 'b': 2}, {'c': 1, 'd': 2})
    
    for k,v in d3.items():
        print(k,v)
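
    A ChainMap searches its maps from first to last, sends writes to the first map only, and holds references to the underlying dicts instead of copies. A minimal sketch of these three points; the dict names and values are made up for illustration:

```python
from collections import ChainMap

defaults = {"color": "red", "user": "guest"}
overrides = {"user": "admin"}
settings = ChainMap(overrides, defaults)

# Lookups return the value from the first map that has the key.
print(settings["user"])   # admin
print(settings["color"])  # red

# Writes go to the first map only; later maps are left alone.
settings["color"] = "blue"
print(overrides)  # {'user': 'admin', 'color': 'blue'}
print(defaults)   # {'color': 'red', 'user': 'guest'}

# The ChainMap is a view: changes to an underlying dict show through.
defaults["theme"] = "dark"
print(settings["theme"])             # dark
print(settings.maps[0] is overrides) # True
```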

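    Note that ChainMap does not copy the two dicts into a new merged dict; it only keeps references to the original dicts. Besides combining mappings, it offers a few other features, for example:

    1. new_child

    Returns a new ChainMap containing a new map followed by all of the maps in the current instance.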

    from collections import ChainMap
    
    d1 = {"a":1,"b":2}
    d2 = {"c":1,"d":2}
    
    d3 = ChainMap(d1,d2)
    
    d4 = d3.new_child({"e":5}) # returns a new ChainMap with the new map in front
    print(d4) # ChainMap({'e': 5}, {'a': 1, 'b': 2}, {'c': 1, 'd': 2})

    2. parents

    A property that returns a new ChainMap containing all of the maps in the current instance except the first one.

    from collections import ChainMap
    
    d1 = {"a":1,"b":2}
    d2 = {"c":1,"d":2}
    
    d3 = ChainMap(d1,d2)
    
    print(d3.parents) #ChainMap({'c': 1, 'd': 2})
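
    new_child and parents fit together naturally when modelling nested scopes. A sketch under that assumption (the scope names are hypothetical): a child scope shadows a name without changing the outer scope, and parents looks past the child:

```python
from collections import ChainMap

global_scope = {"x": 1, "y": 2}
scope = ChainMap(global_scope)

# new_child() with no argument puts a fresh empty dict in front.
inner = scope.new_child()
inner["x"] = 10  # stored in the new dict only

print(inner["x"], inner["y"])  # 10 2
print(inner.parents["x"])      # 1
print(global_scope)            # {'x': 1, 'y': 2}
print(len(inner.maps))         # 2
```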

    3. maps

    An attribute holding the list of all underlying maps.

    from collections import ChainMap
    
    d1 = {"a":1,"b":2}
    d2 = {"c":1,"d":2}
    
    d3 = ChainMap(d1,d2)
    
    print(d3.maps) #[{'a': 1, 'b': 2}, {'c': 1, 'd': 2}]
  • Original article: https://www.cnblogs.com/shenjianping/p/12825223.html