  • Day 25: Three commonly used classes in Python's collections module

    NamedTuple

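    In data analysis and machine learning, good use of NamedTuples leads to code that is more readable and easier to maintain.

    During feature engineering, if you put features into a list for later use, retrieving them inevitably involves integer indexes, which hurts readability. A NamedTuple avoids integer indexes by letting you access each value directly by attribute name: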

    from collections import namedtuple
    
    # Create a namedtuple class named Person with 14 fields
    Person = namedtuple('Person',['id','age','height','name','address','province','city','town','country','birth_address','father_name', 'monther_name','telephone','emergency_telephone'])
    # Call the Person class to create a Person object with id=10086.
    # Arguments are positional, so 'xiaoming' fills the third field (height), not name.
    a = ['']*11
    Person(10086,19,'xiaoming',*a)
    
    output:
    Person(id=10086, age=19, height='xiaoming', name='', address='', province='', city='', town='', country='', birth_address='', father_name='', monther_name='', telephone='', emergency_telephone='')
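
    To make the readability argument concrete, here is a minimal sketch comparing a plain list with a namedtuple. The three-field `Feature` type and its values are hypothetical, chosen only to keep the example short.

```python
from collections import namedtuple

# Hypothetical three-field record, used only for illustration
Feature = namedtuple('Feature', ['id', 'age', 'name'])

as_list = [10086, 19, 'xiaoming']
as_tuple = Feature(id=10086, age=19, name='xiaoming')

# With a list, the reader must remember that index 1 means "age"
age_from_list = as_list[1]

# With a namedtuple, the field name documents itself
age_from_tuple = as_tuple.age

# A namedtuple is still a tuple, so integer indexing keeps working
assert as_tuple[1] == as_tuple.age
print(age_from_list, age_from_tuple)
```

    Passing the fields by keyword, as done here, also avoids putting a value into the wrong position.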

    Suppose you have an old dataset and receive a new one, and you need to find the people whose home address (field address), phone number (field telephone), or birthplace (field birth_address) has changed.

    Using a NamedTuple:

    def update_persons_info(old_data, new_data):
        changed_list = []
        for line in new_data:
            new_props = line.split()
            # Unpack the values; new_props must line up with Person's fields
            new_person = Person(*new_props)
            for old in old_data:
                old_props = old.split()
                old_person = Person(*old_props)
                if old_person.id != new_person.id:
                    continue  # a different person, nothing to compare
                # Same person: record the id if any watched field differs
                if (old_person.address != new_person.address
                        or old_person.telephone != new_person.telephone
                        or old_person.birth_address != new_person.birth_address):
                    changed_list.append(new_person.id)
        return changed_list
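
    The nested loops above re-parse every old record for each new record. As one possible alternative, the sketch below indexes the old records by `id` first, so each new record needs a single lookup. The function name `find_changed_ids`, the shortened four-field `Person`, and the sample rows are all assumptions made for illustration, not part of the original code.

```python
from collections import namedtuple

# Shortened, hypothetical version of Person with only the fields being compared
Person = namedtuple('Person', ['id', 'address', 'telephone', 'birth_address'])

WATCHED_FIELDS = ('address', 'telephone', 'birth_address')

def find_changed_ids(old_data, new_data):
    """Return ids whose address, telephone, or birth_address differ."""
    # Index old records by id so each lookup is a single dict access
    old_by_id = {}
    for line in old_data:
        person = Person(*line.split())
        old_by_id[person.id] = person

    changed = []
    for line in new_data:
        new_person = Person(*line.split())
        old_person = old_by_id.get(new_person.id)
        if old_person is None:
            continue  # no old record to compare against
        if any(getattr(old_person, f) != getattr(new_person, f)
               for f in WATCHED_FIELDS):
            changed.append(new_person.id)
    return changed

# Made-up sample rows: only the person with id '2' has moved
old_rows = ['1 Beijing 111 Hebei', '2 Shanghai 222 Jiangsu']
new_rows = ['1 Beijing 111 Hebei', '2 Shenzhen 222 Jiangsu']
print(find_changed_ids(old_rows, new_rows))
```

    Because `split()` yields strings, the ids in the result are strings as well.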

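    This convenience comes with a constraint: once a NamedTuple is created, its field values cannot be modified, so the attributes are read-only. Whether that is a drawback depends on how you use it.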

    from collections import namedtuple
    
    # Create a namedtuple class named Person with 14 fields
    Person = namedtuple('Person',['id','age','height','name','address','province','city','town','country','birth_address','father_name', 'monther_name','telephone','emergency_telephone'])
    # Call the Person class to create a Person object with id=10086
    a = ['']*11
    xiaoming = Person(10086,19,'xiaoming',*a)
    print(type(xiaoming))
    xiaoming.age = 20
    
    output:
    <class '__main__.Person'>
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-5-dabf7d8e12b1> in <module>()
          7 xiaoming = Person(10086,19,'xiaoming',*a)
          8 print(type(xiaoming))
    ----> 9 xiaoming.age = 20
    
    AttributeError: can't set attribute
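
    When an updated value is needed, a namedtuple offers `_replace`, which returns a new object and leaves the original untouched. A minimal sketch, using a hypothetical three-field `Person` for brevity:

```python
from collections import namedtuple

# Hypothetical shortened Person, for illustration only
Person = namedtuple('Person', ['id', 'age', 'name'])
xiaoming = Person(10086, 19, 'xiaoming')

# Direct assignment fails because the fields are read-only
try:
    xiaoming.age = 20
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# _replace builds a new Person; the original is unchanged
older = xiaoming._replace(age=20)
print(assignment_failed, xiaoming.age, older.age)
```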

    Counter

    Counter is mainly used for counting occurrences, and it usually makes the counting code much shorter.

    from collections import Counter
    
    # Count how many times each value occurs
    freq = [3, 8, 3, 10, 3, 3, 1, 3, 7, 6, 1, 2, 7, 0, 7, 9, 1, 5, 1, 0]
    Counter(freq).most_common()
    
    output:
    [(3, 5),
     (1, 4),
     (7, 3),
     (0, 2),
     (8, 1),
     (10, 1),
     (6, 1),
     (2, 1),
     (9, 1),
     (5, 1)]

    Note that the result is sorted by frequency, from highest to lowest.
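
    `most_common` also accepts an optional count, and a Counter can be queried like a dictionary. A short sketch reusing the `freq` list from above:

```python
from collections import Counter

freq = [3, 8, 3, 10, 3, 3, 1, 3, 7, 6, 1, 2, 7, 0, 7, 9, 1, 5, 1, 0]
counts = Counter(freq)

# Only the two most frequent values
top_two = counts.most_common(2)

# Looking up a value that never occurred returns 0 instead of raising KeyError
missing = counts[4]
print(top_two, missing)
```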

    Counter can also quickly count how often each word appears in a sentence, or how often each character appears in a piece of text. For example:

    text = """
    def update_persons_info(old_data,new_data):
        changed_list = []
        for line in new_data:
            new_props = line.split() 
            new_person = Person(new_props) # new_props 与 Person 参数卡对好
            for old in old_data: 
                old_props =  old.split() 
                old_person = Person(old_props)
                if old_person.id != new_person.id: 
                    changed_list.append(old_person.id)
                elif old_person.address != new_person.address:
                    changed_list.append(old_person.address)
                elif old_person.birth_address != new_person.birth_address: 
                    changed_list.append(old_person.birth_address)
        return changed_list"""
    Counter(text).most_common()
    
    output:
    [(' ', 182),
     ('e', 45),
     ('d', 42),
     ('s', 40),
     ('n', 38),
     ('o', 36),
     ('r', 33),
     ('p', 31),
     ('_', 30),
     ('l', 24),
     ('a', 23),
     ('i', 21),
     ('t', 16),
     ('\n', 15),
     ('.', 14),
     ('w', 9),
     ('(', 8),
     (')', 8),
     ('h', 8),
     ('=', 8),
     ('f', 7),
     (':', 6),
     ('c', 5),
     ('g', 5),
     ('P', 3),
     ('!', 3),
     ('b', 3),
     ('u', 2),
     (',', 1),
     ('[', 1),
     (']', 1),
     ('#', 1),
     ('与', 1),
     ('参', 1),
     ('数', 1),
     ('卡', 1),
     ('对', 1),
     ('好', 1)]
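
    The example above counts characters. For word counts, which the text also mentions, one approach is to split the sentence first. The sample sentence here is made up for illustration:

```python
from collections import Counter

sentence = 'the quick brown fox jumps over the lazy dog the end'

# str.split() with no arguments splits on any run of whitespace
word_counts = Counter(sentence.split())
print(word_counts.most_common(1))

# Counting the characters of a single word works the same way
char_counts = Counter('collections')
print(char_counts['l'], char_counts['c'])
```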

    DefaultDict

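    A DefaultDict is a dictionary that initializes missing keys automatically: the first time a key is accessed, it is created with a default value produced by the factory you supply, as if every key had already been set once.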
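
    A minimal sketch of that behavior, comparing a plain dict with a defaultdict. The key name `'missing'` is arbitrary:

```python
from collections import defaultdict

plain = {}
auto = defaultdict(list)

# A plain dict raises KeyError for an unknown key
try:
    plain['missing']
    raised = False
except KeyError:
    raised = True

# A defaultdict calls its factory (here list) and stores the result under the key
value = auto['missing']
print(raised, value, dict(auto))
```

    Note that merely reading a missing key inserts it, which can be surprising when you only meant to test for membership; `in` does not trigger the factory.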

    How to create a dictionary whose values default to a given type:

    from collections import defaultdict
    
    # Create a default dictionary whose values default to int (0):
    dict_1 = defaultdict(int)
    # Create a default dictionary whose values default to list ([]):
    dict_2 = defaultdict(list)
    dict_2
    
    output:
    defaultdict(list, {})
    
    s = 'from collections import defaultdict'
    for index,i in enumerate(s):
        dict_2[i].append(index)
    print(dict_2)
    
    output:
    defaultdict(<class 'list'>, {'f': [0, 26], 'r': [1, 21], 'o': [2, 6, 13, 20], 'm': [3, 18], ' ': [4, 16, 23], 'c': [5, 10, 33], 'l': [7, 8, 29], 'e': [9, 25], 't': [11, 22, 30, 34], 'i': [12, 17, 32], 'n': [14], 's': [15], 'p': [19], 'd': [24, 31], 'a': [27], 'u': [28]})
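
    `dict_1` above is created but not used. As a sketch of what an `int` default is good for, the loop below counts characters in the same string `s`; the variable name `char_counts` is an assumption for illustration:

```python
from collections import defaultdict

s = 'from collections import defaultdict'
char_counts = defaultdict(int)

# int() returns 0, so a missing key starts at 0 before being incremented
for ch in s:
    char_counts[ch] += 1

print(char_counts['o'], char_counts['t'], char_counts['n'])
```

    For plain counting like this, Counter from the previous section does the same job in one call; a defaultdict is the more general tool when the default is a list, set, or other container.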
  • Original article: https://www.cnblogs.com/PiaYie/p/15020586.html