  • Pythonic iterator functions: itertools

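    Overview

    Python's itertools module provides memory-efficient iterators. They are especially useful when a dataset is too large to hold in memory at once and building it eagerly would risk an out-of-memory error.
    Most of the loops we write day to day look like this: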

    # while loop: compute 1 + 2 + ... + 100
    s, i = 0, 1
    while i <= 100:
        s += i
        i += 1
    print('while-loop: the sum of 1+2+..100 is:', s)


    # for loop
    s = 0
    for i in range(101):
        s += i
    print('for-loop: the sum of 1+2+..100 is:', s)

    while-loop: the sum of 1+2+..100 is: 5050
    for-loop: the sum of 1+2+..100 is: 5050
    

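    This style runs into trouble once the data is very large and has to be built in memory before the loop starts. That is where itertools comes in: its iterators produce elements on demand, much like lazy loading.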
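
To make the lazy-loading idea concrete, here is a minimal sketch comparing a fully built list with a generator expression. The names eager and lazy are illustrative only, and the exact byte counts depend on the Python version and platform.

```python
import sys

# A list stores a reference to every element up front.
eager = [i * i for i in range(1_000_000)]
# A generator expression stores only its current state.
lazy = (i * i for i in range(1_000_000))

eager_size = sys.getsizeof(eager)  # size of the list object itself
lazy_size = sys.getsizeof(lazy)    # size of the generator object

# Both give the same sum; the generator yields one value at a time.
eager_total = sum(eager)
lazy_total = sum(lazy)

print(eager_size, lazy_size)
print(eager_total == lazy_total)
```

The generator can only be consumed once, which is the usual trade-off for not keeping the elements around.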

    Common APIs

    • chain()
    • groupby()
    • accumulate()
    • compress()
    • takewhile()
    • islice()
    • repeat()

    chain: concatenating iterables

    • Chains several iterables together into one larger iterator:
    # join / split 
    s = "If you please draw me a sheep?"
    
    s1 = s.split()
    
    s2 = "-".join(s1)
    
    print("split->:", s1)
    print("join->:", s2)
    
    
    split->: ['If', 'you', 'please', 'draw', 'me', 'a', 'sheep?']
    join->: If-you-please-draw-me-a-sheep?
    
    import itertools 
    
    # chain
    s = itertools.chain(['if', 'you'], ['please draw', 'me', 'a'], 'shape')
    s
    
    <itertools.chain at 0x1d883602240>
    
    list(s)
    
    ['if', 'you', 'please draw', 'me', 'a', 's', 'h', 'a', 'p', 'e']
    

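    As you can see, chain simply returns an iterator; there is nothing magical about it. The result resembles what join does for strings, except that nothing is copied into a new container. The memory saving comes from plain iterator behaviour: elements are read one at a time, so only the current element needs to be in memory. A rough pure-Python equivalent: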

    def chain(*iterables):
        for iter_ in iterables:
            for elem in iter_:
                yield elem
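
As a small sketch of chain working on lazy inputs, the example below chains two generators. numbered_lines is a hypothetical stand-in for a large data source such as a file; it is not part of itertools. chain.from_iterable is the standard-library variant that accepts one iterable of iterables.

```python
import itertools

def numbered_lines(name, count):
    """Hypothetical stand-in for reading lines from a large source."""
    for n in range(count):
        yield f"{name}:{n}"

# chain() takes the iterables as separate arguments.
merged = list(itertools.chain(numbered_lines("a", 2), numbered_lines("b", 2)))

# chain.from_iterable() takes a single iterable of iterables,
# which can itself be lazy.
sources = (numbered_lines(name, 2) for name in ("a", "b"))
flattened = list(itertools.chain.from_iterable(sources))

print(merged)
print(flattened)
```

In both cases no intermediate list of all lines is built until list() is called at the end.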
    

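    groupby: grouping adjacent elements

    • Collects consecutive elements that share the same key into one group
    # Two elements belong to the same group as long as the key function returns equal values for them; that return value becomes the group's key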
    for key, group in itertools.groupby('AAABBBCCAAAdde'):
        print(key, list(group))
    
    A ['A', 'A', 'A']
    B ['B', 'B', 'B']
    C ['C', 'C']
    A ['A', 'A', 'A']
    d ['d', 'd']
    e ['e']
    
    # case-insensitive grouping
    for key, group in itertools.groupby('AaaBBbcCAAa', lambda c: c.upper()):
        print(key, list(group))
    
    A ['A', 'a', 'a']
    B ['B', 'B', 'b']
    C ['c', 'C']
    A ['A', 'A', 'a']
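
Because groupby only merges adjacent elements, the same key can show up in several groups, as 'A' does above. A common way around this, sketched here with made-up sample data, is to sort by the same key function first.

```python
import itertools

words = ["apple", "bob", "avocado", "cherry", "banana"]

def first_letter(word):
    return word[0]

# Without sorting, "apple" and "avocado" land in separate groups.
unsorted_groups = [
    (key, list(group))
    for key, group in itertools.groupby(words, key=first_letter)
]

# Sorting by the same key first makes equal keys adjacent.
sorted_groups = [
    (key, list(group))
    for key, group in itertools.groupby(sorted(words, key=first_letter), key=first_letter)
]

print(unsorted_groups)
print(sorted_groups)
```

Note that sorted() builds a full list, so this gives up the memory saving in exchange for complete groups.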
    

    accumulate: running results

    list(itertools.accumulate([1,2,3,4,5], lambda x,y: x*y))
    
    [1, 2, 6, 24, 120]
    
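    # Rough pure-Python equivalent
    import operator

    def accumulate(iterable, func=operator.add, *, initial=None):
        iter_ = iter(iterable)
        ret = initial
        # Without an initial value, start from the first element
        if initial is None:
            try:
                ret = next(iter_)
            except StopIteration:
                return
        yield ret
        # Combine the running result with each remaining element
        for elem in iter_:
            ret = func(ret, elem)
            yield ret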
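
A few more ways to call accumulate, as a short sketch with made-up data. With no function it adds; any two-argument function works; the initial keyword assumes Python 3.8 or later.

```python
import itertools
import operator

data = [3, 1, 4, 1, 5]

# Default behaviour: running sum.
running_sum = list(itertools.accumulate(data))

# Any two-argument function works, for example a running maximum.
running_max = list(itertools.accumulate(data, max))

# initial= (Python 3.8+) prepends a starting value to the output.
running_product = list(itertools.accumulate(data, operator.mul, initial=1))

print(running_sum)
print(running_max)
print(running_product)
```

With initial set, the output has one more element than the input.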
            
    

    compress: filtering with selectors

    list(itertools.compress('youge', [1,0,True,3]))
    
    ['y', 'u', 'g']
    
    def compress(data, selectors):
        for d, s in zip(data, selectors):
            if s:
                yield d  # emit each element whose selector is truthy
            
    # demo
    for data, key in zip([1,2], 'abcd'):
        print(data,key)
        if key:
            print(data)
    
    1 a
    1
    2 b
    2
    
    # Pythonic
    def compress(data, selectors):
        return (d for d, s in zip(data, selectors) if s)
    
    # test
    ret = compress(['love', 'you', 'forever'], ['love', None, 'dd', 'forever'])
    print(ret)
    print(list(ret))
    
    <generator object compress.<locals>.<genexpr> at 0x000001D8831498E0>
    ['love', 'forever']
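
The selectors do not have to be a literal list. As a sketch with made-up data, they can be computed lazily from a parallel sequence:

```python
import itertools

names = ["ann", "bob", "cat", "dan"]
scores = [72, 45, 90, 58]

# Build the selectors on the fly from the parallel scores sequence.
passed = list(itertools.compress(names, (score >= 60 for score in scores)))

print(passed)
```

This keeps the data and the filtering condition separate, which is the main difference from filter(), where the predicate looks at the element itself.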
    

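    Generators

    • Any object that implements __iter__() and __next__() is an iterator; a generator is an iterator that Python builds for you
    • In code a generator comes in two forms: a generator expression (a comprehension in parentheses, which does not produce a tuple), or a function whose body contains the yield keyword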
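
For comparison, here is a minimal sketch of the same countdown written three ways: a hand-written iterator class, a generator function, and a generator expression. Countdown and countdown are illustrative names, not standard-library objects.

```python
class Countdown:
    """Hand-written iterator: implements __iter__() and __next__()."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def countdown(start):
    """Generator function: the same behaviour expressed with yield."""
    while start > 0:
        yield start
        start -= 1


from_class = list(Countdown(3))
from_function = list(countdown(3))
from_expression = list(n for n in range(3, 0, -1))

print(from_class, from_function, from_expression)
```

The generator versions get __iter__() and __next__() for free, which is why they are usually preferred over writing the class by hand.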

    zip

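    • Pairs up the elements at matching positions and stops as soon as the shortest input is exhausted; sometimes nicknamed the "zipper" function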
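
A short sketch of the stop-at-shortest behaviour, next to itertools.zip_longest, which pads the shorter input instead. The sample data is made up.

```python
import itertools

numbers = [1, 2]
letters = "abcd"

# zip stops when the shortest input runs out.
shortest = list(zip(numbers, letters))

# zip_longest keeps going and fills the gaps with fillvalue.
longest = list(itertools.zip_longest(numbers, letters, fillvalue=None))

print(shortest)
print(longest)
```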

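    takewhile

    • takewhile: iterates in order and returns each element while the condition holds; it stops for good at the first element that fails the condition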
    # takewhile
    s1 = list(itertools.takewhile(lambda x:x<=2, [0,3,2,1,-1,3,0]))
    print(s1)
    
    s2 = list(itertools.takewhile(lambda x:x<5, [1,4,6,4,1,3]))
    print(s2)
    
    # filterfalse: keeps the elements for which the predicate is false
    s3 = list(itertools.filterfalse(lambda x:x%2==0, range(10)))
    print(s3)
    
    [0]
    [1, 4]
    [1, 3, 5, 7, 9]
    
    def take_while(condition, iter_obj):
        for elem in iter_obj:
            if condition(elem):
                yield elem
            else:
                break
    

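    dropwhile: the counterpart of takewhile. It skips elements while the condition holds, then returns every remaining element, starting with the first one that fails the condition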
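
A minimal sketch of dropwhile next to takewhile on the same input. drop_while is a hypothetical pure-Python helper written in the style of take_while above; it is not the library's actual implementation.

```python
import itertools

data = [1, 4, 6, 4, 1, 3]

taken = list(itertools.takewhile(lambda x: x < 5, data))
dropped = list(itertools.dropwhile(lambda x: x < 5, data))


def drop_while(condition, iter_obj):
    """Rough equivalent of itertools.dropwhile (illustrative only)."""
    iter_ = iter(iter_obj)
    # Skip the leading elements that satisfy the condition.
    for elem in iter_:
        if not condition(elem):
            yield elem
            break
    # After the first failure, pass everything else through unchecked.
    yield from iter_


print(taken)
print(dropped)
print(list(drop_while(lambda x: x < 5, data)))
```

Note that the later 4, 1 and 3 are kept even though they satisfy the condition: the check is only applied to the leading run.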

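    islice: slicing

    # An ordinary slice needs the whole list in memory first
    # Note that it makes a shallow copy, not a deep one
    l = [1,2,3,4,5]
    print(l[::1])

    # generator version
    # start and stop must be None or non-negative integers and step must be positive; write your own wrapper if you need anything else
    list(itertools.islice(l, 0,3,1))

    s = slice(3,4,5) # takes at most 3 arguments: start, stop, step
    s.start
    s.stop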
    
    [1, 2, 3, 4, 5]
    
    [1, 2, 3]
    
    3
    
    4
    
    
    import sys

    def islice(iter_obj, *args):
        # Named islice so that the built-in slice() stays available
        s = slice(*args)

        start = s.start or 0
        stop = sys.maxsize if s.stop is None else s.stop  # a very large constant
        step = s.step or 1
        # Indices of the elements to yield
        iter_ = iter(range(start, stop, step))
        try:
            next_i = next(iter_)
        except StopIteration:
            # Nothing to yield: still consume the input up to start
            for _ in zip(range(start), iter_obj):
                pass
            return
        try:
            for i, elem in enumerate(iter_obj):
                if i == next_i:
                    yield elem
                    next_i = next(iter_)
        except StopIteration:
            pass
        
    
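
Where islice really pays off is on iterators that cannot be sliced with [] at all, such as an infinite generator. A minimal sketch, using itertools.count as the endless source:

```python
import itertools

# An infinite stream of even numbers; evens[:5] would raise TypeError.
evens = (n for n in itertools.count() if n % 2 == 0)

# Take the first five values lazily.
first_five = list(itertools.islice(evens, 5))

# The generator remembers its position, so the next slice continues from 10.
every_other = list(itertools.islice(evens, 0, 6, 2))

print(first_five)
print(every_other)
```

Unlike list slicing, islice consumes the underlying iterator, so repeated calls do not start from the beginning.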
    
    

    repeat

    list(itertools.repeat(['youge'], 3))
    
    [['youge'], ['youge'], ['youge']]
    
    
    def repeat(obj, times=None):
        if times is None:
            while True:  # yield forever
                yield obj
        else:
            for i in range(times):
                yield obj
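
repeat is mostly used to feed a constant argument to map or zip. The sketch below shows that, and also that every item produced is the same object rather than a copy, which matters for mutable values like the list above.

```python
import itertools

# Supply a constant second argument to pow for every element.
squares = list(map(pow, range(5), itertools.repeat(2)))

# An unbounded repeat is safe inside zip, which stops at the shortest input.
labelled = list(zip(itertools.repeat("row"), [1, 2, 3]))

# repeat yields the same object each time, not copies.
items = list(itertools.repeat(["youge"], 3))
same_object = items[0] is items[1]

print(squares)
print(labelled)
print(same_object)
```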
            
    
  • Original article: https://www.cnblogs.com/chenjieyouge/p/11795994.html