Python iterators and generators
In this part of the Python tutorial, we work with iterators and generators. An iterator is an object that allows a programmer to traverse all the elements of a collection, regardless of its specific implementation.
In Python, an iterator is an object which implements the iterator protocol. The iterator protocol consists of two methods: the __iter__() method, which must return the iterator object, and the __next__() method, which returns the next element from a sequence. (An object that implements both __iter__() and __next__() is an iterator.)
Iterators have several advantages:
- Cleaner code
- Iterators can work with infinite sequences
- Iterators save resources
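Python has several built-in objects that can be iterated over, for example lists, tuples, strings, dictionaries, and files.
(Strictly speaking, most of these built-in objects are iterables rather than iterators, because they do not implement the __next__() method themselves. Only the new object produced by calling iter() on them has a __next__() method and is truly an iterator. The for statement calls iter() implicitly, so it creates an iterator for these objects behind the scenes, and that is how they can be traversed. File objects are the exception: they are their own iterators.)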
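The difference between an iterable and an iterator can be checked directly. The following is a minimal sketch; the list `numbers` and the variable names are made up for illustration.

```python
# A list is iterable, but it is not itself an iterator.
numbers = [10, 20, 30]

list_has_iter = hasattr(numbers, "__iter__")   # True: it can produce an iterator
list_has_next = hasattr(numbers, "__next__")   # False: it cannot be advanced directly

# iter() asks the list for a fresh iterator object.
it = iter(numbers)
iterator_has_next = hasattr(it, "__next__")    # True

first = next(it)    # 10
second = next(it)   # 20

# An iterator returns itself from iter(); a list returns a new iterator each time.
iterator_is_own_iter = iter(it) is it
list_is_own_iter = iter(numbers) is numbers

print(list_has_iter, list_has_next, iterator_has_next)
print(first, second)
print(iterator_is_own_iter, list_is_own_iter)
```

Because every call to iter() on the list returns a new iterator, the same list can be traversed many times, whereas an iterator can only be consumed once.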
#!/usr/bin/python3
# iterator.py

text = "formidable"

for e in text:
    print(e, end=" ")

print()

it = iter(text)

print(next(it))
print(next(it))
print(next(it))

print(list(it))
In the code example, we show a built-in iterator on a string. In Python, a string is an immutable sequence of characters. The iter() function returns an iterator for an object, and the built-in next() function retrieves its next element. We can also use the list() or tuple() functions on iterators.
$ ./iterator.py
f o r m i d a b l e 
f
o
r
['m', 'i', 'd', 'a', 'b', 'l', 'e']
Python reading lines
By saving system resources we mean that when working with iterators, we can get the next element in a sequence without keeping the entire dataset in memory.
#!/usr/bin/python3
# read_data.py

with open('data.txt', 'r') as f:

    while True:

        line = f.readline()

        if not line:
            break

        else:
            print(line.rstrip())
This code prints the contents of the data.txt file. Instead of using a while loop, we can apply an iterator, which simplifies our task.
#!/usr/bin/python3
# read_data_iterator.py

with open('data.txt', 'r') as f:

    for line in f:
        print(line.rstrip())
The open() function returns a file object, which is an iterator. We can use it in a for loop. With an iterator, the code is cleaner.
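That a file object is its own iterator can be verified without touching the disk. The sketch below assumes io.StringIO as an in-memory stand-in for a file returned by open(); the sample text is invented.

```python
import io

# io.StringIO is used here as an in-memory stand-in for a file opened with open().
f = io.StringIO("first line\nsecond line\nthird line\n")

# File-like objects return themselves from iter(), so they are iterators.
is_own_iterator = iter(f) is f

# Advance by one line manually ...
first = next(f).rstrip()

# ... and the for loop continues where next() stopped.
rest = [line.rstrip() for line in f]

print(is_own_iterator)
print(first)
print(rest)
```

Only one line is held in memory at a time, which is the resource saving described above.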
Python iterator protocol
In the following example, we create a custom object that implements the iterator protocol.
#!/usr/bin/python3
# iterator_protocol.py

class Seq:

    def __init__(self):
        self.x = 0

    def __next__(self):
        self.x += 1
        return self.x**self.x

    def __iter__(self):
        return self

s = Seq()
n = 0

for e in s:
    print(e)
    n += 1

    if n > 10:
        break
In the code example, we create the sequence of numbers 1, 4, 27, 256, and so on. This demonstrates that with iterators, we can work with infinite sequences.
def __iter__(self):
    return self
The for statement calls the __iter__() function on the container object. The function returns an iterator object that defines the method __next__(), which accesses elements in the container one at a time.
def __next__(self):
    self.x += 1
    return self.x**self.x

The __next__() method returns the next element of a sequence.
if n > 10:
    break
Because we are working with an infinite sequence, we must interrupt the for loop.
$ ./iterator_protocol.py
1
4
27
256
3125
46656
823543
16777216
387420489
10000000000
285311670611
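Instead of counting inside the loop, the standard library function itertools.islice() can cut a finite slice out of an infinite iterator. This is a sketch of that alternative, not part of the original example; the Seq class is repeated so that the snippet runs on its own.

```python
from itertools import islice


class Seq:
    """Infinite iterator producing 1**1, 2**2, 3**3, ..."""

    def __init__(self):
        self.x = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.x += 1
        return self.x ** self.x


# islice() stops after five elements, so no manual counter or break is needed.
first_five = list(islice(Seq(), 5))
print(first_five)  # [1, 4, 27, 256, 3125]
```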
StopIteration
The loop can be interrupted in another way. In the class definition, we can raise a StopIteration exception from the __next__() method. In the following example, we redo our previous example.
#!/usr/bin/python3
# stopiter.py

class Seq14:

    def __init__(self):
        self.x = 0

    def __next__(self):
        self.x += 1

        if self.x > 14:
            raise StopIteration

        return self.x ** self.x

    def __iter__(self):
        return self

s = Seq14()

for e in s:
    print(e)
The code example prints the first 14 numbers of the sequence.
if self.x > 14:
    raise StopIteration
The StopIteration exception terminates the for loop. The for statement catches the exception itself, so no traceback is shown.
$ ./stopiter.py
1
4
27
256
3125
46656
823543
16777216
387420489
10000000000
285311670611
8916100448256
302875106592253
11112006825558016
This is the output of the example.
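To make the role of StopIteration more concrete, the following sketch spells out roughly what a for statement does with an iterator. The Countdown class is a hypothetical example, not part of the tutorial's code.

```python
class Countdown:
    """Hypothetical finite iterator: counts from start down to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


# Roughly what "for value in Countdown(3): collected.append(value)" does.
collected = []
it = iter(Countdown(3))

while True:
    try:
        value = next(it)
    except StopIteration:
        break  # the for statement handles the exception and ends the loop
    collected.append(value)

print(collected)  # [3, 2, 1]
```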
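Python generators
(A generator does not need iter() to be called on it: it natively has both the __iter__() and __next__() methods, so it is an iterator by nature. There are two ways to create one: a generator expression and a function that uses yield.)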
A generator is a special routine that can be used to control the iteration behaviour of a loop. A generator is similar to a function returning an array. A generator has parameters, it can be called, and it generates a sequence of values. But unlike a function, which returns a whole array, a generator yields one value at a time. This requires less memory.
Generators in Python:
- Are defined with the def keyword
- Use the yield keyword
- May use several yield keywords
- Return an iterator
Let's look at a generator example.
#!/usr/bin/python3
# simple_generator.py

def gen():

    x, y = 1, 2
    yield x, y

    x += 1
    yield x, y

g = gen()

print(next(g))
print(next(g))

try:
    print(next(g))

except StopIteration:
    print("Iteration finished")
The program creates a very simple generator.
def gen():

    x, y = 1, 2
    yield x, y

    x += 1
    yield x, y
A generator is defined with the def keyword, just like a normal function. We use two yield keywords inside the body of the generator. The yield keyword suspends the generator and returns a value. The next time the next() function is called on the generator, execution continues on the line following the yield keyword. Note that the local variables are preserved throughout the iterations. When there is nothing left to yield, a StopIteration exception is raised.
$ ./simple_generator.py
(1, 2)
(2, 2)
Iteration finished
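The following sketch checks that a generator object already has both protocol methods and that its local state survives between calls. The counter() generator is a hypothetical helper introduced only for this illustration.

```python
def counter(limit):
    """Hypothetical generator: yields 1..limit, keeping n between calls."""
    n = 0
    while n < limit:
        n += 1
        yield n


g = counter(3)

# A generator object implements the iterator protocol without any extra work.
has_iter = hasattr(g, "__iter__")
has_next = hasattr(g, "__next__")
is_own_iterator = iter(g) is g

first = next(g)                     # runs the body up to the first yield
remaining = list(g)                 # resumes after that yield with n preserved
exhausted_default = next(g, "done")  # a default value avoids StopIteration

print(has_iter, has_next, is_own_iterator)
print(first, remaining, exhausted_default)
```

Passing a default as the second argument of next() is an alternative to wrapping the call in try/except when the generator may already be exhausted.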
In the following example, we calculate Fibonacci numbers. The first number of the sequence is 0, the second number is 1, and each subsequent number is equal to the sum of the previous two numbers of the sequence.
#!/usr/bin/python3
# fibonacci_gen.py

import time

def fib():

    a, b = 0, 1

    while True:
        yield b

        a, b = b, a + b

g = fib()

try:
    for e in g:
        print(e)

        time.sleep(1)

except KeyboardInterrupt:
    print("Calculation stopped")
The script continuously prints Fibonacci numbers to the console, starting with 1. It is terminated with the Ctrl + C key combination.
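A generator can also take parameters, which gives a bounded variant that ends on its own instead of waiting for Ctrl + C. This is a sketch under that assumption; fib_upto() and its limit parameter are hypothetical names, not part of the script above.

```python
def fib_upto(limit):
    """Yield Fibonacci numbers (starting from 1) that do not exceed limit."""
    a, b = 0, 1
    while b <= limit:
        yield b
        a, b = b, a + b
    # Falling off the end of the function raises StopIteration automatically.


values = list(fib_upto(100))
print(values)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```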
Python generator expression
A generator expression is similar to a list comprehension. The difference is that a generator expression returns a generator, not a list.
#!/usr/bin/python3
# generator_expression.py

n = (e for e in range(50000000) if not e % 3)

i = 0

for e in n:
    print(e)

    i += 1

    if i >= 100:
        break

The example calculates values that can be divided by 3 without a remainder.

n = (e for e in range(50000000) if not e % 3)

A generator expression is created with round brackets. Creating a list comprehension in this case would be very inefficient because the example would occupy a lot of memory unnecessarily. Instead, we create a generator expression, which generates values lazily on demand.

i = 0

for e in n:
    print(e)

    i += 1

    if i >= 100:
        break

In the for loop, we generate 100 values with the generator and then leave the loop with break. We have done this without extensive usage of memory.
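The memory difference can be observed with sys.getsizeof(). The sketch below uses a smaller range than the example (the constant N is an arbitrary choice) so that the list variant stays cheap to build. Note that getsizeof() reports only the size of the container object itself, not of the integers it refers to, so the real gap is even larger.

```python
import sys

N = 100_000  # arbitrary size, much smaller than in the example above

# The list comprehension materializes every matching value up front.
as_list = [e for e in range(N) if not e % 3]

# The generator expression stores only its current state.
as_gen = (e for e in range(N) if not e % 3)

list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)

print("list:", list_size, "bytes")
print("generator:", gen_size, "bytes")

# Both produce the same values; the generator computes them on demand.
total = sum(as_gen)
print(total == sum(as_list))
```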
In the next example, we create a grep-like utility in Python using a generator expression.
The Roman Empire (Latin: Imperium Rōmānum; Classical Latin: [ɪmˈpɛ.ri.ũː roːˈmaː.nũː] Koine and Medieval Greek: Βασιλεία τῶν Ῥωμαίων, tr. Basileia tōn Rhōmaiōn) was the post-Roman Republic period of the ancient Roman civilization, characterized by government headed by emperors and large territorial holdings around the Mediterranean Sea in Europe, Africa and Asia. The city of Rome was the largest city in the world c. 100 BC – c. AD 400, with Constantinople (New Rome) becoming the largest around AD 500,[5][6] and the Empire's populace grew to an estimated 50 to 90 million inhabitants (roughly 20% of the world's population at the time).[n 7][7] The 500-year-old republic which preceded it was severely destabilized in a series of civil wars and political conflict, during which Julius Caesar was appointed as perpetual dictator and then assassinated in 44 BC. Civil wars and executions continued, culminating in the victory of Octavian, Caesar's adopted son, over Mark Antony and Cleopatra at the Battle of Actium in 31 BC and the annexation of Egypt. Octavian's power was then unassailable and in 27 BC the Roman Senate formally granted him overarching power and the new title Augustus, effectively marking the end of the Roman Republic.
We use this text file.
#!/usr/bin/python3
# gen_grep.py

import sys

def grep(pattern, lines):
    # enumerate() numbers the lines from 1, so repeated lines keep their own numbers
    return ((line, n) for n, line in enumerate(lines, start=1) if pattern in line)

file_name = sys.argv[2]
pattern = sys.argv[1]

with open(file_name, 'r') as f:
    lines = f.readlines()

    for line, n in grep(pattern, lines):
        print(n, line.rstrip())

The example reads data from a file and prints lines that contain the specified pattern and their line numbers.

def grep(pattern, lines):
    return ((line, n) for n, line in enumerate(lines, start=1) if pattern in line)

The grep-like utility uses this generator expression. The expression goes through the list of lines and picks those which contain the pattern. The enumerate() function pairs each line with its position in the list, counted from 1, which is its line number in the file.

with open(file_name, 'r') as f:
    lines = f.readlines()

    for line, n in grep(pattern, lines):
        print(n, line.rstrip())
We open the file for reading and call the grep() function on the data. The function returns a generator, which is traversed with the for loop.
$ ./gen_grep.py Roman roman_empire.txt
1 The Roman Empire (Latin: Imperium Rōmānum; Classical Latin: [ɪmˈpɛ.ri.ũː roːˈmaː.nũː]
3 post-Roman Republic period of the ancient Roman civilization, characterized by government
13 then unassailable and in 27 BC the Roman Senate formally granted him overarching power and
14 the new title Augustus, effectively marking the end of the Roman Republic.
Four lines in the file contain the word 'Roman'.
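The example above loads the whole file into a list with readlines() before searching. Since a file object is itself an iterator, a streaming variant is possible that never holds more than one line at a time. The sketch below is one way to write it; grep_stream() is a hypothetical name, and io.StringIO with invented sample text stands in for a real file.

```python
import io


def grep_stream(pattern, lines):
    """Lazily yield (line_number, line) for lines that contain pattern."""
    return ((n, line) for n, line in enumerate(lines, start=1) if pattern in line)


# In-memory stand-in for a file; a real file object could be passed instead.
sample = io.StringIO("alpha\nbeta Roman\ngamma\nRoman delta\n")

matches = [(n, line.rstrip()) for n, line in grep_stream("Roman", sample)]
print(matches)  # [(2, 'beta Roman'), (4, 'Roman delta')]
```

With a real file, the generator has to be consumed inside the with block, because the lines are read only when the generator is iterated.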
In this chapter, we have covered iterators and generators in Python.