生成器

[TOC]

第三章 生成器

3.1 理解生成器

  • 生成器处理值序列时允许序列中的每一个值只在需要时才计算

  • 使用生成器可以节省大量内存

  • 生成器能够处理一些无法由列表准确表示的序列形式

  • 生成器是一个函数, 按照顺序返回一个或多个值

3.2 生成器语法

  • 通常, 生成器在函数内部包含一个或多个yield语句

  • return语句一样, yield语句命令函数返回一个值给调用者, 但不会终止函数的执行, 执行会暂停直到调用代码重新恢复生成器, 在停止的地方再次开始执行

  • 简单示例

    # 代码
    def fibonacci():
        yield 1
        yield 1
        yield 2
        yield 3
        yield 5
        yield 8
        pass
    
    for i in fibonacci():
        print(i)
        pass
    # 输出
    D:\Software\Installed\Python\Python36\python.exe D:/workspace/Pycharm/PurePythonProject/Test01.py
    1
    1
    2
    3
    5
    8
    
    Process finished with exit code 0

3.2.1 next函数

  • Python内置了next函数, 能够让生成器请求它的下一个值

  • 简单示例

    # 代码
    def fibonacci():
        numbers = []
        while True:
            if len(numbers) < 2:
                numbers.append(1)
                pass
            else:
                numbers.append(sum(numbers))
                pass
            yield numbers[-1]
        pass
    
    gen = fibonacci()
    for i in range(1, 10):
        print(next(gen))
    # 输出
    D:\Software\Installed\Python\Python36\python.exe D:/workspace/Pycharm/PurePythonProject/Test02.py
    1
    1
    2
    4
    8
    16
    32
    64
    128
    
    Process finished with exit code 0

3.2.2 终止生成器: StopIteration异常

  • Python 2中, 函数内不允许yieldreturn同时存在, 可以通过抛出StopIteration异常来终止生成器

    简单示例

    import sys
    
    print("Python版本信息: ", sys.version)

def generator(): yield 1 yield 2 raise StopIteration yield 3 pass

print([i for i in generator()])

gen = generator() while True: print(next(gen))

  **输出**

  ```python
  D:\Software\Installed\Python\Python36\python.exe D:/workspace/Pycharm/PurePythonProject/Test03.py
  Traceback (most recent call last):
  Python版本信息:  3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)]
    File "D:/workspace/Pycharm/PurePythonProject/Test03.py", line 17, in <module>
  [1, 2]
      print(next(gen))
  1
    File "D:/workspace/Pycharm/PurePythonProject/Test03.py", line 9, in generator
  2
      raise StopIteration
  StopIteration

  Process finished with exit code 1
  • 从上例: 如果迭代这个生成器, 会得到1和2, 并可以正常退出(不抛出StopIterator异常); 如果使用next获取生成器的下一个值, 则会抛出StopIterator异常

  • Python 3中, 可以通过return语句终止生成器. 但是, return返回的值不会成为最终输出的值, 相反, 这个值会被作为异常信息发送. return语句实际等同于raise StopIteration

  • 简单示例

    import sys
    
    print("Python版本信息: ", sys.version)

def generator(): yield 1 yield 2 return 4 yield 3 pass

print([i for i in generator()])

gen = generator() while True: print(next(gen))

  **输出**

  ```shell
  D:\Software\Installed\Python\Python36\python.exe D:/workspace/Pycharm/PurePythonProject/Test04.py
  Python版本信息:  3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)]
  [1, 2]
  1
  2
  Traceback (most recent call last):
    File "D:/workspace/Pycharm/PurePythonProject/Test04.py", line 17, in <module>
      print(next(gen))
  StopIteration: 4

  Process finished with exit code 1

3.3 与生成器之间的交互

  • 生成器协议提供了send方法, 以允许与生成器的反向沟通

简单示例

import sys

print("Python版本信息: ", sys.version)


def squares():
    a = 1
    b = 1
    while True:
        print('A: ', a, ' ', b)
        response = yield a + b
        print('B: ', a, ' ', b)
        if response:
            b = int(response)
        else:
            b += 1
        print('C: ', a, ' ', b)
    pass


sq = squares()
print(next(sq))
print(next(sq))
print(sq.send(7))
print(next(sq))

输出

D:\Software\Installed\Python\Python36\python.exe D:/workspace/Pycharm/PurePythonProject/Test04.py
Python版本信息:  3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)]
A:  1   1
2
B:  1   1
C:  1   2
A:  1   2
3
B:  1   2
C:  1   7
A:  1   7
8
B:  1   7
C:  1   8
A:  1   8
9

Process finished with exit code 0

说明

  • send函数向生成器传递一个参数, 该参数为yield表达式的值

  • yield语句执行后暂停, 在下次调用时才会继续运行, 直到遇到下一条yield语句

3.4 迭代器对象与迭代器

  • Python中, 生成器是一种迭代器

  • Python中的迭代器是包含__next__方法的任何对象

  • 迭代对象 是任何定义了__iter__方法的对象, 可迭代对象的__iter__方法负责返回一个迭代器

  • 生成器可以是迭代器, 但不一定是迭代对象, 相似的, 并非所有的迭代对象都是迭代器

range为例

  • range对象不是生成器, 因为range对象没有定义__next__函数, 不能作为next()函数的参数

  • 但是, range对象的__iter__方法返回的实际迭代器是一个生成器, 可以响应next方法. 但是不能响应send方法

    D:\workspace\Pycharm\PurePythonProject>python
    Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> r = range(0, 5)
    >>> r
    range(0, 5)
    >>> next(r)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'range' object is not an iterator
    >>> type(r)
    <class 'range'>
    >>> it = iter(r)
    >>> it
    <range_iterator object at 0x0000014A70D9AEF0>
    >>> next(it)
    0
    >>> next(it)
    1
    >>> it.send(2)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'range_iterator' object has no attribute 'send'
  • 语句for i in range(0, 5)之所以可以运行, 是因为range函数返回了一个迭代对象

  • 注意: 并非所有的迭代器都是generator类的实例. range迭代器是range_iterator的实例, 实现了相似的模式, 且缺少send方法 的实现

3.5 标准库中的生成器

3.5.1 range

  • Python 2中成为xrange

  • range对象的迭代器(由__iter__返回)是个生成器(range_iterator)

3.5.2 dict.items家族

  • Python中, 内置的字典类包括3个允许迭代所有字典的方法, 并且这3个方法都是迭代器是生成器的迭代对象(keys, values, items, 注意:Python 2中, 三个方法分别是: iterkeys, itervalues, iteritems)

  • dict.items声明为生成器, 就不需要将整个字典重新格式化为包含键值的二元组列表

    简单示例

    D:\workspace\Pycharm\PurePythonProject>python
    Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> d = {'foo': 'bar', 'baz': 'bacon'}
    >>> it = iter(d.items())
    >>> next(it)
    ('foo', 'bar')
    >>> next(it)
    ('baz', 'bacon')
    >>> next(it)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration
    >>>
  • 生成器返回所有数据后, 继续作为next函数的参数时, 会抛出StopIteraion异常

  • 在迭代期间修改字典, 可能会看到副作用

    D:\workspace\Pycharm\PurePythonProject>python
    Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> d = {'foo': 'bar', 'baz': 'bacon'}
    >>> it = iter(d.items())
    >>> next(it)
    ('foo', 'bar')
    >>> d['spam'] = 'eggs'
    >>> next(it)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    RuntimeError: dictionary changed size during iteration

3.5.3 zip

  • Python包含了一个名为zip的内置函数, 该函数有多个可迭代对象, 并且一起迭代所有对象

    D:\workspace\Pycharm\PurePythonProject>python
    Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> z = zip([1, 2, 3, 4], ['a', 'b', 'c'], ['A', 'B'])
    >>> next(z)
    (1, 'a', 'A')
    >>> next(z)
    (2, 'b', 'B')
    >>> next(z)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration
  • 到达最短的迭代对象的最后一个元素时停止

3.5.4 map

  • map函数的参数为: 一个接受N个参数的函数, 以及N个迭代对象, 并且计算每个迭代对象的序列成员的函数结果, 到达最短的迭代对象的最后一个元素时停止

    D:\workspace\Pycharm\PurePythonProject>python
    Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> m = map(lambda x, y: x + y, [1, 2, 3], [4, 5])
    >>> next(m)
    5
    >>> next(m)
    7
    >>> next(m)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration

3.5.5 文件对象

  • 可以利用生成器逐行读取文本文件内容

    D:\workspace\Pycharm\PurePythonProject>cat lines.txt
    line 1
    line 2
    line 3
    line 4
    line 5
    
    D:\workspace\Pycharm\PurePythonProject>python
    Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> f = open('lines.txt')
    >>> next(f)
    'line 1\n'
    >>> f.readline()
    'line 2\n'
    >>> next(f)
    'line 3\n'
    >>> f.readline()
    'line 4\n'
    >>> next(f)
    'line 5\n'
    >>> next(f)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration
    >>> f.readline()
    ''
  • 由上面代码可以看出, next(f)f.readline()语句的返回结果来自同一生成器对象, 但是, __next__函数(使用next(f)时会调用)和readline函数不完全一样, readline函数会捕捉StopIteration异常, 并返回一个空字符串

3.6 生成器单例模式

  • 简单的生成器函数并不是单例模式的, 多次调用函数将返回不同的生成器实例

    import sys
    
    print("Python版本信息: ", sys.version)

def fibonacci(): numbers = [] while True: if len(numbers) < 2: numbers.append(1) pass else: numbers.append(sum(numbers)) pass yield numbers[-1] pass

gen1 = fibonacci() print(next(gen1), next(gen1), next(gen1)) gen2 = fibonacci() print(next(gen2), next(gen2), next(gen2)) print(next(gen1), next(gen1), next(gen1))

  **输出**

  ```shell
  D:\Software\Installed\Python\Python36\python.exe D:/workspace/Pycharm/PurePythonProject/Test05.py
  Python版本信息:  3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)]
  1 1 2
  1 1 2
  4 8 16

  Process finished with exit code 0
  • 下面的可迭代类可以实现类似单例模式的目的

    import sys
    
    print("Python版本信息: ", sys.version)
    
    class Fibonacci(object):
        def init(self):
            self.numbers = []
            pass
    
        def __iter__(self):
              return self
    
        def __next__(self):
            if len(self.numbers) < 2:
                self.numbers.append(1)
                pass
            else:
                self.numbers.append(sum(self.numbers))
                self.numbers.pop(0)
                pass
            return self.numbers[-1]
    
        def send(self, value):
            pass
    
        # 兼容 Python 2
        next = __next__
        pass
    
      f = Fibonacci()
      i1 = iter(f)
      print(next(i1), next(i1), next(i1))
      i2 = iter(f)
      print(next(i2))

输出

D:\Software\Installed\Python\Python36\python.exe D:/workspace/Pycharm/PurePythonProject/Test05.py
  Python版本信息:  3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)]
  1 1 2
  3

  Process finished with exit code 0

3.7 生成器内部的生成器

  • 一个生成器调用另外一个生成器是可行的

  • Python 3.3引入了新的yield from语句, 旨在为生成器提供一种调用其他生成器的直接方式

    简单示例

    import sys
    import itertools
    
    print("Python版本信息: ", sys.version)
    
    def gen1():
        yield 'foo'
        yield 'bar'
        pass
    
    def gen2():
        yield 'spam'
        yield 'eggs'
        pass

第一种调用方式

def full_gen_1(): for word in gen1(): yield word pass for word in gen2(): yield word pass pass

第二种调用方式, 使用 itertools.chain

def full_gen_2(): for word in itertools.chain(gen1(), gen2()): yield word pass pass

第三种调用方式, 使用 yield from 语句

def full_gen_3(): yield from gen1() yield from gen2() pass

print([i for i in full_gen_1()]) print([i for i in full_gen_2()]) print([i for i in full_gen_3()])

  **输出**

  ```shell
  D:\Software\Installed\Python\Python36\python.exe D:/workspace/Pycharm/PurePythonProject/Test06.py
  Python版本信息:  3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)]
  ['foo', 'bar', 'spam', 'eggs']
  ['foo', 'bar', 'spam', 'eggs']
  ['foo', 'bar', 'spam', 'eggs']

  Process finished with exit code 0
  • yield from被称为生成器委托

  • 使用yield from时, 该生成器委托给另一个生成器, 任何使用send函数发送个生成器的值, 都将被发送给当前委托生成器. 而前两种实现, 则需要手动传递该值.

最后更新于

这有帮助吗?