zoukankan      html  css  js  c++  java
  • 协程下的爬虫

    from urllib import request
    import gevent, time
    from gevent import monkey  //在没有加上此句和下一句时,运行速度理论上是一样的,因为gevent检测不到I/O端口
    monkey.patch_all()
    
    def f(url):
        print('GET:%s'%url)
        resp = request.urlopen(url)
        data = resp.read()
        print('%d bytes received from %s' % (len(data),url))
    #用循环的方式爬虫,也就时串行
    urls = ['https://www.python.org/','https://www.yahoo.com/']
    start_time = time.time()
    for url in urls:
        f(url)
    print('The asynchronous total time is {time}'.format(time = time.time() - start_time))
    #用协程方式爬虫
    async_time = time.time()
    gevent.joinall([gevent.spawn(f,'https://www.python.org/'),
                    gevent.spawn(f,'https://www.yahoo.com/'),
                    ])
    print('The total time is {time}'.format(time = time.time() - async_time))
    

     运行的结果如下:

    GET:https://www.python.org/
    48835 bytes received from https://www.python.org/
    GET:https://www.yahoo.com/
    498399 bytes received from https://www.yahoo.com/
    The total time is 12.665598630905151
    GET:https://www.python.org/
    GET:https://www.yahoo.com/
    48835 bytes received from https://www.python.org/
    498546 bytes received from https://www.yahoo.com/
    The asynchronous total time is 5.80000114440918

  • 相关阅读:
    Win10中的IIS10安装php manager和IIS URL Rewrite
    第十四周
    第十三周
    第十二周
    第十一周
    第十周
    第九周
    测试作业
    第八周
    第七周
  • 原文地址:https://www.cnblogs.com/zhouzhe-blog/p/9425305.html
Copyright © 2011-2022 走看看