  • A web crawler using coroutines

    # Patch the standard library first, before urllib pulls in socket/ssl.
    # Without these two lines the coroutine version runs no faster than the
    # serial one, because gevent cannot detect the blocking I/O inside urllib.
    from gevent import monkey
    monkey.patch_all()

    from urllib import request
    import gevent, time

    def f(url):
        print('GET:%s' % url)
        resp = request.urlopen(url)
        data = resp.read()
        print('%d bytes received from %s' % (len(data), url))

    # Crawl with a plain loop, i.e. serially
    urls = ['https://www.python.org/', 'https://www.yahoo.com/']
    start_time = time.time()
    for url in urls:
        f(url)
    print('The total time is {time}'.format(time=time.time() - start_time))

    # Crawl with coroutines
    async_time = time.time()
    gevent.joinall([gevent.spawn(f, 'https://www.python.org/'),
                    gevent.spawn(f, 'https://www.yahoo.com/'),
                    ])
    print('The asynchronous total time is {time}'.format(time=time.time() - async_time))
    

     The output is as follows:

    GET:https://www.python.org/
    48835 bytes received from https://www.python.org/
    GET:https://www.yahoo.com/
    498399 bytes received from https://www.yahoo.com/
    The total time is 12.665598630905151
    GET:https://www.python.org/
    GET:https://www.yahoo.com/
    48835 bytes received from https://www.python.org/
    498546 bytes received from https://www.yahoo.com/
    The asynchronous total time is 5.80000114440918
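
     The gap between the two timings comes from overlapping the waits: the serial loop pays for each request in turn, while the coroutine version waits on both at once, so its total is closer to the slowest single request. The sketch below reproduces that effect without any network access. Note that it is not gevent: it swaps in the standard library's asyncio, and the names fetch, DELAYS, 'site-a' and 'site-b' as well as the delay values are made up for illustration.

```python
import asyncio
import time

# Hypothetical stand-ins for network latency, in seconds (not measured values).
DELAYS = {'site-a': 0.2, 'site-b': 0.3}

async def fetch(name, delay):
    # asyncio.sleep hands control back to the event loop while "waiting",
    # much as a monkey-patched socket read hands control back to gevent.
    await asyncio.sleep(delay)
    return name

async def serial():
    # Wait for each simulated request to finish before starting the next.
    start = time.perf_counter()
    for name, delay in DELAYS.items():
        await fetch(name, delay)
    return time.perf_counter() - start

async def concurrent():
    # Start all simulated requests together and wait for all of them.
    start = time.perf_counter()
    await asyncio.gather(*(fetch(n, d) for n, d in DELAYS.items()))
    return time.perf_counter() - start

serial_time = asyncio.run(serial())
concurrent_time = asyncio.run(concurrent())
print('serial: %.2fs, concurrent: %.2fs' % (serial_time, concurrent_time))
```

     With these made-up delays the serial run should take roughly the sum of the two waits and the concurrent run roughly the longer one, which is the same pattern as the 12.67 s versus 5.80 s measured above.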

  • Original article: https://www.cnblogs.com/zhouzhe-blog/p/9425305.html