  • Implementing concurrency with the gevent and twisted modules

    The drawback of multithreading and multiprocessing is that threads and processes sit idle while blocked on IO, so asynchronous IO is the preferred approach. The main options are:

    I. Asynchronous IO

    1. asyncio (optionally paired with aiohttp)

    2. gevent + requests (grequests bundles the two)

    3. twisted

    4. tornado
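
    Of these, gevent, grequests, and twisted are walked through below. For comparison, here is a minimal asyncio sketch (assuming Python 3.7+ with aiohttp installed; the URLs are just the sample sites used later):

    import asyncio
    import aiohttp

    async def fetch_async(url):
        # Each coroutine opens a session, fetches the page, and reports its size.
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                body = await response.read()
                print("%d bytes received from %s." % (len(body), url))

    async def main():
        urls = ["https://www.python.org/", "https://github.com/"]
        # gather schedules every fetch concurrently on one event loop.
        await asyncio.gather(*(fetch_async(url) for url in urls))

    asyncio.run(main())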

    gevent+requests

    # Patch the standard library (socket, ssl, etc.) so blocking calls yield
    # to other greenlets; do this before importing requests.
    from gevent import monkey
    monkey.patch_all()

    import gevent
    import requests

    def fetch_async(method, url, req_kwargs):
        print(method, url, req_kwargs)
        response = requests.request(method=method, url=url, **req_kwargs)
        print(response.url, response.content)

    # Spawn three greenlets (coroutines) and wait for all of them to finish.
    gevent.joinall([
        gevent.spawn(fetch_async, method="get", url="https://www.python.org/", req_kwargs={}),
        gevent.spawn(fetch_async, method="get", url="https://www.yahoo.com/", req_kwargs={}),
        gevent.spawn(fetch_async, method="get", url="https://github.com/", req_kwargs={}),
    ])
    

    Crawling pages with gevent + urllib:

    # Patch the standard library before anything opens a socket.
    from gevent import monkey
    monkey.patch_all()

    import gevent
    import urllib.request

    def run_task(url):
        print("Visit --> %s" % url)
        try:
            response = urllib.request.urlopen(url)
            data = response.read()
            print("%d bytes received from %s." % (len(data), url))
        except Exception as e:
            print(e)

    if __name__ == '__main__':
        urls = ['https://www.baidu.com',
                'https://docs.python.org/3/library/urllib.html',
                'https://www.cnblogs.com/wangmo/p/7784867.html']
        greenlets = [gevent.spawn(run_task, url) for url in urls]
        gevent.joinall(greenlets)
    

    Capping the number of concurrent greenlets with a gevent pool

    # Patch the standard library before requests is imported.
    from gevent import monkey
    monkey.patch_all()

    import gevent
    import requests
    from gevent.pool import Pool

    def fetch_async(method, url, req_kwargs):
        print(method, url, req_kwargs)
        response = requests.request(method=method, url=url, **req_kwargs)
        print(response.url, response.content)

    # A pool of size 3 lets at most three greenlets run at once.
    pool = Pool(3)
    gevent.joinall([
        pool.spawn(fetch_async, method="get", url="https://www.python.org/", req_kwargs={}),
        pool.spawn(fetch_async, method="get", url="https://www.yahoo.com/", req_kwargs={}),
        pool.spawn(fetch_async, method="get", url="https://github.com/", req_kwargs={}),
    ])
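
    Pool also provides a blocking map, which fits when every task takes the same argument shape; a minimal sketch reusing the pool and fetch_async defined above:

    urls = ["https://www.python.org/", "https://www.yahoo.com/", "https://github.com/"]
    # Pool.map blocks until every greenlet finishes, so no joinall is needed.
    pool.map(lambda url: fetch_async("get", url, {}), urls)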
    

    grequests has gevent.joinall built in:

    import grequests

    request_list = [
        # Deliberately problematic requests: a too-short timeout, a domain
        # that does not resolve, and an endpoint that returns HTTP 500.
        grequests.get('http://httpbin.org/delay/1', timeout=0.001),
        grequests.get('http://fakedomain/'),
        grequests.get('http://httpbin.org/status/500')
    ]

    # Execute all requests concurrently and collect the responses.
    response_list = grequests.map(request_list)
    print(response_list)
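
    Requests that raise an exception (the timeout and the unresolvable domain above) come back as None in the mapped list, while the HTTP 500 still yields a normal Response. grequests.map also accepts an exception_handler callback that receives each failed request and its exception:

    def on_error(request, exception):
        # Called once per failed request instead of silently yielding None.
        print("request failed:", request.url, exception)

    response_list = grequests.map(request_list, exception_handler=on_error)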
    

    twisted

    1. The event loop (the reactor) runs continuously, waiting for request results to come back.

    2. It keeps looping even after every request has completed, so the callback must count responses and call reactor.stop() once the response count equals the request count:

    # getPage sends an HTTP request and returns a Deferred.
    from twisted.web.client import getPage
    # The reactor is the event loop.
    from twisted.internet import reactor

    REV_COUNTER = 0
    REQ_COUNTER = 0

    def callback(contents):
        print(contents)
        global REV_COUNTER
        REV_COUNTER += 1
        if REV_COUNTER == REQ_COUNTER:
            # Every request has produced a result; stop the event loop.
            reactor.stop()

    url_list = ['http://www.bing.com', 'http://www.baidu.com', ]
    REQ_COUNTER = len(url_list)
    for url in url_list:
        deferred = getPage(bytes(url, encoding="utf8"))
        # addBoth fires on success and on failure, so a failed request
        # still counts toward shutting the reactor down.
        deferred.addBoth(callback)

    # The event loop blocks here until reactor.stop() is called.
    reactor.run()
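
    Note that getPage is deprecated in newer Twisted releases. A rough equivalent built on the third-party treq package (an assumption: treq is installed), using a DeferredList instead of the manual counters:

    import treq
    from twisted.internet import defer, reactor

    @defer.inlineCallbacks
    def fetch(url):
        # treq.get returns a Deferred; inlineCallbacks lets us yield on it.
        response = yield treq.get(url)
        body = yield response.content()
        print("%d bytes received from %s." % (len(body), url))

    # DeferredList fires once every fetch finishes (success or failure),
    # which replaces the request/response counters above.
    done = defer.DeferredList(
        [fetch(url) for url in ['http://www.bing.com', 'http://www.baidu.com']],
        consumeErrors=True)
    done.addBoth(lambda _: reactor.stop())
    reactor.run()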
    
    