  • Python 3.7 coroutines

    
    Coroutines:
    
    A coroutine is one way to implement concurrent programming. When concurrency comes up, you probably think of the multi-threading / multi-process model, and indeed multi-threading / multi-processing is one of the classic models for solving concurrency problems.
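    
    For illustration, a minimal sketch of a crawler under that classic threading model might look like this (a hypothetical variant using concurrent.futures, not code from the original post; crawl_page here mirrors the example introduced below):
    
    import time
    from concurrent.futures import ThreadPoolExecutor
    
    def crawl_page(url):
        print('crawling {}'.format(url))
        time.sleep(int(url.split('_')[-1]))  # sleep for the number of seconds in the URL suffix
        print('OK {}'.format(url))
    
    # the four sleeps overlap across worker threads, finishing in ~4 s total
    with ThreadPoolExecutor(max_workers=4) as executor:
        executor.map(crawl_page, ['url_1', 'url_2', 'url_3', 'url_4'])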
    
    
    node2:/root/python/20200524#cat t1.py 
    import time
    
    def crawl_page(url):
        print('crawling {}'.format(url))
        # the URL suffix doubles as the number of seconds to sleep
        sleep_time = int(url.split('_')[-1])
        time.sleep(sleep_time)
        print('OK {}'.format(url))
    
    def main(urls):
        for url in urls:
            # the timestamp is captured once per iteration, so the same
            # value is printed before and after the blocking crawl
            nowTime = time.strftime("%Y-%m-%d %H:%M:%S")
            print(nowTime)
            crawl_page(url)
            print(nowTime)
    
    main(['url_1', 'url_2', 'url_3', 'url_4'])
    
    node2:/root/python/20200524#time python t1.py 
    2020-04-16 22:50:45
    crawling url_1
    OK url_1
    2020-04-16 22:50:45
    
    2020-04-16 22:50:46
    crawling url_2
    OK url_2
    2020-04-16 22:50:46
    
    2020-04-16 22:50:48
    crawling url_3
    OK url_3
    2020-04-16 22:50:48
    
    2020-04-16 22:50:51
    crawling url_4
    OK url_4
    2020-04-16 22:50:51
    
    real	0m10.043s
    user	0m0.022s
    sys	0m0.009s
    
    
    The four pages are crawled strictly one after another, so the total wall time is the sum of the sleeps: 1 + 2 + 3 + 4 = 10 seconds. A very simple idea follows: a crawl like this can perfectly well be made concurrent. Let's see how to write it with coroutines.
    
    
    import asyncio
    
    async def crawl_page(url):
        print('crawling {}'.format(url))
        sleep_time = int(url.split('_')[-1])
        # await yields control to the event loop while this coroutine sleeps
        await asyncio.sleep(sleep_time)
        print('OK {}'.format(url))
    
    async def main(urls):
        for url in urls:
            # awaiting each coroutine in turn still runs them one at a time
            await crawl_page(url)
    
    # %time is an IPython/Jupyter magic; asyncio.run requires Python 3.7+
    %time asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))
    
    ########## Output ##########
    
    crawling url_1
    OK url_1
    crawling url_2
    OK url_2
    crawling url_3
    OK url_3
    crawling url_4
    OK url_4
    Wall time: 10 s
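    
    The post then runs the same logic as a standalone script, t2.py. Its source is never shown; judging from the timestamps in the output below, it is presumably the sequential coroutine version above with a timestamp printed before and after each crawl, roughly like this (a reconstruction, not the original file):
    
    import asyncio
    import time
    
    async def crawl_page(url):
        print('crawling {}'.format(url))
        await asyncio.sleep(int(url.split('_')[-1]))
        print('OK {}'.format(url))
    
    async def main(urls):
        for url in urls:
            print(time.strftime("%Y-%m-%d %H:%M:%S"))  # before the crawl
            await crawl_page(url)
            print(time.strftime("%Y-%m-%d %H:%M:%S"))  # after the crawl
    
    asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))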
    
    
    node2:/root/python/20200524#time python3 t2.py 
    2020-04-16 23:42:52
    crawling url_1
    OK url_1
    2020-04-16 23:42:53
    2020-04-16 23:42:53
    crawling url_2
    OK url_2
    2020-04-16 23:42:55
    2020-04-16 23:42:55
    crawling url_3
    OK url_3
    2020-04-16 23:42:58
    2020-04-16 23:42:58
    crawling url_4
    OK url_4
    2020-04-16 23:43:02
    
    real	0m10.095s
    user	0m0.070s
    sys	0m0.014s
    
    10 seconds is exactly right. Remember what was said above: await is a synchronous call here, so crawl_page(url) does not let the next call start until the current one has finished. The result behaves exactly like the first version; we have written synchronous code with an asynchronous interface.
    
    node2:/root/python/20200524#cat t3.py 
    
    import asyncio
    
    async def crawl_page(url):
        print('crawling {}'.format(url))
        sleep_time = int(url.split('_')[-1])
        await asyncio.sleep(sleep_time)
        print('OK {}'.format(url))
    
    async def main(urls):
        # create_task schedules every coroutine on the event loop immediately
        tasks = [asyncio.create_task(crawl_page(url)) for url in urls]
        # awaiting afterwards just waits for tasks that are already running
        for task in tasks:
            await task
    
    asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))
    
    node2:/root/python/20200524#time python3 t3.py
    crawling url_1
    crawling url_2
    crawling url_3
    crawling url_4
    OK url_1
    OK url_2
    OK url_3
    OK url_4
    
    real	0m4.090s
    user	0m0.034s
    sys	0m0.052s
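    
    Four seconds is the length of the longest sleep: asyncio.create_task schedules each coroutine on the event loop as soon as it is created, so the four sleeps run concurrently and the total is max(1, 2, 3, 4) = 4 s instead of the 10 s sum. For completeness, here is a sketch of an equivalent variant using asyncio.gather (an alternative shown for illustration, not code from the original post):
    
    import asyncio
    
    async def crawl_page(url):
        print('crawling {}'.format(url))
        await asyncio.sleep(int(url.split('_')[-1]))
        print('OK {}'.format(url))
    
    async def main(urls):
        # gather wraps each coroutine in a task and waits for all of them
        await asyncio.gather(*(crawl_page(url) for url in urls))
    
    asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))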
    
  • Original post: https://www.cnblogs.com/hzcya1995/p/13348363.html