  • Python 3.7 coroutines

    
    Coroutines:
    
    A coroutine is one way to implement concurrent programming. When concurrency comes up, multithreading / multiprocessing is probably what comes to mind first, and indeed the multithreading / multiprocessing model is one of the classic ways to handle it.
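    
    Before the synchronous baseline below, here is a minimal sketch of what a Python 3.7 coroutine looks like (illustrative only, not one of the files in this post):
    
    import asyncio
    
    async def say_hello():
        # an async def function defines a coroutine; calling it does not run it by itself
        await asyncio.sleep(1)   # non-blocking pause that yields control to the event loop
        print('hello, coroutine')
    
    # asyncio.run, new in Python 3.7, creates an event loop and drives the coroutine to completion
    asyncio.run(say_hello())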
    
    
    node2:/root/python/20200524#cat t1.py 
    import time
    
    def crawl_page(url):
        print('crawling {}'.format(url))
        sleep_time = int(url.split('_')[-1])
        time.sleep(sleep_time)   # blocking sleep: stands in for a download that takes sleep_time seconds
        print('OK {}'.format(url))
    
    def main(urls):
        for url in urls:
            # nowTime is captured once, before the crawl, and printed again afterwards,
            # so both lines show the same (start) timestamp
            nowTime = time.strftime("%Y-%m-%d %H:%M:%S")
            print(nowTime)
            crawl_page(url)
            print(nowTime)
    
    main(['url_1', 'url_2', 'url_3', 'url_4'])
    
    node2:/root/python/20200524#time python t1.py 
    2020-04-16 22:50:45
    crawling url_1
    OK url_1
    2020-04-16 22:50:45
    
    2020-04-16 22:50:46
    crawling url_2
    OK url_2
    2020-04-16 22:50:46
    
    2020-04-16 22:50:48
    crawling url_3
    OK url_3
    2020-04-16 22:50:48
    
    2020-04-16 22:50:51
    crawling url_4
    OK url_4
    2020-04-16 22:50:51
    
    real	0m10.043s
    user	0m0.022s
    sys	0m0.009s
    
    
    So a very natural idea emerges: this kind of crawling can be made concurrent. Let's see how to write it with coroutines.
    
    
    import asyncio
    
    async def crawl_page(url):
        print('crawling {}'.format(url))
        sleep_time = int(url.split('_')[-1])
        await asyncio.sleep(sleep_time)   # non-blocking sleep: control goes back to the event loop
        print('OK {}'.format(url))
    
    async def main(urls):
        for url in urls:
            await crawl_page(url)   # each coroutine is awaited to completion before the next one starts
    
    # %time is an IPython/Jupyter magic; in a plain script, time the run from the shell as below
    %time asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))
    
    ########## Output ##########
    
    crawling url_1
    OK url_1
    crawling url_2
    OK url_2
    crawling url_3
    OK url_3
    crawling url_4
    OK url_4
    Wall time: 10 s
    
    
    Running the same coroutine code as a plain script (t2.py, which judging by its output also prints a timestamp before and after each crawl) gives the same 10-second total:
    
    node2:/root/python/20200524#time python3 t2.py 
    2020-04-16 23:42:52
    crawling url_1
    OK url_1
    2020-04-16 23:42:53
    2020-04-16 23:42:53
    crawling url_2
    OK url_2
    2020-04-16 23:42:55
    2020-04-16 23:42:55
    crawling url_3
    OK url_3
    2020-04-16 23:42:58
    2020-04-16 23:42:58
    crawling url_4
    OK url_4
    2020-04-16 23:43:02
    
    real	0m10.095s
    user	0m0.070s
    sys	0m0.014s
    10 seconds is exactly what we should expect. Remember that await is a synchronous call: crawl_page(url) will not let the next call start until the current one has finished. The result is therefore identical to the version above, and we have effectively written synchronous code through an asynchronous interface.
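    
    To see why this loop stays sequential, it helps to recall what calling an async function actually does: it only builds a coroutine object, and await then drives that object to completion before the next statement runs. A small sketch with a trimmed-down crawl_page:
    
    import asyncio
    
    async def crawl_page(url):
        print('crawling {}'.format(url))
    
    coro = crawl_page('url_1')   # calling the async function only creates a coroutine object
    print(type(coro))            # <class 'coroutine'>; nothing has executed yet
    asyncio.run(coro)            # the body runs only when the event loop drives it
    
    The t3.py listing below avoids this serialization by wrapping each coroutine in a task, which the event loop can start scheduling without waiting for the previous one to finish.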
    
    node2:/root/python/20200524#cat t3.py 
    
    import asyncio
    
    async def crawl_page(url):
        print('crawling {}'.format(url))
        sleep_time = int(url.split('_')[-1])
        await asyncio.sleep(sleep_time)
        print('OK {}'.format(url))
    
    async def main(urls):
        # create_task schedules every coroutine on the event loop right away
        tasks = [asyncio.create_task(crawl_page(url)) for url in urls]
        for task in tasks:
            await task   # wait for the tasks to finish; they have been running concurrently
    
    asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))
    
    node2:/root/python/20200524#time python3 t3.py
    crawling url_1
    crawling url_2
    crawling url_3
    crawling url_4
    OK url_1
    OK url_2
    OK url_3
    OK url_4
    
    real	0m4.090s
    user	0m0.034s
    sys	0m0.052s
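    
    The 4-second wall time matches the longest single sleep (url_4 sleeps 4 seconds) instead of the 1 + 2 + 3 + 4 = 10 seconds of the sequential versions, because all four tasks run on the event loop concurrently. An equivalent way to write main, sketched here with asyncio.gather rather than the explicit task list used in t3.py:
    
    import asyncio
    
    async def crawl_page(url):
        print('crawling {}'.format(url))
        sleep_time = int(url.split('_')[-1])
        await asyncio.sleep(sleep_time)
        print('OK {}'.format(url))
    
    async def main(urls):
        # gather wraps each coroutine in a task and waits for all of them together
        await asyncio.gather(*[crawl_page(url) for url in urls])
    
    asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))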
    
  • Original post: https://www.cnblogs.com/hzcya1995/p/13348363.html