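For web data collection we usually reach for modules such as requests and urllib, but these modules are blocking and do not support asynchronous use. So today we introduce aiohttp, a module that supports asynchronous network requests.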
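To see why a blocking library gains nothing inside a coroutine, here is a minimal sketch that uses only the standard library. It is an illustration under assumptions, not a benchmark: `time.sleep()` stands in for a synchronous call such as `requests.get()`, `asyncio.sleep()` stands in for an awaitable network call, and the names `blocking_job`, `non_blocking_job` and `run_three` are made up for this example.

```python
import asyncio
import time


async def blocking_job(delay):
    # time.sleep() stands in for a synchronous call such as requests.get():
    # it holds the whole event loop until it returns.
    time.sleep(delay)


async def non_blocking_job(delay):
    # asyncio.sleep() stands in for an awaitable I/O call: it hands control
    # back to the event loop while it waits.
    await asyncio.sleep(delay)


async def run_three(job, delay=0.1):
    """Run three copies of `job` together and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(job(delay) for _ in range(3)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print("blocking:    ", asyncio.run(run_three(blocking_job)))
    print("non-blocking:", asyncio.run(run_three(non_blocking_job)))
```

The blocking version should take roughly three times the delay, because each job has to finish before the next one can start, while the non-blocking version should take roughly one delay, because the three waits overlap.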
First, we set up a simple server with Flask:
    from flask import Flask

    app = Flask(__name__)

    @app.route('/xiaohua')
    def xiaohua():
        return 'i am xiaohua'

    @app.route('/xiaohuang')
    def xiaohuang():
        return 'i am xiaohuang'

    @app.route('/xiaoming')
    def xiaoming():
        return 'i am xiaoming'

    if __name__ == '__main__':
        app.run()  # serves on http://127.0.0.1:5000 by default
Next, we request these routes asynchronously with aiohttp:
    import asyncio
    import time

    import aiohttp

    async def get_page(url):  # "async def" makes a call return a coroutine object
        async with aiohttp.ClientSession() as session:  # create a ClientSession
            # session.get() is already an async context manager, so no extra await is needed
            async with session.get(url) as response:
                page_text = await response.text()  # suspend here while the body is read
                print(page_text)

    start = time.time()
    urls = [
        'http://127.0.0.1:5000/xiaohua',
        'http://127.0.0.1:5000/xiaoming',
        'http://127.0.0.1:5000/xiaohuang',
        'http://127.0.0.1:5000/xiaohua',
        'http://127.0.0.1:5000/xiaoming',
        'http://127.0.0.1:5000/xiaohuang',
        'http://127.0.0.1:5000/xiaohua',
        'http://127.0.0.1:5000/xiaoming',
        'http://127.0.0.1:5000/xiaohuang',
    ]

    async def main():
        tasks = []
        for url in urls:
            c = get_page(url)  # calling get_page only creates a coroutine object
            task = asyncio.ensure_future(c)  # wrap it in a Task (a Future) so the loop schedules it
            tasks.append(task)
        await asyncio.wait(tasks)  # suspend main() until every task in the list has finished

    asyncio.run(main())  # create the event loop and run main() to completion
    print('Total time:', time.time() - start)
Output:
i am xiaoming
i am xiaoming
i am xiaohuang
i am xiaoming
i am xiaohua
i am xiaohuang
i am xiaohuang
i am xiaohua
i am xiaohua
Total time: 0.018949031829833984
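The output shows that the requests really did run asynchronously: the responses come back in a different order from the URL list, and all nine requests finish in about 0.019 seconds. Using aiohttp this way greatly improves efficiency.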
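Because the responses arrive in completion order, printing inside the coroutine loses track of which URL produced which text. One possible way around this, sketched here with the standard library only, is to return the text from the coroutine and collect it with `asyncio.gather()`, which returns results in the order the awaitables were passed in. `fake_fetch` and `fetch_all` are hypothetical names; `fake_fetch` sleeps instead of calling aiohttp so the sketch runs without the Flask server, and the delays are chosen only to force an out-of-order finish.

```python
import asyncio


async def fake_fetch(name, delay, finished):
    # Hypothetical stand-in for get_page(): it sleeps instead of doing real
    # I/O, records when it finished, and returns the text instead of printing.
    await asyncio.sleep(delay)
    finished.append(name)
    return f"i am {name}"


async def fetch_all():
    finished = []  # names in the order the coroutines actually completed
    jobs = [
        fake_fetch("xiaohua", 0.15, finished),
        fake_fetch("xiaoming", 0.10, finished),
        fake_fetch("xiaohuang", 0.05, finished),
    ]
    # gather() runs the coroutines concurrently and returns their results
    # in the order they were passed in, not in completion order.
    results = await asyncio.gather(*jobs)
    return results, finished


if __name__ == "__main__":
    results, finished = asyncio.run(fetch_all())
    print(results)   # results follow the order of `jobs`
    print(finished)  # completion order follows the delays
```

In the real script the same idea would mean having `get_page` return `page_text` and awaiting `asyncio.gather(*tasks)` in place of `asyncio.wait(tasks)`.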