  • 1. Asynchronously crawling images with asyncio, aiohttp and aiofiles

    I struggled with this on and off for many days. No preamble: here is the code first, then the analysis:

        import asyncio

        import aiofiles
        import aiohttp

        # Request headers sent with every request made through the session.
        header = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1',
            'Referer': 'https://www.mzitu.com/',
            'Accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
            'Accept-Encoding': 'gzip',
        }

        async def fetch(session, url):
            # Read the body inside the async with block, while the response is still open.
            async with session.get(url, proxy='http://59.62.164.252:9999') as response:
                return await response.read()

        async def main():
            async with aiohttp.ClientSession(headers=header) as session:
                content = await fetch(session, 'https://i.meizitu.net/thumbs/2019/03/174061_01e35_236.jpg')
                print(content)
                async with aiofiles.open('D:/a.jpg', 'wb') as f:
                    # aiofiles' write() is a coroutine, so it has to be awaited.
                    await f.write(content)

        # asyncio.run() creates the event loop, runs main() and closes the loop.
        asyncio.run(main())

    Now for how I got there:

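    1. I read the coroutine chapter of Liao Xuefeng's Python tutorial, the coroutine chapter of "Fluent Python", and whatever material I could find online along the way. No matter how I changed the code it kept raising errors, and I was close to losing my temper.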

    In the end I Googled ClientSession, found the Client Reference in the aiohttp documentation, and copied its sample code to try:

        import aiohttp
        import asyncio

        async def fetch(client):
            async with client.get('http://python.org') as resp:
                assert resp.status == 200
                return await resp.text()

        async def main():
            async with aiohttp.ClientSession() as client:
                html = await fetch(client)
                print(html)

        loop = asyncio.get_event_loop()
        loop.run_until_complete(main())

    It ran successfully.

    2. Next I changed it to download an image, and wondered whether the fetch function could simply return the response itself:

        import aiohttp
        import asyncio
        import aiofiles

        header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1',
                  'Referer': 'https://www.mzitu.com/',
                  'Accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
                  'Accept-Encoding': 'gzip',
                  }

        async def fetch(session, url):
            async with session.get(url) as response:
                return response

        async def main():
            async with aiohttp.ClientSession() as session:
                response = await fetch(session, 'https://i.meizitu.net/thumbs/2019/03/174061_01e35_236.jpg')
                print(response.read())
                with open('D:/a.jpg', 'wb') as f:
                    f.write(response.read())

        loop = asyncio.get_event_loop()
        loop.run_until_complete(main())
        loop.close()

    Running it raises an error right away.

    So it seems the fetch function cannot return the response? I can't make sense of it. I'll leave the question here and come back to it later.
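    A likely explanation, sketched here with the standard library only: leaving an async with block closes the response, so a response object returned from inside the block can no longer be read, and calling read() without await yields a coroutine object rather than bytes. FakeResponse and its error message are hypothetical stand-ins invented for this illustration; they are not aiohttp's classes or its actual error, and the sketch assumes aiohttp's response behaves in a comparable way.

```python
import asyncio


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; NOT aiohttp's class."""

    def __init__(self, body: bytes):
        self._body = body
        self.closed = False

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Leaving the `async with` block closes the response.
        self.closed = True

    async def read(self) -> bytes:
        if self.closed:
            raise RuntimeError("connection closed before the body was read")
        return self._body


async def fetch_response(body: bytes) -> FakeResponse:
    # Mirrors attempt 2: hand back the response object itself.
    async with FakeResponse(body) as response:
        return response  # __aexit__ still runs on the way out


async def fetch_body(body: bytes) -> bytes:
    # Mirrors the working version: read while the block is still open.
    async with FakeResponse(body) as response:
        return await response.read()


async def demo() -> tuple:
    response = await fetch_response(b"jpeg bytes")
    pending = response.read()  # no await: a coroutine object, not bytes
    is_bytes = isinstance(pending, bytes)
    try:
        await pending  # the response was already closed by __aexit__
        late_read_error = None
    except RuntimeError as exc:
        late_read_error = str(exc)
    body = await fetch_body(b"jpeg bytes")
    return is_bytes, late_read_error, body


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

    Under that assumption, attempt 2 has two separate problems: response.read() is never awaited, and even with await the body is requested only after fetch has left the async with block. Returning await response.read() from inside the block, as the code at the top does, avoids both.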

    3. According to the ClientSession documentation mentioned above, the request headers (header) should be passed in when ClientSession is instantiated.

    4. The documentation says "aiohttp supports HTTP/HTTPS proxies".

    In practice, however, it does not support HTTPS proxies at all; the proxy URL has to use the http:// scheme.

    For reference, see the article "Python3 异步代理爬虫池" (a Python 3 asynchronous proxy crawler pool).

     

    My head hurts, so that is all for now.

    In my last attempt the proxy IP itself seemed to be the problem again. Ugh.
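    Since free proxy IPs go stale quickly, one way to cope is to try several proxies in turn and keep the first one that answers. The following is a minimal sketch of that idea using only asyncio; fetch_with_fallback, fetch_via, retry_on and the timeout value are hypothetical names and choices made for this illustration, not part of aiohttp.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, Optional, Tuple, Type

# Any coroutine function that downloads `url` through `proxy` and returns bytes.
FetchVia = Callable[[str, str], Awaitable[bytes]]


async def fetch_with_fallback(
    fetch_via: FetchVia,
    url: str,
    proxies: Iterable[str],
    timeout: float = 10.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError, asyncio.TimeoutError),
) -> bytes:
    """Try each proxy in order and return the first successful body."""
    last_error: Optional[BaseException] = None
    for proxy in proxies:
        try:
            # Give up on a proxy that does not answer within `timeout` seconds.
            return await asyncio.wait_for(fetch_via(url, proxy), timeout)
        except retry_on as exc:
            last_error = exc  # remember the failure and move on to the next proxy
    raise RuntimeError(f"all proxies failed for {url}") from last_error
```

    With aiohttp, fetch_via could be a small coroutine that wraps session.get(url, proxy=proxy) and returns await response.read(), and retry_on would probably need to include aiohttp.ClientError as well, because not every aiohttp error derives from OSError.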

  • Original article: https://www.cnblogs.com/zwb8848happy/p/10473313.html