  • Scrapy middleware (downloader middleware)

    # Downloader middleware
    - process_request: what happens next depends on the object it returns (useful for adding a proxy, etc.)
    
    
    class CnblogsDownloaderMiddleware:
        @classmethod
        def from_crawler(cls, crawler):
            # Scrapy calls this to build the middleware; it must return an instance
            return cls()

        def process_request(self, request, spider):
            # Called for each request that goes through the downloader
            # middleware.
    
            # Must either:
            # - return None: continue processing this request
            # - or return a Response object
            # - or return a Request object
            # - or raise IgnoreRequest: process_exception() methods of
            #   installed downloader middleware will be called
    
            # 1. Change the request headers
            # from scrapy.http.headers import Headers
            # Option 1: set a fixed User-Agent
            # request.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'
    
            # Option 2: use fake-useragent
            # pip3 install fake-useragent
            # from fake_useragent import UserAgent
            # request.headers['User-Agent'] = UserAgent().random
            # print(request.headers)
    
            # 2. Add cookies (from a cookie pool)
            # Assume you have already built a cookie pool
            # print('00000--', request.cookies)
            # request.cookies = {'username': 'asdfasdf'}
    
            # 3. Add a proxy
            # print(request.meta)
            # request.meta['download_timeout'] = 20
            # request.meta["proxy"] = 'http://218.22.7.62:53281'
    
            # Returning None lets Scrapy continue processing this request
            return None
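The commented-out steps above can be combined into one working middleware. The following is a minimal sketch, not the original author's code: the class name `RotatingDownloaderMiddleware`, the `USER_AGENTS` and `PROXIES` lists, and the `FakeRequest` stand-in are all hypothetical names introduced for illustration. `FakeRequest` exposes only `headers` and `meta`, so the snippet runs without Scrapy installed; in a real project Scrapy passes its own `Request` object.

```python
import random

# Hypothetical pools for this sketch; a real project would load them
# from settings or from an external proxy / User-Agent service.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36",
]
PROXIES = ["http://218.22.7.62:53281"]  # address taken from the example above


class RotatingDownloaderMiddleware:
    """Sets a random User-Agent, a proxy and a timeout on each request."""

    def __init__(self, user_agents, proxies, timeout=20):
        self.user_agents = list(user_agents)
        self.proxies = list(proxies)
        self.timeout = timeout

    @classmethod
    def from_crawler(cls, crawler):
        # from_crawler must return a middleware instance.
        return cls(USER_AGENTS, PROXIES)

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(self.user_agents)
        request.meta["proxy"] = random.choice(self.proxies)
        request.meta["download_timeout"] = self.timeout
        return None  # None: Scrapy continues processing this request


class FakeRequest:
    """Stand-in exposing only the attributes the middleware touches."""

    def __init__(self):
        self.headers = {}
        self.meta = {}


if __name__ == "__main__":
    mw = RotatingDownloaderMiddleware.from_crawler(crawler=None)
    req = FakeRequest()
    mw.process_request(req, spider=None)
    print(req.headers["User-Agent"])
    print(req.meta)
```

For Scrapy to call a middleware like this, the class also has to be registered under `DOWNLOADER_MIDDLEWARES` in the project's `settings.py`.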
  • Original article: https://www.cnblogs.com/baicai37/p/13449394.html