  • Scrapy middleware (downloader middleware)

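    # Downloader middleware
    - process_request: what it returns determines how the request is handled next (this is also the place to add a proxy, etc.)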
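The rule above (a different return value leads to different follow-up handling) can be sketched with a small simulation. This is a simplified model for illustration, not Scrapy's actual implementation: `Request`, `Response` and `IgnoreRequest` are stand-ins for the real Scrapy classes, and `run_process_request` and the four example middlewares are hypothetical names.

```python
class Request:
    """Stand-in for scrapy.Request (assumption: only a URL is needed here)."""
    def __init__(self, url):
        self.url = url


class Response:
    """Stand-in for scrapy.http.Response."""
    def __init__(self, url, body=b""):
        self.url = url
        self.body = body


class IgnoreRequest(Exception):
    """Stand-in for scrapy.exceptions.IgnoreRequest."""


def run_process_request(middlewares, request, spider=None):
    """Simplified model of how process_request return values are dispatched."""
    for mw in middlewares:
        try:
            result = mw.process_request(request, spider)
        except IgnoreRequest:
            # The request is dropped; Scrapy would call process_exception() here.
            return ("ignored", None)
        if result is None:
            continue  # hand the request to the next middleware
        if isinstance(result, Response):
            return ("response", result)  # skip the download, use this response
        if isinstance(result, Request):
            return ("reschedule", result)  # stop and schedule the new request
        raise TypeError("process_request must return None, a Response or a Request")
    return ("download", request)  # every middleware returned None


class PassThrough:
    def process_request(self, request, spider):
        return None


class ServeFromCache:
    def process_request(self, request, spider):
        return Response(request.url, body=b"cached page")


class ForceHttps:
    def process_request(self, request, spider):
        if request.url.startswith("http://"):
            return Request("https://" + request.url[len("http://"):])
        return None


class BlockEverything:
    def process_request(self, request, spider):
        raise IgnoreRequest()
```

For example, `run_process_request([PassThrough(), ServeFromCache()], Request("http://example.com/"))` stops at the second middleware and yields its `Response`, so the page would never be downloaded.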
    
    
    class CnblogsDownloaderMiddleware:
        @classmethod
        def from_crawler(cls, crawler):
            # Scrapy calls this to create the middleware, so it must return an instance.
            return cls()

        def process_request(self, request, spider):
            # Called for each request that goes through the downloader
            # middleware.
    
            # Must either:
            # - return None: continue processing this request
            # - or return a Response object
            # - or return a Request object
            # - or raise IgnoreRequest: process_exception() methods of
            #   installed downloader middleware will be called
    
            # 1. Replace the request headers; request.headers is a Headers object
            #    (from scrapy.http.headers import Headers)
            # Option 1: set a fixed User-Agent
            # request.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'
    
            # Option 2: use fake-useragent to pick a random User-Agent
            # pip3 install fake-useragent
            # from fake_useragent import UserAgent
            # request.headers['User-Agent'] = UserAgent().random
            # print(request.headers)
    
            # 2. Add cookies (taken from a cookie pool)
            # Assuming you have already built a cookie pool:
            # print('00000--', request.cookies)
            # request.cookies = {'username': 'asdfasdf'}
    
            # 3. Add a proxy
            # print(request.meta)
            # request.meta['download_timeout'] = 20
            # request.meta["proxy"] = 'http://218.22.7.62:53281'
    
            # Return None so the request continues through the remaining middleware.
            return None
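Putting the three commented-out steps together, a minimal sketch of the middleware might look like the following. It makes several assumptions: `USER_AGENTS` and `PROXIES` are hypothetical in-memory pools (the post does not show how its cookie or proxy pool is built), the second User-Agent string is illustrative, and the request at the bottom is a `SimpleNamespace` stand-in so the snippet runs without Scrapy installed. In a real project Scrapy passes a `scrapy.Request` instead.

```python
import random
from types import SimpleNamespace

# Hypothetical pools; a real project would load these from its own pool service.
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
]
PROXIES = ['http://218.22.7.62:53281']  # proxy address taken from the post


class CnblogsDownloaderMiddleware:
    @classmethod
    def from_crawler(cls, crawler):
        # Must return a middleware instance.
        return cls()

    def process_request(self, request, spider):
        # 1. Replace the User-Agent header with a randomly chosen one.
        request.headers['User-Agent'] = random.choice(USER_AGENTS)
        # 2. Attach cookies (hard-coded here; normally drawn from a cookie pool).
        request.cookies = {'username': 'asdfasdf'}
        # 3. Set a download timeout and route the request through a proxy.
        request.meta['download_timeout'] = 20
        request.meta['proxy'] = random.choice(PROXIES)
        # None means: keep processing this request as usual.
        return None


# Stand-in request with the three attributes the middleware touches.
fake_request = SimpleNamespace(headers={}, cookies={}, meta={})
middleware = CnblogsDownloaderMiddleware.from_crawler(crawler=None)
result = middleware.process_request(fake_request, spider=None)
```

In a Scrapy project the class only takes effect once it is listed in the `DOWNLOADER_MIDDLEWARES` setting.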
  • Original post: https://www.cnblogs.com/baicai37/p/13449394.html