  • python-scrapy: learning middleware

    middlewares.py


    from scrapy import signals


    class MiddlewareDownloaderMiddleware:

        @classmethod
        def from_crawler(cls, crawler):
            # This method is used by Scrapy to create your spiders.
            s = cls()
            crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
            return s

        def process_request(self, request, spider):
            # spider is the running spider instance (e.g. spider.name)
            # intercepts every outgoing request, normal or not
            # request is the request object being processed
            # read or modify the request headers here, e.g.:
            # request.headers['Cookie'] = 'xxx'
            print('i am process_request')
            return None

        def process_response(self, request, response, spider):
            # intercepts every response object
            # response is the downloaded response
            print('i am process_response')
            return response

        def process_exception(self, request, exception, spider):
            # intercepts requests that raised an exception
            # fix the failing request, then return it so it is rescheduled
            print('i am process_exception')
            # proxy switch, e.g.:
            # request.meta['proxy'] = 'https://ip:port'
            return request

        def spider_opened(self, spider):
            spider.logger.info('Spider opened: %s' % spider.name)
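
    The comments above only hint at how these hooks are used in practice. The sketch below is one way to flesh them out for User-Agent rotation and proxy fallback; the class name, UA_POOL, and PROXY_POOL are illustrative placeholders, not part of the original post.

    import random


    UA_POOL = [
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)',
    ]
    PROXY_POOL = ['https://127.0.0.1:8888']  # placeholder proxy address


    class RandomUAProxyDownloaderMiddleware:

        def process_request(self, request, spider):
            # pick a random User-Agent for every outgoing request
            request.headers['User-Agent'] = random.choice(UA_POOL)
            return None

        def process_response(self, request, response, spider):
            return response

        def process_exception(self, request, exception, spider):
            # on failure, retry the same request through a proxy
            request.meta['proxy'] = random.choice(PROXY_POOL)
            return request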

    Enable the middleware in settings.py:
    DOWNLOADER_MIDDLEWARES = {
    'middleware.middlewares.MiddlewareDownloaderMiddleware': 543,
    }
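
    The number (543 here) is the middleware's ordering value: Scrapy calls process_request in ascending order of these values and process_response in descending order, so lower numbers sit closer to the engine. As an optional extra not covered in the original post, a built-in downloader middleware can be disabled by mapping it to None, for example when a custom middleware already manages the User-Agent header:

    DOWNLOADER_MIDDLEWARES = {
        'middleware.middlewares.MiddlewareDownloaderMiddleware': 543,
        # disable Scrapy's default User-Agent handling (optional)
        'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    }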
  • Original article: https://www.cnblogs.com/shiyi525/p/14274418.html