  • Custom Scrapy extensions

    1. Create an extension file and define a class in it. The class must provide a from_crawler class method:

    from scrapy import signals
    
    
    class MyExtend:
    
        def __init__(self, crawler):
            self.crawler = crawler
            # Attach a handler to the hook (signal)
            crawler.signals.connect(self.start, signals.engine_started)
    
        @classmethod
        def from_crawler(cls, crawler):
            return cls(crawler)
    
        def start(self):
            # Custom logic goes here
            print('signals.engine_started')

    2. Register the extension in settings

    EXTENSIONS = {
        'day96.extensions.MyExtend': 300,
    }

    3. Signals you can hook into

    # Sent when the engine starts running
    engine_started = object()
    # Sent when the engine stops
    engine_stopped = object()
    
    spider_opened = object()
    spider_idle = object()
    spider_closed = object()
    spider_error = object()
    request_scheduled = object()
    request_dropped = object()
    response_received = object()
    response_downloaded = object()
    
    # Sent when an item is yielded and has passed through the pipelines
    item_scraped = object()
    # Sent when an item is dropped
    item_dropped = object()
  • Original article: https://www.cnblogs.com/trunkslisa/p/9814764.html