  • How can Scrapy run multiple spiders at the same time?

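      By default, when you run the scrapy crawl command, Scrapy runs a single spider per process. Besides the command line, however, Scrapy can also be driven from a Python script through its API, and when run this way several spiders can share one process.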

      The following example runs several spiders at the same time:

    import scrapy
    from scrapy.crawler import CrawlerProcess
    
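    class MySpider1(scrapy.Spider):
        # Your first spider definition
        name = "spider1"  # every spider needs a name
        ...
    
    class MySpider2(scrapy.Spider):
        # Your second spider definition
        name = "spider2"
        ...
    
    process = CrawlerProcess()  # create the process that manages the Twisted event loop
    process.crawl(MySpider1)  # schedule the first spider
    process.crawl(MySpider2)  # schedule the second spider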
    process.start() # the script will block here until all crawling jobs are finished
    

      The same can be done with CrawlerRunner, in which case you start and stop the Twisted reactor yourself:

    import scrapy
    from twisted.internet import reactor
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.log import configure_logging
    
    class MySpider1(scrapy.Spider):
        # Your first spider definition
        name = "spider1"  # every spider needs a name
        ...
    
    class MySpider2(scrapy.Spider):
        # Your second spider definition
        name = "spider2"
        ...
    
    configure_logging()
    runner = CrawlerRunner()
    runner.crawl(MySpider1)
    runner.crawl(MySpider2)
    d = runner.join()
    d.addBoth(lambda _: reactor.stop())
    
    reactor.run() # the script will block here until all crawling jobs are finished
    

      Chaining the deferreds instead runs the spiders one after another rather than at the same time:

    import scrapy
    from twisted.internet import reactor, defer
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.log import configure_logging
    
    class MySpider1(scrapy.Spider):
        # Your first spider definition
        name = "spider1"  # every spider needs a name
        ...
    
    class MySpider2(scrapy.Spider):
        # Your second spider definition
        name = "spider2"
        ...
    
    configure_logging()
    runner = CrawlerRunner()
    
    @defer.inlineCallbacks
    def crawl():
        yield runner.crawl(MySpider1)
        yield runner.crawl(MySpider2)
        reactor.stop()
    
    crawl()
    reactor.run() # the script will block here until the last crawl call is finished
    

      For more details, see:

           the Scrapy documentation

  • Original article: https://www.cnblogs.com/renshaoqi/p/11177166.html