  • How do you run multiple spiders at the same time in Scrapy?

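      By default, when you run the scrapy crawl command, Scrapy runs a single spider per process. Besides the command line, however, Scrapy can also be driven from a script through its API, and spiders started this way can run several at a time within the same process.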

      The following example runs multiple spiders simultaneously:

    import scrapy
    from scrapy.crawler import CrawlerProcess
    
    class MySpider1(scrapy.Spider):
        # Your first spider definition
        ...
    
    class MySpider2(scrapy.Spider):
        # Your second spider definition
        ...
    
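    process = CrawlerProcess()  # create the process; it manages the Twisted reactor (event loop)
    process.crawl(MySpider1)  # schedule MySpider1 on the event loop
    process.crawl(MySpider2)  # schedule MySpider2 on the event loop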
    process.start() # the script will block here until all crawling jobs are finished
    

      The same can be done with CrawlerRunner, in which case the script starts and stops the Twisted reactor itself:

    import scrapy
    from twisted.internet import reactor
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.log import configure_logging
    
    class MySpider1(scrapy.Spider):
        # Your first spider definition
        ...
    
    class MySpider2(scrapy.Spider):
        # Your second spider definition
        ...
    
    configure_logging()
    runner = CrawlerRunner()
    runner.crawl(MySpider1)
    runner.crawl(MySpider2)
    d = runner.join()
    d.addBoth(lambda _: reactor.stop())
    
    reactor.run() # the script will block here until all crawling jobs are finished
    

      The spiders can also be run one after another by chaining deferreds:

    import scrapy
    from twisted.internet import reactor, defer
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.log import configure_logging
    
    class MySpider1(scrapy.Spider):
        # Your first spider definition
        ...
    
    class MySpider2(scrapy.Spider):
        # Your second spider definition
        ...
    
    configure_logging()
    runner = CrawlerRunner()
    
    @defer.inlineCallbacks
    def crawl():
        yield runner.crawl(MySpider1)
        yield runner.crawl(MySpider2)
        reactor.stop()
    
    crawl()
    reactor.run() # the script will block here until the last crawl call is finished
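
      To make the difference between the two CrawlerRunner examples easier to see, here is a minimal sketch that uses only the standard library's asyncio instead of Scrapy and Twisted. It is an analogy, not Scrapy's API: fake_crawl, run_concurrently, run_sequentially and the spider names are hypothetical stand-ins. Gathering both coroutines roughly mirrors scheduling both crawls and waiting on runner.join(), while awaiting them in turn roughly mirrors yielding each runner.crawl() inside an inlineCallbacks function.

```python
import asyncio


async def fake_crawl(name, delay, log):
    """Hypothetical stand-in for one spider run: it only sleeps and logs."""
    log.append(f"{name} started")
    await asyncio.sleep(delay)
    log.append(f"{name} finished")


async def run_concurrently():
    # Analogous to scheduling both crawls and then waiting on runner.join():
    # both "spiders" start before either of them finishes.
    log = []
    await asyncio.gather(
        fake_crawl("spider1", 0.1, log),
        fake_crawl("spider2", 0.05, log),
    )
    return log


async def run_sequentially():
    # Analogous to yielding each runner.crawl() in turn:
    # the second "spider" starts only after the first has finished.
    log = []
    await fake_crawl("spider1", 0.1, log)
    await fake_crawl("spider2", 0.05, log)
    return log


concurrent_log = asyncio.run(run_concurrently())
sequential_log = asyncio.run(run_sequentially())
print("concurrent:", concurrent_log)
print("sequential:", sequential_log)
```

      In the concurrent log both "started" entries come before any "finished" entry, whereas in the sequential log each spider finishes before the next one starts. Sequential runs are the safer choice when one crawl depends on the output of another.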
    

      For more details, see:

           The Scrapy documentation

  • Original article: https://www.cnblogs.com/renshaoqi/p/11177166.html