  • Rendering JavaScript in Scrapy with Splash: environment setup

    Environment setup:

    http://splash.readthedocs.io/en/stable/install.html

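    pip install scrapy-splash                      # Scrapy plugin (SplashRequest + middlewares)

    service docker start                           # Splash itself runs in a Docker container
    docker pull scrapinghub/splash                 # download the Splash image
    docker run -p 8050:8050 scrapinghub/splash     # start Splash, HTTP API on port 8050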
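Once the container is running, Splash listens on port 8050 and exposes an HTTP API, so it can be checked independently of Scrapy. The sketch below uses only the standard library to build a request to Splash's `render.html` endpoint; `url` and `wait` are documented `render.html` arguments, while the helper names (`render_html_url`, `fetch_rendered`) and the `localhost:8050` address are assumptions chosen to match the `docker run` command above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed address: matches `docker run -p 8050:8050 scrapinghub/splash`
SPLASH_URL = "http://localhost:8050"


def render_html_url(target: str, wait: float = 0.5, splash_url: str = SPLASH_URL) -> str:
    """Build a GET URL for Splash's render.html endpoint (hypothetical helper)."""
    query = urlencode({"url": target, "wait": wait})
    return f"{splash_url}/render.html?{query}"


def fetch_rendered(target: str, wait: float = 0.5) -> str:
    """Return the browser-rendered HTML of `target`; needs a running Splash container."""
    with urlopen(render_html_url(target, wait), timeout=30) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(render_html_url("http://example.com"))
    # print(fetch_rendered("http://example.com")[:200])  # uncomment once Splash is up
```

Opening the printed URL in a browser (or calling `fetch_rendered`) should return the page HTML after JavaScript has run; if it fails, the container is not reachable and Scrapy will not be able to use it either.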
    ----

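    Add the following to settings.py:

    # Address of the Splash HTTP API (the container started above)
    SPLASH_URL = 'http://localhost:8050'

    # Enable the Splash downloader middlewares; HttpCompressionMiddleware
    # is given a new priority (810) as required by scrapy-splash
    DOWNLOADER_MIDDLEWARES = {
        'scrapy_splash.SplashCookiesMiddleware': 723,
        'scrapy_splash.SplashMiddleware': 725,
        'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    }

    # Avoid storing duplicate Splash arguments in the request queue
    SPIDER_MIDDLEWARES = {
        'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
    }

    # Duplicate filter that takes Splash arguments into account
    DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'

    # HTTP cache storage that is aware of Splash requests
    HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'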
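The numbers in `DOWNLOADER_MIDDLEWARES` are priorities: Scrapy calls each middleware's `process_request` in increasing order and `process_response` in decreasing order. The plain-Python sketch below is not Scrapy code; it only sorts the three entries from `settings.py` to show the resulting order. Scrapy also merges this dict with its built-in `DOWNLOADER_MIDDLEWARES_BASE` (where `HttpCompressionMiddleware` sits at 590), so the real chain has more entries than shown.

```python
# The three entries configured in settings.py
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}


def request_order(middlewares: dict) -> list:
    """Order in which process_request runs: lowest priority number first."""
    return [path for path, _ in sorted(middlewares.items(), key=lambda kv: kv[1])]


def response_order(middlewares: dict) -> list:
    """process_response runs in the opposite order: highest number first."""
    return list(reversed(request_order(middlewares)))


if __name__ == "__main__":
    for path in request_order(DOWNLOADER_MIDDLEWARES):
        print(path.rsplit(".", 1)[-1])
```

With these values, a response reaches `HttpCompressionMiddleware` (810) before `SplashMiddleware` (725), so the body is decompressed before the Splash middleware processes it; at the default 590 the order would be reversed.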
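
    A spider that sends its requests through Splash:

    import scrapy
    from scrapy_splash import SplashRequest


    class MySpider(scrapy.Spider):
        name = "myspider"  # placeholder name; every Scrapy spider needs one
        start_urls = ["http://example.com", "http://example.com/foo"]

        def start_requests(self):
            for url in self.start_urls:
                # Render each page through Splash, waiting 0.5 s for JavaScript
                yield SplashRequest(url, self.parse, args={'wait': 0.5})

        def parse(self, response):
            # response.body is the result of the render.html call; it
            # contains HTML processed by a browser.
            self.logger.info("Rendered %s (%d bytes)", response.url, len(response.body))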
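`SplashRequest` does not fetch the page itself: `SplashMiddleware` rewrites it into a request to the Splash server, adding the target `url` to the `args` and sending them as a JSON POST body to the endpoint (`render.html` by default). The standard-library sketch below builds a comparable request by hand, which can help when checking what Splash receives. It is an approximation, not scrapy-splash's actual code, and `build_splash_request` / `fetch` are hypothetical helpers.

```python
import json
from urllib.request import Request, urlopen

SPLASH_URL = "http://localhost:8050"  # same value as SPLASH_URL in settings.py


def build_splash_request(target: str, args: dict, endpoint: str = "render.html") -> Request:
    """Build a POST to Splash similar in spirit to what SplashRequest produces.

    Hypothetical helper: the target URL is merged into the arguments and the
    whole dict is sent as a JSON body.
    """
    payload = {"url": target, **args}
    return Request(
        f"{SPLASH_URL}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch(req: Request) -> str:
    """Send the request; requires a running Splash container."""
    with urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    req = build_splash_request("http://example.com", {"wait": 0.5})
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending the same JSON with `fetch` should return roughly the HTML that arrives in `parse` as `response.body`.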
    References:
      https://germey.gitbooks.io/python3webspider/content/7.2-Splash%E7%9A%84%E4%BD%BF%E7%94%A8.html
      http://blog.csdn.net/qq_23849183/article/details/51287935
      http://ae.yyuap.com/pages/viewpage.action?pageId=919763

      

  • Original article: https://www.cnblogs.com/fh-fendou/p/7612119.html