  • Using Splash with the Scrapy framework to render JavaScript: environment setup

    Environment setup:

    http://splash.readthedocs.io/en/stable/install.html

    pip install scrapy-splash
     

    service docker start

    docker pull scrapinghub/splash
    docker run -p 8050:8050 scrapinghub/splash
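
Once the container is running, a quick way to confirm that Splash answers on port 8050 is to ask its render.html endpoint for a page. The following is a minimal sketch using only the standard library; the helper names `build_render_url` and `fetch_rendered_html` are made up for illustration, and the address assumes the `-p 8050:8050` mapping from the command above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed address: the port published by `docker run -p 8050:8050`.
SPLASH_URL = "http://localhost:8050"


def build_render_url(target_url, wait=0.5, splash_url=SPLASH_URL):
    """Build the render.html URL that asks Splash to render target_url.

    `wait` is the number of seconds Splash waits after the page loads,
    giving JavaScript time to run before the HTML is returned.
    """
    query = urlencode({"url": target_url, "wait": wait})
    return f"{splash_url}/render.html?{query}"


def fetch_rendered_html(target_url, wait=0.5, timeout=30):
    """Fetch the browser-rendered HTML for target_url through Splash.

    Requires the Splash container to be running and reachable.
    """
    with urlopen(build_render_url(target_url, wait), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

With the container up, calling `fetch_rendered_html("http://example.com")` should return the HTML of the page after Splash has rendered it; a connection error usually means Docker or the port mapping is not set up as expected.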
    ----

    settings.py

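    # Address of the Splash instance started with docker above
    SPLASH_URL = 'http://localhost:8050'

    # Enable the Splash downloader middlewares and move
    # HttpCompressionMiddleware to priority 810
    DOWNLOADER_MIDDLEWARES = {
        'scrapy_splash.SplashCookiesMiddleware': 723,
        'scrapy_splash.SplashMiddleware': 725,
        'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    }

    # Deduplicates repeated Splash arguments (needed for the cache_args feature)
    SPIDER_MIDDLEWARES = {
        'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
    }

    # Duplicate filter that takes Splash arguments into account
    DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'

    # Only needed when Scrapy's HTTP cache is used
    HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'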
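
The priority numbers in DOWNLOADER_MIDDLEWARES matter: the scrapy-splash README notes that 723 is chosen to sit just before HttpProxyMiddleware, which has priority 750 in Scrapy's default settings. As a rough sanity check for a customised settings file, the ordering can be verified with a small helper. This is only a sketch; `splash_order_ok` is a hypothetical function, not part of Scrapy or scrapy-splash.

```python
# Priority of HttpProxyMiddleware in Scrapy's default settings.
HTTPPROXY_PRIORITY = 750

# Same values as in settings.py above.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}


def splash_order_ok(middlewares, proxy_priority=HTTPPROXY_PRIORITY):
    """Return True if both Splash middlewares are enabled, in the same
    relative order as above, and placed before HttpProxyMiddleware."""
    cookies = middlewares.get("scrapy_splash.SplashCookiesMiddleware")
    splash = middlewares.get("scrapy_splash.SplashMiddleware")
    if cookies is None or splash is None:
        return False  # one of the middlewares is missing or disabled
    return cookies < splash < proxy_priority
```

Calling `splash_order_ok(DOWNLOADER_MIDDLEWARES)` on the values above returns True; it returns False if either entry is missing or has been moved past 750.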
    ----

    Spider example:

    import scrapy
    from scrapy_splash import SplashRequest

    class MySpider(scrapy.Spider):
        name = "myspider"  # placeholder; Scrapy requires every spider to have a name
        start_urls = ["http://example.com", "http://example.com/foo"]

        def start_requests(self):
            # Send each start URL through Splash and wait 0.5 s for scripts to run
            for url in self.start_urls:
                yield SplashRequest(url, self.parse, args={'wait': 0.5})

        def parse(self, response):
            # response.body is the result of a render.html call; it
            # contains HTML processed by a browser.
            # ...
            pass

    References:
    https://germey.gitbooks.io/python3webspider/content/7.2-Splash%E7%9A%84%E4%BD%BF%E7%94%A8.html
    http://blog.csdn.net/qq_23849183/article/details/51287935
    http://ae.yyuap.com/pages/viewpage.action?pageId=919763

      

  • Original article: https://www.cnblogs.com/fh-fendou/p/7612119.html