  • A quick start to using Splash with Scrapy

    Install Splash (pull the Docker image):
    docker pull scrapinghub/splash
    Install scrapy-splash:
    pip install scrapy-splash
    Start the container:
    docker run -p 8050:8050 scrapinghub/splash
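Before wiring Splash into Scrapy, it can help to confirm that the container is reachable. The following is a minimal sketch using only the standard library. It assumes Splash's `/_ping` health-check endpoint; `ping_url` and `splash_is_up` are hypothetical helper names, not part of scrapy-splash.

```python
import json
import urllib.request


def ping_url(splash_url):
    """Build the health-check URL for a Splash instance."""
    return splash_url.rstrip("/") + "/_ping"


def splash_is_up(splash_url, timeout=3.0):
    """Return True if Splash answers the ping with status 'ok'."""
    try:
        with urllib.request.urlopen(ping_url(splash_url), timeout=timeout) as resp:
            data = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers bad JSON
        return False
    return isinstance(data, dict) and data.get("status") == "ok"
```

Calling `splash_is_up("http://localhost:8050")` should return `True` once the container is running; replace `localhost` with the address of your Docker host if it differs.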
    Configure the following in settings.py:
    SPLASH_URL = 'http://192.168.99.100:8050'  # Important: if this address is wrong, the target machine actively refuses the connection
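Because a wrong address fails with a connection-refused error, one option is to read the address from the environment rather than hard-coding it. This is only a sketch for `settings.py`: the environment variable name `SPLASH_URL` is an assumption, and the fallback is simply the address used in this post.

```python
import os

# Fallback is the address used in this post; override it per machine by
# exporting SPLASH_URL (the variable name is an assumption, pick your own).
SPLASH_URL = os.environ.get("SPLASH_URL", "http://192.168.99.100:8050")
```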
    Add the Splash middlewares and set their priorities:
    DOWNLOADER_MIDDLEWARES = {
        'scrapy_splash.SplashCookiesMiddleware': 723,
        'scrapy_splash.SplashMiddleware': 725,
        'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    }
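The numbers are priorities: Scrapy merges this dict with its built-in defaults and calls each middleware's `process_request` in ascending order. The sketch below shows the resulting order in plain Python. The `HttpProxyMiddleware` entry at 750 is Scrapy's default priority, included here only for comparison; `DEFAULTS`, `merged` and `order` are illustrative names.

```python
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Scrapy's built-in default priority for the proxy middleware (for comparison only).
DEFAULTS = {"scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 750}

merged = {**DEFAULTS, **DOWNLOADER_MIDDLEWARES}

# Lower numbers sit closer to the engine, so their process_request runs first.
order = [
    path.rsplit(".", 1)[-1]
    for path, _ in sorted(merged.items(), key=lambda item: item[1])
]
print(order)
```

Both Splash middlewares end up ahead of the proxy middleware, and HTTP compression handling moves after it.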
    Use Splash's own duplicate filter:
    DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
    Set the storage backend for the HTTP cache:
    HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'  # the two settings above are required
    Example:
    import scrapy
    from scrapy_splash import SplashRequest


    class JsSpider(scrapy.Spider):
        name = "jd"
        allowed_domains = ["jd.com"]
        start_urls = [
            "http://www.jd.com/"
        ]

        def start_requests(self):
            # Send each start URL through Splash and wait 0.5 s so scripts can run
            for url in self.start_urls:
                yield SplashRequest(url, self.parse, args={'wait': 0.5})

        def parse(self, response):
            print('----------Scraping asynchronously loaded content from the JD.com home page with Splash-----------')
            # extract_first() returns None instead of raising IndexError when nothing matches
            rs = response.xpath('//span[@class="ui-areamini-text"]/text()').extract_first()
            print(rs)
            print('---------------success----------------')
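By default `SplashRequest` targets Splash's `render.html` endpoint, and the entries in `args` are passed to it as rendering parameters. To see what `wait` does outside Scrapy, here is a rough standard-library sketch that builds a comparable GET request against the Splash HTTP API. `render_html_url` and `fetch_rendered` are hypothetical helpers, and the Splash address you pass in is an assumption about your setup.

```python
import urllib.parse
import urllib.request


def render_html_url(splash_url, target_url, wait=0.5):
    """Build a render.html URL asking Splash to load target_url, then wait."""
    query = urllib.parse.urlencode({"url": target_url, "wait": wait})
    return f"{splash_url.rstrip('/')}/render.html?{query}"


def fetch_rendered(splash_url, target_url, wait=0.5, timeout=30.0):
    """Return the JavaScript-rendered HTML as text (needs a running Splash)."""
    request_url = render_html_url(splash_url, target_url, wait)
    with urllib.request.urlopen(request_url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

For instance, `fetch_rendered("http://localhost:8050", "http://www.jd.com/")` would return the page after Splash has waited 0.5 seconds, which is roughly what the spider's `parse` callback receives as `response.text`.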
    Official documentation: https://pypi.python.org/pypi/scrapy-splash

  • Original article: https://www.cnblogs.com/qieyu/p/8024822.html