  • A simple guide to using Splash with Scrapy

    Install Splash (pull the Docker image):
    docker pull scrapinghub/splash
    Install scrapy-splash:
    pip install scrapy-splash
    Start the container:
    docker run -p 8050:8050 scrapinghub/splash
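    Configure settings.py:
    # Very important: if this address is wrong, requests fail with "the target machine actively refused it".
    # Point it at the host running the Splash container. 192.168.99.100 is the usual
    # Docker Toolbox (docker-machine) VM address; with native Docker use 'http://localhost:8050'.
    SPLASH_URL = 'http://192.168.99.100:8050'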
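The "actively refused" error means nothing is listening at the host and port in SPLASH_URL. Before starting a crawl you can run a quick check. The sketch below uses only the standard library and assumes SPLASH_URL has the form http://host:port. The helper name `splash_reachable` is hypothetical, and the helper only confirms that something accepts TCP connections there, not that the service is actually Splash.

```python
import socket
from typing import Optional
from urllib.parse import urlsplit


def splash_reachable(splash_url: str, timeout: Optional[float] = 3.0) -> bool:
    """Return True if a TCP connection to the host:port in splash_url succeeds.

    Hypothetical helper: it does not verify that the listener is Splash.
    """
    parts = urlsplit(splash_url)
    host = parts.hostname or "localhost"
    port = parts.port or 8050  # assume Splash's default port when none is given
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unresolvable host names.
        return False
```

For example, `splash_reachable('http://localhost:8050')` returning False while the container is running usually means SPLASH_URL should use the Docker VM's IP instead of localhost.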
    Add the Splash downloader middlewares with these priorities:
    DOWNLOADER_MIDDLEWARES = {
        'scrapy_splash.SplashCookiesMiddleware': 723,
        'scrapy_splash.SplashMiddleware': 725,
        'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    }
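The priority numbers decide when each middleware runs. Scrapy calls `process_request()` in ascending order of these numbers and `process_response()` in descending order. Moving HttpCompressionMiddleware from its default slot (590) to 810 therefore lets it handle responses before SplashMiddleware does. The sketch below illustrates only that ordering rule. It assumes just the three entries above and ignores the built-in DOWNLOADER_MIDDLEWARES_BASE entries that Scrapy merges them with. The helper names are hypothetical.

```python
# The three entries from the settings above (priority numbers as given there).
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}


def request_order(middlewares: dict) -> list:
    """Paths in the order process_request() runs: ascending priority number.

    A value of None disables a middleware in Scrapy, so such entries are skipped.
    """
    enabled = [(prio, path) for path, prio in middlewares.items() if prio is not None]
    return [path for prio, path in sorted(enabled)]


def response_order(middlewares: dict) -> list:
    """process_response() runs in the reverse order."""
    return list(reversed(request_order(middlewares)))


for path in request_order(DOWNLOADER_MIDDLEWARES):
    print("request  ->", path)
for path in response_order(DOWNLOADER_MIDDLEWARES):
    print("response <-", path)
```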
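    Use Splash's own duplicate filter:
    DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
    Use the Splash-aware cache storage backend:
    HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'  # Add both settings above; the cache backend matters once Scrapy's HTTP cache is enabled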
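Why does Splash need its own dupefilter? Roughly, a SplashRequest keeps the target URL while its rendering arguments travel alongside it. A fingerprint built only from the request itself would treat two requests for the same URL with different Splash args as duplicates. The sketch below is a simplified, hypothetical fingerprint that folds the args into the key. It is not scrapy_splash's actual implementation, and `splash_aware_fingerprint` is an illustrative name.

```python
import hashlib
import json
from typing import Optional


def splash_aware_fingerprint(method: str, url: str, splash_args: Optional[dict] = None) -> str:
    """Hypothetical simplified fingerprint that also covers the Splash args.

    Illustrates why the args must be part of the dedup key; not the library's code.
    """
    payload = json.dumps(
        {"method": method.upper(), "url": url, "args": splash_args or {}},
        sort_keys=True,  # key order in the args dict must not change the hash
    )
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


a = splash_aware_fingerprint("GET", "http://www.jd.com/", {"wait": 0.5})
b = splash_aware_fingerprint("GET", "http://www.jd.com/", {"wait": 2})
print(a != b)  # True: same URL, different rendering args, so not duplicates
```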
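    Example:
    import scrapy
    from scrapy_splash import SplashRequest


    class JsSpider(scrapy.Spider):
        name = "jd"
        allowed_domains = ["jd.com"]
        start_urls = [
            "http://www.jd.com/"
        ]

        def start_requests(self):
            # Send each start URL through Splash so the page's JavaScript runs first;
            # 'wait' gives the page 0.5 s to finish rendering before the HTML is returned.
            for url in self.start_urls:
                yield SplashRequest(url, self.parse, args={'wait': 0.5})

        def parse(self, response):
            print('---------- Scraping async-loaded content from the JD homepage with Splash ----------')
            # extract_first() returns None instead of raising IndexError when nothing matches.
            rs = response.xpath('//span[@class="ui-areamini-text"]/text()').extract_first()
            print(rs)
            print('--------------- success ----------------')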
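By default, SplashRequest targets Splash's `render.html` endpoint and passes the page URL together with args such as `wait`. If the XPath comes back empty, open the equivalent GET request in a browser or with curl to see the HTML that Splash actually rendered. The sketch below builds that URL and assumes the SPLASH_URL configured above. The function name `splash_render_url` is hypothetical.

```python
from urllib.parse import urlencode, urljoin


def splash_render_url(splash_url: str, target_url: str, **args) -> str:
    """Build a GET URL for Splash's render.html endpoint (hypothetical helper)."""
    params = {"url": target_url, **args}
    # Ensure a trailing slash so urljoin appends render.html instead of replacing a path segment.
    base = urljoin(splash_url.rstrip("/") + "/", "render.html")
    return base + "?" + urlencode(params)


print(splash_render_url("http://192.168.99.100:8050", "http://www.jd.com/", wait=0.5))
# http://192.168.99.100:8050/render.html?url=http%3A%2F%2Fwww.jd.com%2F&wait=0.5
```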
    Official documentation: https://pypi.python.org/pypi/scrapy-splash

  • Original article: https://www.cnblogs.com/qieyu/p/8024822.html