  • Scrapy notes: installing Splash, a JavaScript rendering service

    0.

    splash (dictionary senses): mermaid; to spatter, to splash

    1. References

    A first look at using Splash

    Installing Docker on Windows

    https://blog.scrapinghub.com/2015/03/02/handling-javascript-in-scrapy-with-splash/

    Splash is our in-house solution for JavaScript rendering, implemented in Python using Twisted and QT.  Per the official blog, Splash is apparently Scrapinghub's in-house solution.

    https://scrapinghub.com/ 

    We're the creators and the main maintainers of Scrapy. In other words, these are the people behind Scrapy.

    github: scrapinghub/splash

    Splash is a javascript rendering service with an HTTP API. It's a lightweight browser with an HTTP API, implemented in Python 3 using Twisted and QT5.

    It's fast, lightweight and state-less which makes it easy to distribute. It is used to render JavaScript pages.

    http://splash.readthedocs.io/en/latest/index.html

    Official Splash documentation

    github: scrapy-plugins/scrapy-splash

    This library provides Scrapy and JavaScript integration using Splash. It covers how to use Splash from Scrapy.

    http://splash.readthedocs.io/en/stable/api.html#request-filters  

    Splash supports filtering requests based on Adblock Plus rules.  I have not gotten this working yet.

    2. Installation and usage

    https://stackoverflow.com/questions/30345623/scraping-dynamic-content-using-python-scrapy

    The answer mentions ScrapyJS, but its link now redirects to https://github.com/scrapy-plugins/scrapy-splash#installation

    https://pypi.python.org/pypi/scrapyjs

    https://pypi.python.org/pypi/scrapy-splash

    2.1 Install scrapy-splash

    C:\Users\win7>pip install scrapy-splash
    Collecting scrapy-splash
      Downloading scrapy_splash-0.7.2-py2.py3-none-any.whl
    Installing collected packages: scrapy-splash
    Successfully installed scrapy-splash-0.7.2
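
Once the package is installed, a Scrapy project still has to be told where Splash lives and which middlewares to enable. The fragment below is a minimal sketch of the relevant `settings.py` entries, assuming Splash is served over HTTP on port 8050 at the docker-machine address 192.168.99.100 that appears later in these notes. The middleware names and priorities follow the scrapy-splash README as I understand it; verify them against the README for the version you installed.

```python
# settings.py (sketch): wiring scrapy-splash into a Scrapy project.
# Assumption: Splash is reachable at the docker-machine IP on HTTP port 8050.
SPLASH_URL = "http://192.168.99.100:8050"

# Downloader middlewares that route requests through Splash.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Spider middleware that deduplicates repeated Splash arguments.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Splash-aware duplicate filter and HTTP cache storage.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```

With these in place, a spider would typically yield `scrapy_splash.SplashRequest` objects instead of plain `scrapy.Request` objects for pages that need JavaScript rendering.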

    2.2 Install the scrapinghub/splash image with Docker

    Find the download link on the official site:

    https://store.docker.com/editions/community/docker-ce-desktop-windows

    Get Docker Community Edition for Windows

    Docker for Windows is available for free.

    Requires Microsoft Windows 10 Professional or Enterprise 64-bit. For previous versions get Docker Toolbox.

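    Right-click the installer and run it as administrator. It is probably best to tick the optional components as well, though I am not sure this is required.

    Right-click Docker Quickstart Terminal and run it as administrator. It reported that bash.exe could not be found.

    Output: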

    Creating CA: C:\Users\win7\.docker\machine\certs\ca.pem
    Creating client certificate: C:\Users\win7\.docker\machine\certs\cert.pem
    Running pre-create checks...
    (default) Image cache directory does not exist, creating it at C:\Users\win7\.docker\machine\cache...
    (default) No default Boot2Docker ISO found locally, downloading the latest release...
    (default) Latest release for github.com/boot2docker/boot2docker is v17.09.0-ce
    (default) Downloading C:\Users\win7\.docker\machine\cache\boot2docker.iso from https://github.com/boot2docker/boot2docker/releases/download/v17.09.0-ce/boot2docker.iso...
    (default) 0%....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
    Creating machine...
    (default) Copying C:\Users\win7\.docker\machine\cache\boot2docker.iso to C:\Users\win7\.docker\machine\machines\default\boot2docker.iso...
    (default) Creating VirtualBox VM...
    (default) Creating SSH key...
    (default) Starting the VM...
    (default) Check network to re-create if needed...
    (default) Windows might ask for the permission to create a network adapter. Sometimes, such confirmation window is minimized in the taskbar.
    (default) Found a new host-only adapter: "VirtualBox Host-Only Ethernet Adapter #2"
    (default) Windows might ask for the permission to configure a network adapter. Sometimes, such confirmation window is minimized in the taskbar.
    (default) Windows might ask for the permission to configure a dhcp server. Sometimes, such confirmation window is minimized in the taskbar.
    (default) Waiting for an IP...
    Waiting for machine to be running, this may take a few minutes...
    Detecting operating system of created instance...
    Waiting for SSH to be available...
    Detecting the provisioner...
    Provisioning with boot2docker...
    Copying certs to the local machine directory...
    Copying certs to the remote machine...
    Setting Docker configuration on the remote daemon...
    Checking connection to Docker...
    Docker is up and running!
    To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: D:\Program Files\Docker Toolbox\docker-machine.exe env default
    
    
    
                            ##         .
                      ## ## ##        ==
                   ## ## ## ## ##    ===
               /"""""""""""""""""\___/ ===
          ~~~ {~~ ~~~~ ~~~ ~~~~ ~~~ ~ /  ===- ~~~
               \______ o           __/
                 \    \         __/
                  \____\_______/
    
    docker is configured to use the default machine with IP 192.168.99.100
    For help getting started, check out the docs at https://docs.docker.com
    
    Start interactive shell
    
    win7@win7-PC MINGW64 ~
    $ docker info
    Containers: 0
     Running: 0
     Paused: 0
     Stopped: 0
    Images: 0
    Server Version: 17.09.0-ce
    Storage Driver: aufs
     Root Dir: /mnt/sda1/var/lib/docker/aufs
     Backing Filesystem: extfs
     Dirs: 0
     Dirperm1 Supported: true
    Logging Driver: json-file
    Cgroup Driver: cgroupfs
    Plugins:
     Volume: local
     Network: bridge host macvlan null overlay
     Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
    Swarm: inactive
    Runtimes: runc
    Default Runtime: runc
    Init Binary: docker-init
    containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
    runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
    init version: 949e6fa
    Security Options:
     seccomp
      Profile: default
    Kernel Version: 4.4.89-boot2docker
    Operating System: Boot2Docker 17.09.0-ce (TCL 7.2); HEAD : 06d5c35 - Wed Sep 27 23:22:43 UTC 2017
    OSType: linux
    Architecture: x86_64
    CPUs: 1
    Total Memory: 995.8MiB
    Name: default
    ID: O33J:6GDF:AQ6P:RBM7:6KLF:OZHY:2N3J:QZKV:YIJT:G3AI:XCPD:NZ3G
    Docker Root Dir: /mnt/sda1/var/lib/docker
    Debug Mode (client): false
    Debug Mode (server): true
     File Descriptors: 17
     Goroutines: 26
     System Time: 2017-10-18T09:58:42.414047781Z
     EventsListeners: 0
    Registry: https://index.docker.io/v1/
    Labels:
     provider=virtualbox
    Experimental: false
    Insecure Registries:
     127.0.0.0/8
    Live Restore Enabled: false
    
    
    win7@win7-PC MINGW64 ~
    $ ipconfig
    
    Windows IP Configuration
    
    
    Ethernet adapter lan:
    
       Connection-specific DNS Suffix  . :
       Link-local IPv6 Address . . . . . : fe80::f950:bf55:726b:b7a6%14
       IPv4 Address. . . . . . . . . . . : 192.168.144.100
       Subnet Mask . . . . . . . . . . . : 255.255.255.0
       Default Gateway . . . . . . . . . : 192.168.144.254
    
    Ethernet adapter VirtualBox Host-Only Network #2:
    
       Connection-specific DNS Suffix  . :
       Link-local IPv6 Address . . . . . : fe80::1c18:13ad:7ed2:c0ff%29
       IPv4 Address. . . . . . . . . . . : 192.168.99.1
       Subnet Mask . . . . . . . . . . . : 255.255.255.0
       Default Gateway . . . . . . . . . :
    
    Tunnel adapter isatap.{CE007B04-2C7A-4A52-8BBF-1BCB4682EEB9}:
    
       Media State . . . . . . . . . . . : Media disconnected
       Connection-specific DNS Suffix  . :
    
    Tunnel adapter Teredo Tunneling Pseudo-Interface:
    
       Media State . . . . . . . . . . . : Media disconnected
       Connection-specific DNS Suffix  . :
    
    Tunnel adapter isatap.{93C68FD9-301C-484C-AFCB-5549CA24453B}:
    
       Media State . . . . . . . . . . . : Media disconnected
       Connection-specific DNS Suffix  . :
    
    win7@win7-PC MINGW64 ~
    $

    The important information in that output:

    (default) Copying C:\Users\win7\.docker\machine\cache\boot2docker.iso to C:\Users\win7\.docker\machine\machines\default\boot2docker.iso...
    (default) Creating VirtualBox VM...
    
    docker is configured to use the default machine with IP 192.168.99.100
    For help getting started, check out the docs at https://docs.docker.com

    Connect to the VM with PuTTY:

    Host: 192.168.99.100
    Port: 22
    
    Username: docker
    Password: tcuser

    The first time, the image has to be pulled from Docker Hub:

    sudo docker pull scrapinghub/splash

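    After that, start the Splash service each time you need it. It serves over HTTP, HTTPS, and telnet:

    # HTTP mode is usually all you need, so publishing only port 8050 is enough.
    # Splash will listen on 0.0.0.0 at ports 8050 (http), 8051 (https) and 5023 (telnet).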
    sudo docker run -p 5023:5023 -p 8050:8050 -p 8051:8051 scrapinghub/splash
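
Before pointing a browser or a spider at the container, it can help to confirm that the published ports are actually reachable from the Windows host. The snippet below is a small sketch that uses only the standard library; `port_is_open` is a hypothetical helper name, and the host address is assumed to be the docker-machine IP reported above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


if __name__ == "__main__":
    # Assumption: the docker-machine VM address shown in the terminal output.
    SPLASH_HOST = "192.168.99.100"
    for name, port in (("http", 8050), ("https", 8051), ("telnet", 5023)):
        state = "open" if port_is_open(SPLASH_HOST, port) else "closed"
        print(f"{name:6} {port}: {state}")
```

If only `-p 8050:8050` was passed to `docker run`, the other two ports are expected to show as closed.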

    Open in a browser:

    http://192.168.99.100:8050
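
The same address also serves the Splash HTTP API, so a rendered page can be requested without the browser UI. The sketch below builds a URL for the `render.html` endpoint with its `url` and `wait` parameters. `splash_render_url` is a hypothetical helper, and the base address and the example target page are assumptions for illustration.

```python
from urllib.parse import urlencode


def splash_render_url(splash_base: str, target_url: str, wait: float = 0.5) -> str:
    """Build a Splash render.html URL that renders target_url after `wait` seconds."""
    query = urlencode({"url": target_url, "wait": wait})
    return f"{splash_base.rstrip('/')}/render.html?{query}"


if __name__ == "__main__":
    # Assumption: Splash runs at the docker-machine IP on HTTP port 8050.
    print(splash_render_url("http://192.168.99.100:8050", "http://example.com"))
```

Opening the printed URL in a browser, or fetching it with `urllib.request.urlopen`, should return the HTML of the target page after its JavaScript has run, provided the Splash container is up.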

  • Original post: https://www.cnblogs.com/my8100/p/splash_install.html