  • Installing Python, Scrapy, and PhantomJS on CentOS

    Install the dependencies:

    • yum install libxslt-devel libffi libffi-devel python-devel gcc openssl openssl-devel sqlite-devel

    Install Python 2.7 or later (if several Python versions will coexist, you must pass --prefix to configure)

    • wget http://python.org/ftp/python/2.7.3/Python-2.7.3.tgz
    • tar xvf Python-2.7.3.tgz
    • cd Python-2.7.3
    • ./configure --prefix=/usr/local/python27 
    • make && make install
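
    Because the build above goes into its own prefix, the plain python command may still resolve to the system interpreter. The sketch below is one way to see which interpreter is actually running; the helper names interpreter_info and is_at_least are made up for this example. Run it once with python and once with /usr/local/python27/bin/python2.7 and compare the output.

```python
import sys


def interpreter_info():
    """Collect the facts that identify the interpreter running this script."""
    return {
        "executable": sys.executable,  # full path of the running binary
        "prefix": sys.prefix,          # install prefix, e.g. the --prefix value
        "version": "%d.%d.%d" % sys.version_info[:3],
    }


def is_at_least(major, minor):
    """Return True if the running interpreter is at least major.minor."""
    return sys.version_info[:2] >= (major, minor)


if __name__ == "__main__":
    info = interpreter_info()
    for key in ("executable", "prefix", "version"):
        print("%-10s %s" % (key, info[key]))
    if not is_at_least(2, 7):
        print("Older than 2.7: check PATH or the symlink.")
```

    If the reported prefix is not /usr/local/python27, the later easy_install and pip commands are probably installing into a different Python than the one just built.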

    Install setuptools and pip (you may need to add the new Python to PATH or create a symlink)

    • wget -q http://peak.telecommunity.com/dist/ez_setup.py
    • python ez_setup.py
    • easy_install pip

    Or

    • wget -q https://bootstrap.pypa.io/get-pip.py
    • python get-pip.py

    Or

    • wget https://files.pythonhosted.org/packages/66/6d/dad0d39ce1cfa98ef3634463926e7324e342c956aecb066968e2e3696300/setuptools-30.0.0.tar.gz
    • tar -xvf setuptools-30.0.0.tar.gz
    • cd setuptools-30.0.0
    • python setup.py install
    • cd ..
    • wget https://files.pythonhosted.org/packages/5e/53/eaef47e5e2f75677c9de0737acc84b659b78a71c4086f424f55346a341b5/pip-9.0.0.tar.gz
    • tar -xvf pip-9.0.0.tar.gz
    • cd pip-9.0.0
    • python setup.py install

    Install Twisted (you may need to add the new Python to PATH or create a symlink)

    • easy_install Twisted
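    • A Twisted version that is too new or too old can cause errors at the final step; pin a version with pip and try a few until one works, for example: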
    • pip install twisted==12.5.0
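
    When trying several pinned versions, it helps to confirm which one actually ended up installed. A minimal sketch follows, assuming the script is run with the same interpreter that pip installed into; installed_version is a hypothetical helper name, and the pkg_resources fallback relies on setuptools being present.

```python
def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        from importlib import metadata  # available on Python 3.8+
    except ImportError:
        metadata = None

    if metadata is not None:
        try:
            return metadata.version(package)
        except metadata.PackageNotFoundError:
            return None

    # Fallback for Python 2.7, where setuptools provides pkg_resources.
    import pkg_resources
    try:
        return pkg_resources.get_distribution(package).version
    except pkg_resources.DistributionNotFound:
        return None


if __name__ == "__main__":
    print("Twisted: %s" % installed_version("Twisted"))
```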

    Install w3lib

    • easy_install -U w3lib

    Install lxml

    • easy_install lxml

    Install pyOpenSSL

    • easy_install pyOpenSSL
    • If that fails, download and install it manually:
    • wget http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz
    • tar zxvf pyOpenSSL-0.11.tar.gz
    • cd pyOpenSSL-0.11
    • python2.7 setup.py install

    Install Scrapy (you may need to add the new Python to PATH or create a symlink)

    • easy_install -U Scrapy
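
    Since Scrapy depends on every package installed in the steps above, a quick import check can show which one is broken before running the scrapy command. This is only a sketch: check_imports is a made-up helper, and the module names are the usual import names for these packages (pyOpenSSL is imported as OpenSSL).

```python
import importlib

# Import names of the packages installed in the steps above.
REQUIRED_MODULES = ["twisted", "w3lib", "lxml", "OpenSSL", "scrapy"]


def check_imports(names):
    """Try to import each module; map name -> None on success, else an error string."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # a broken install can raise more than ImportError
            results[name] = "%s: %s" % (type(exc).__name__, exc)
    return results


if __name__ == "__main__":
    for name, error in sorted(check_imports(REQUIRED_MODULES).items()):
        print("%-10s %s" % (name, "OK" if error is None else error))
```

    Catching Exception rather than only ImportError is deliberate here: a version mismatch between packages can surface as a different error raised at import time.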

    Install Selenium (only if you need to parse dynamically rendered pages)

    • pip install selenium

    Install PhantomJS (only if you need to parse dynamically rendered pages)

    • yum -y install wget fontconfig
    • wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2
    • bzip2 -d phantomjs-2.1.1-linux-x86_64.tar.bz2
    • tar xvf phantomjs-2.1.1-linux-x86_64.tar -C /usr/local/
    • mv /usr/local/phantomjs-2.1.1-linux-x86_64/ /usr/local/phantomjs
    • ln -s /usr/local/phantomjs/bin/phantomjs /usr/bin/
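
    Selenium starts PhantomJS by looking up the phantomjs executable on PATH, so it is worth confirming that the symlink above is visible. The sketch below does the lookup by hand so it runs on both Python 2.7 and 3; find_on_path is a hypothetical helper written for this example.

```python
import os


def find_on_path(name, path=None):
    """Return the full path of executable `name` found on `path`, or None.

    `path` defaults to the PATH environment variable.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    location = find_on_path("phantomjs")
    if location is None:
        print("phantomjs not found on PATH; check the symlink in /usr/bin")
    else:
        print("phantomjs found at %s" % location)
```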

    Test Scrapy

    • scrapy shell http://www.baidu.com

    Test Selenium and PhantomJS

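    from selenium import webdriver

    # webdriver.PhantomJS exists only in Selenium releases before 4.0
    driver = webdriver.PhantomJS()
    try:
        driver.get("http://hotel.qunar.com/")
        # print() with a single argument works on both Python 2.7 and 3
        print(driver.title)
    finally:
        driver.quit()  # stop the background phantomjs process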


  • Original article: https://www.cnblogs.com/jhc888007/p/7463714.html