  • Installing Python, Scrapy, and PhantomJS on CentOS

    Install the dependencies:

    • yum install libxslt-devel libffi libffi-devel python-devel gcc openssl openssl-devel sqlite-devel

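    Install Python 2.7 or later (if several Python versions must coexist, you must pass --prefix so the new build does not overwrite the system Python)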

    • wget http://python.org/ftp/python/2.7.2/Python-2.7.2.tgz
    • tar xvf Python-2.7.2.tgz
    • cd Python-2.7.2
    • ./configure --prefix=/usr/local/python27 
    • make && make install
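A CPython build skips optional modules when the matching -devel headers are missing at compile time, so it is worth checking the new interpreter before installing anything on top of it. The following is a minimal sketch, not part of the original post: `missing_modules` is a hypothetical helper name, and the module list is an assumption based on the packages from the dependency step (openssl-devel, sqlite-devel, libffi-devel).

```python
import importlib


def missing_modules(names):
    """Return the names from `names` that the running interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed list: standard-library modules that rely on the -devel packages.
    wanted = ["ssl", "sqlite3", "ctypes"]
    print("missing: %s" % (missing_modules(wanted) or "none"))
```

Run it with the newly built interpreter (with the prefix above that would be /usr/local/python27/bin/python). If a module is reported missing, install the corresponding -devel package and rebuild.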

    Install setuptools and pip (you may need to add them to PATH or create symlinks)

    • wget -q http://peak.telecommunity.com/dist/ez_setup.py
    • python ez_setup.py
    • easy_install pip

    Or

    • wget -q https://bootstrap.pypa.io/get-pip.py
    • python get-pip.py

    Or, installing both from source tarballs:

    • wget https://files.pythonhosted.org/packages/66/6d/dad0d39ce1cfa98ef3634463926e7324e342c956aecb066968e2e3696300/setuptools-30.0.0.tar.gz
    • tar -xvf setuptools-30.0.0.tar.gz
    • cd setuptools-30.0.0
    • python setup.py install
    • cd ..
    • wget https://files.pythonhosted.org/packages/5e/53/eaef47e5e2f75677c9de0737acc84b659b78a71c4086f424f55346a341b5/pip-9.0.0.tar.gz
    • tar -xvf pip-9.0.0.tar.gz
    • cd pip-9.0.0
    • python setup.py install
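With a custom prefix, the freshly installed `python`, `easy_install`, and `pip` may not be the ones the shell finds first, which is why PATH or symlinks may need adjusting. As a rough sketch (the helper name `find_on_path` is hypothetical, and the tool names checked are only the ones used in this post), the lookup the shell performs can be reproduced like this:

```python
import os


def find_on_path(name, path=None):
    """Return the first executable called `name` on the search path, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        # An empty PATH entry means the current directory.
        candidate = os.path.join(directory or ".", name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    for tool in ("python", "easy_install", "pip"):
        print("%-14s %s" % (tool, find_on_path(tool) or "not on PATH"))
```

If a tool resolves to the system copy instead of the one under the custom prefix, either put the prefix's bin directory earlier in PATH or symlink the binary into a directory that is already on PATH.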

    Install Twisted (you may need to add it to PATH or create symlinks)

    • easy_install Twisted
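    • If the Twisted version is too new or too old, the final step may fail with an error; in that case, pin a version with pip and try a few different ones, for example: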
    • pip install twisted==12.5.0
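When trying several Twisted releases, compare version numbers numerically rather than as text, because "9.0.0" sorts after "10.0.0" as a string. A minimal sketch of such a comparison follows; the helper names and the bounds in the example are hypothetical, and it only handles plain dotted numbers (no "rc1"-style suffixes).

```python
def version_tuple(text):
    """Turn a dotted release string such as '16.4.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def in_range(version, lowest, highest):
    """True if lowest <= version <= highest, compared numerically."""
    return version_tuple(lowest) <= version_tuple(version) <= version_tuple(highest)


if __name__ == "__main__":
    # Hypothetical bounds for illustration only; take the real ones from the
    # requirements of the Scrapy release being installed.
    print(in_range("13.1.0", "10.0.0", "16.4.1"))
```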

    Install w3lib

    • easy_install -U w3lib

    Install lxml

    • easy_install lxml

    Install pyOpenSSL

    • easy_install pyOpenSSL
    • If that fails, download and install it manually:
    • wget http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz
    • tar zxvf pyOpenSSL-0.11.tar.gz
    • cd pyOpenSSL-0.11
    • python2.7 setup.py install

    Install Scrapy (you may need to add it to PATH or create symlinks)

    • easy_install -U Scrapy

    Install Selenium (only needed for parsing dynamic, JavaScript-rendered pages)

    • pip install selenium

    Install PhantomJS (only needed for parsing dynamic, JavaScript-rendered pages)

    • yum -y install wget fontconfig
    • wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2
    • bzip2 -d phantomjs-2.1.1-linux-x86_64.tar.bz2
    • tar xvf phantomjs-2.1.1-linux-x86_64.tar -C /usr/local/
    • mv /usr/local/phantomjs-2.1.1-linux-x86_64/ /usr/local/phantomjs
    • ln -s /usr/local/phantomjs/bin/phantomjs /usr/bin/
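Before wiring PhantomJS into Selenium, it can help to confirm that the symlinked binary actually starts; a missing shared library such as fontconfig typically shows up as a failure at this point. The sketch below is an illustration under assumptions: `tool_version` is a hypothetical helper, and it relies only on the `--version` flag of the phantomjs command line.

```python
import subprocess


def tool_version(command):
    """Run `command` (a list such as ["phantomjs", "--version"]) and return its
    output as text, or None if the program is missing or exits with an error."""
    try:
        output = subprocess.check_output(command, stderr=subprocess.STDOUT)
    except (OSError, subprocess.CalledProcessError):
        return None
    return output.decode("utf-8", "replace").strip()


if __name__ == "__main__":
    version = tool_version(["phantomjs", "--version"])
    print("phantomjs: %s" % (version or "not found or failed to start"))
```

Returning None for both "not installed" and "crashed on startup" keeps the check simple; run `phantomjs --version` by hand to see the actual error message when it fails.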

    Test Scrapy

    • scrapy shell www.baidu.com

    Test Selenium and PhantomJS

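    from selenium import webdriver

    # Start a headless PhantomJS browser (the phantomjs binary must be on PATH)
    driver = webdriver.PhantomJS()
    driver.get("http://hotel.qunar.com/")
    data = driver.title  # title of the rendered page
    print(data)
    driver.quit()  # shut down the PhantomJS process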


  • Original article: https://www.cnblogs.com/jhc888007/p/7463714.html