Install dependencies:
- yum install libxslt-devel libffi libffi-devel python-devel gcc openssl openssl-devel sqlite-devel
Install Python 2.7 or later (if several Python versions coexist, you must pass --prefix)
- wget http://python.org/ftp/python/2.7.3/Python-2.7.3.tgz
- tar xvf Python-2.7.3.tgz
- cd Python-2.7.3
- ./configure --prefix=/usr/local/python27
- make && make install
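A quick way to confirm that the freshly built interpreter is the one being used is to print its prefix and check its version. The sketch below is only an illustration: the helper name `meets_minimum` and the script name `check_python.py` are made up for this example, and the path in the comment assumes the --prefix used above.

```python
import sys


def meets_minimum(version_info, minimum=(2, 7)):
    """Return True if the (major, minor, ...) tuple is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Run this with the new interpreter, for example:
#   /usr/local/python27/bin/python check_python.py
# (check_python.py is a hypothetical file name for this snippet.)
print("prefix: %s" % sys.prefix)
print("version ok: %s" % meets_minimum(sys.version_info))
```

If the printed prefix is not /usr/local/python27, the `python` command on PATH is probably still the system interpreter.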
Install setuptools and pip (you may need to add them to PATH or create a symlink)
- wget -q http://peak.telecommunity.com/dist/ez_setup.py
- python ez_setup.py
- easy_install pip
Or
- wget -q https://bootstrap.pypa.io/get-pip.py
- python get-pip.py
Or
- wget https://files.pythonhosted.org/packages/66/6d/dad0d39ce1cfa98ef3634463926e7324e342c956aecb066968e2e3696300/setuptools-30.0.0.tar.gz
- tar -xvf setuptools-30.0.0.tar.gz
- cd setuptools-30.0.0
- python setup.py install
- cd ..
- wget https://files.pythonhosted.org/packages/5e/53/eaef47e5e2f75677c9de0737acc84b659b78a71c4086f424f55346a341b5/pip-9.0.0.tar.gz
- tar -xvf pip-9.0.0.tar.gz
- cd pip-9.0.0
- python setup.py install
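Since the note above says PATH or a symlink may need adjusting, it can help to see which `python`, `easy_install` and `pip` the shell would actually pick up. The following is a minimal sketch, not part of the original steps; `find_on_path` is a hypothetical helper that walks PATH by hand so that it behaves the same on Python 2.7 and Python 3.

```python
import os


def find_on_path(name, path=None):
    """Return the first executable called `name` found in `path`, or None.

    `path` is a PATH-style string; it defaults to the PATH environment variable.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Report where each tool resolves; None means it is not on PATH.
for tool in ("python", "easy_install", "pip"):
    print("%s -> %s" % (tool, find_on_path(tool)))
```

If a tool resolves to None or to the system copy, adding /usr/local/python27/bin to PATH or symlinking the binary into /usr/bin are the two fixes the heading refers to.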
Install Twisted (you may need to add it to PATH or create a symlink)
- easy_install Twisted
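- If the installed Twisted version is too new or too old, the final steps may fail; in that case pin a version with pip and try a few versions until one works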
- pip install twisted==12.5.0
Install w3lib
- easy_install -U w3lib
Install lxml
- easy_install lxml
Install pyOpenSSL
- easy_install pyOpenSSL
- If that fails, download and install it manually
- wget http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz
- tar zxvf pyOpenSSL-0.11.tar.gz
- cd pyOpenSSL-0.11
- python2.7 setup.py install
Install Scrapy (you may need to add it to PATH or create a symlink)
- easy_install -U Scrapy
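Because version mismatches between Twisted and the other packages are a likely source of errors, listing what actually got installed can save some guessing. This is a rough sketch under a few assumptions: `installed_version` is a hypothetical helper, it uses `importlib.metadata` where available and falls back to `pkg_resources` (shipped with setuptools) on Python 2.7, and the package names are simply the ones installed in the steps above.

```python
def installed_version(name):
    """Return the installed version string of distribution `name`, or None."""
    try:
        # Available in the standard library on Python 3.8 and later.
        from importlib.metadata import version, PackageNotFoundError
    except ImportError:
        version = None
    if version is not None:
        try:
            return version(name)
        except PackageNotFoundError:
            return None
    # Fallback for Python 2.7: pkg_resources comes with setuptools.
    import pkg_resources
    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None


# Print one line per package; missing packages are reported as such.
for pkg in ("Twisted", "w3lib", "lxml", "pyOpenSSL", "Scrapy"):
    print("%-10s %s" % (pkg, installed_version(pkg) or "not installed"))
```

Run it with the same interpreter that Scrapy was installed into, otherwise it reports on a different site-packages directory.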
Install Selenium (if you need to parse dynamic pages)
- pip install selenium
Install PhantomJS (if you need to parse dynamic pages)
- wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2
- bzip2 -d phantomjs-2.1.1-linux-x86_64.tar.bz2
- tar xvf phantomjs-2.1.1-linux-x86_64.tar -C /usr/local/
- yum -y install wget fontconfig
- mv /usr/local/phantomjs-2.1.1-linux-x86_64/ /usr/local/phantomjs
- ln -s /usr/local/phantomjs/bin/phantomjs /usr/bin/
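To check that the symlink works, one option is to ask the binary for its version from Python. The snippet below is only a sketch; `tool_version` is a hypothetical helper, and it assumes `phantomjs` is reachable on PATH after the `ln -s` step.

```python
import subprocess


def tool_version(command):
    """Run `command` (a list such as ["phantomjs", "--version"]).

    Returns the stripped output as text, or None if the program is missing
    or exits with a non-zero status.
    """
    try:
        output = subprocess.check_output(command, stderr=subprocess.STDOUT)
    except (OSError, subprocess.CalledProcessError):
        return None
    return output.decode("utf-8", "replace").strip()


# Prints the version string, or None if phantomjs cannot be run.
print("phantomjs: %s" % tool_version(["phantomjs", "--version"]))
```

A result of None usually means either the symlink is wrong or a shared library such as fontconfig is missing.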
Scrapy test
- scrapy shell www.baidu.com
Selenium and PhantomJS test
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get("http://hotel.qunar.com/")
data = driver.title
print data
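The test above leaves the PhantomJS process running after the script ends. A slightly more defensive variant, sketched here as an assumption rather than as part of the original test, wraps the calls in a function that always calls `quit()`. `fetch_title` and its `make_driver` parameter are hypothetical names; the usage shown in the comments relies on `webdriver.PhantomJS`, which exists in the Selenium releases this guide targets but, to my knowledge, was removed in later ones.

```python
def fetch_title(url, make_driver):
    """Open `url` with a driver created by `make_driver` and return the page title.

    `make_driver` is any zero-argument callable returning an object with
    get(), title and quit(), for example selenium's webdriver.PhantomJS.
    """
    driver = make_driver()
    try:
        driver.get(url)
        return driver.title
    finally:
        # Always shut the browser process down, even if get() fails.
        driver.quit()


# Usage with the setup described above (requires selenium and phantomjs):
#   from selenium import webdriver
#   print(fetch_title("http://hotel.qunar.com/", webdriver.PhantomJS))
```

Passing the driver factory as an argument keeps the function independent of a particular browser, so the same code could be tried with another driver if PhantomJS is not available.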