Environment: Python 3.x + Scrapy
Upgrade pip (optional):
C:\Users\xxx>python -m pip install --upgrade pip
Collecting pip
Using cached https://files.pythonhosted.org/packages/46/dc/7fd5df840efb3e56c8b4f768793a237ec4ee59891959d6a215d63f727023/pip-19.0.1-py2.py3-none-any.whl
Installing collected packages: pip
Found existing installation: pip 18.1
Uninstalling pip-18.1:
Successfully uninstalled pip-18.1
Successfully installed pip-19.0.1
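1. Create a Python 3.x virtual environment
C:\Users\xxx>mkvirtualenv --python=C:\Users\xxx\AppData\Local\Programs\Python\Python37\python.exe Py3_spider
The --python= option takes the path to the Python 3 interpreter that the new environment should use. The mkvirtualenv and workon commands come from virtualenvwrapper (virtualenvwrapper-win on Windows).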
2. Install Scrapy
Activate the environment and install Scrapy from the Douban PyPI mirror:
C:\Users\xxx>workon Py3_spider
(Py3_spider) C:\Users\xxx>pip install -i https://pypi.douban.com/simple/ scrapy
... # installation output omitted
Successfully installed Automat-0.7.0 PyDispatcher-2.0.5 PyHamcrest-1.9.0 Twisted-18.9.0 asn1crypto-0.24.0 attrs-18.2.0 cffi-1.11.5 constantly-15.1.0 cryptography-2.5 cssselect-1.0.3 hyperlink-18.0.0 idna-2.8 incremental-17.5.0 lxml-4.3.0 parsel-1.5.1 pyOpenSSL-19.0.0 pyasn1-0.4.5 pyasn1-modules-0.2.4 pycparser-2.19 queuelib-1.5.0 scrapy-1.5.2 service-identity-18.1.0 six-1.12.0 w3lib-1.20.0 zope.interface-4.6.0
(Py3_spider) C:\Users\xxx>
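If the installation needs to be repeated from a script, the same pip invocation can be assembled in Python. This is only a minimal sketch: the helper name pip_install_command is hypothetical, and the mirror URL is the one used in the command above.

```python
import sys


def pip_install_command(package, index_url):
    """Build the argument list for installing `package` from `index_url`."""
    # "sys.executable -m pip" targets the interpreter running this script,
    # so inside an activated virtualenv it installs into that environment.
    return [sys.executable, "-m", "pip", "install", "-i", index_url, package]


if __name__ == "__main__":
    cmd = pip_install_command("scrapy", "https://pypi.douban.com/simple/")
    # Only print the command here; nothing is installed by this sketch.
    print(" ".join(cmd))
```

Passing the returned list to subprocess.run(cmd, check=True) would actually run the installation and raise an error if pip fails.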
Check the installed Scrapy version from the Python interpreter:
(Py3_spider) C:\Users\xxx>python
Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 14:57:15) [MSC v.1915 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import scrapy
>>> scrapy.version_info
(1, 5, 2)
>>> exit()
(Py3_spider) C:\Users\xxx>
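The interactive check above can also be scripted, for example to stop early when the active environment has an older Scrapy than a project expects. The following is a minimal sketch: the helper name meets_minimum and the minimum version (1, 5, 0) are illustrative assumptions, and it assumes a release version whose parts are all integers, as in the (1, 5, 2) shown above.

```python
def meets_minimum(version_info, minimum):
    """Return True if a version tuple such as (1, 5, 2) is at least `minimum`."""
    # Tuples compare element by element, so (1, 5, 2) >= (1, 5, 0) is True.
    return tuple(version_info) >= tuple(minimum)


if __name__ == "__main__":
    try:
        import scrapy  # only importable where Scrapy is installed
    except ImportError:
        print("Scrapy is not installed in this environment")
    else:
        # scrapy.version_info is the tuple printed in the session above.
        print(scrapy.version_info, meets_minimum(scrapy.version_info, (1, 5, 0)))
```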
3. Create a Scrapy project
(Py3_spider) D:\>mkdir SpiderProject
(Py3_spider) D:\>cd SpiderProject
(Py3_spider) D:\SpiderProject>scrapy startproject spider_pjt1
New Scrapy project 'spider_pjt1', using template directory 'c:\users\xxx\envs\py3_spider\lib\site-packages\scrapy\templates\project', created in:
D:\SpiderProject\spider_pjt1
You can start your first spider with:
cd spider_pjt1
scrapy genspider example example.com
(Py3_spider) D:\SpiderProject>
4. Scrapy project directory structure
(Py3_spider) D:\SpiderProject>tree /f spider_pjt1
D:\SPIDERPROJECT\SPIDER_PJT1
│  scrapy.cfg
│
└─spider_pjt1
    │  items.py
    │  middlewares.py
    │  pipelines.py
    │  settings.py
    │  __init__.py
    │
    ├─spiders
    │  │  __init__.py
    │  │
    │  └─__pycache__
    └─__pycache__
(Py3_spider) D:\SpiderProject>
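In this layout, scrapy.cfg is the project configuration file, items.py holds item definitions, middlewares.py and pipelines.py hold middlewares and item pipelines, settings.py holds the project settings, and spiders is the package where spider modules go. As a quick sanity check, the listing can be verified from Python. This is a minimal sketch: the names EXPECTED_FILES and missing_files are hypothetical, and the path in the usage section assumes the project location used above.

```python
from pathlib import Path

# Files that "scrapy startproject" created in the listing above,
# relative to the outer project directory.
EXPECTED_FILES = [
    "scrapy.cfg",
    "spider_pjt1/__init__.py",
    "spider_pjt1/items.py",
    "spider_pjt1/middlewares.py",
    "spider_pjt1/pipelines.py",
    "spider_pjt1/settings.py",
    "spider_pjt1/spiders/__init__.py",
]


def missing_files(project_root, expected=EXPECTED_FILES):
    """Return the expected files that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumed location; adjust to wherever the project was created.
    print(missing_files(r"D:\SpiderProject\spider_pjt1"))
```

An empty list means every file from the listing is present.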