  • Creating a crawler project with the scrapy tool

    1. Create a crawler project with scrapy: scrapy startproject scrape_project_name

    >scrapy startproject books_scrape
    New Scrapy project 'books_scrape', using template directory 's:\users\jiangshan\anaconda3\lib\site-packages\scrapy\templates\project', created in:
    D:\Workspace\ScrapyTest\books_scrape

    You can start your first spider with:
    cd books_scrape
    scrapy genspider example example.com
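
Step 1 can also be scripted. The sketch below builds the argument list for `scrapy startproject` and checks the project name first. The helper name `startproject_command` and the regular expression are illustrative assumptions, modeled on Scrapy's rule that a project name may contain only letters, digits, and underscores and must not start with a digit. The code only constructs the command, so it runs without Scrapy installed.

```python
import re

# Assumption: approximates Scrapy's naming rule for projects (letters, digits
# and underscores only, not starting with a digit). Hypothetical helper, not a
# Scrapy API.
_VALID_NAME = re.compile(r"^[_a-zA-Z]\w*$")


def startproject_command(project_name):
    """Return the argv list for `scrapy startproject <project_name>`."""
    if not _VALID_NAME.match(project_name):
        raise ValueError(f"invalid project name: {project_name!r}")
    return ["scrapy", "startproject", project_name]


if __name__ == "__main__":
    print(startproject_command("books_scrape"))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` on a machine where the `scrapy` executable is on the PATH.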

    2. Change into the project directory: >cd books_scrape

    3. View the directory structure: >tree /F

    >tree /F
    Folder PATH listing for volume DATA1
    Volume serial number is 3A2E-EB05
    D:.
    │  scrapy.cfg
    │
    └─books_scrape
        │  items.py
        │  middlewares.py
        │  pipelines.py
        │  settings.py
        │  __init__.py
        │
        ├─spiders
        │  │  __init__.py
        │  │
        │  └─__pycache__
        └─__pycache__
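
`tree /F` is a Windows command. As a rough cross-platform stand-in, the following sketch walks a directory with `pathlib` and yields an indented listing. `list_tree` is a hypothetical helper name, and the output format only approximates that of `tree`.

```python
from pathlib import Path


def list_tree(root, indent=""):
    """Yield one line per entry under `root`: files first, then directories."""
    entries = sorted(Path(root).iterdir(), key=lambda p: (p.is_dir(), p.name))
    for entry in entries:
        if entry.is_dir():
            yield f"{indent}{entry.name}/"
            # Recurse with deeper indentation for the directory's contents.
            yield from list_tree(entry, indent + "    ")
        else:
            yield f"{indent}{entry.name}"


if __name__ == "__main__":
    for line in list_tree("."):
        print(line)
```

Run from the directory that contains scrapy.cfg, it lists the same files and folders as the output above.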

    4. Use the scrapy genspider <SPIDER_NAME> <DOMAIN> command to generate the Spider file and Spider class from a template. Its two arguments are the name of the Spider and the domain (website) to crawl.

    > scrapy genspider books  books.toscrape.com
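
To illustrate what "generating from a template" means in step 4, here is a minimal sketch that fills a spider template using `string.Template`. The template text is a simplified approximation of Scrapy's basic spider template, and `render_spider` is a hypothetical helper. The file Scrapy actually writes to spiders/books.py may differ between versions.

```python
from string import Template

# Assumption: a simplified stand-in for Scrapy's "basic" spider template.
SPIDER_TEMPLATE = Template('''\
import scrapy


class ${classname}(scrapy.Spider):
    name = "${name}"
    allowed_domains = ["${domain}"]
    start_urls = ["https://${domain}/"]

    def parse(self, response):
        pass
''')


def render_spider(name, domain):
    """Return spider source code for the given spider name and domain."""
    # Derive a class name such as "BooksSpider" from the spider name.
    classname = "".join(part.capitalize() for part in name.split("_")) + "Spider"
    return SPIDER_TEMPLATE.substitute(classname=classname, name=name, domain=domain)


if __name__ == "__main__":
    print(render_spider("books", "books.toscrape.com"))
```

The `name` attribute in the rendered class is what `scrapy crawl books` later uses to locate the spider.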

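    5. View the directory structure again. (Ignore the entries that were highlighted in blue in the original post for now; they are there because the author debugs on a remote server.)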

    >tree /F

    D:.
    │  scrapy.cfg
    │
    └─books_scrape
        │  items.py
        │  middlewares.py
        │  pipelines.py
        │  run.py
        │  settings.py
        │  __init__.py
        │
        ├─.idea
        │      books_scrape.iml
        │      deployment.xml
        │      misc.xml
        │      modules.xml
        │      remote-mappings.xml
        │      workspace.xml
        │
        ├─spiders
        │  │  books.py
        │  │  __init__.py
        │  │
        │  └─__pycache__
        │          __init__.cpython-37.pyc
        │
        └─__pycache__
                settings.cpython-37.pyc
                __init__.cpython-37.pyc

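    6. Launch PyCharm and open the newly created books_scrape project, using the directory that contains the configuration file scrapy.cfg as the project root.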
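
Which directory counts as the project root can be checked programmatically. This sketch walks upward from a starting directory until it finds scrapy.cfg. `find_project_root` is a hypothetical helper written for illustration, not a Scrapy function.

```python
from pathlib import Path


def find_project_root(start="."):
    """Return the nearest directory at or above `start` that contains scrapy.cfg, else None."""
    current = Path(start).resolve()
    # Check the starting directory first, then each parent up to the drive root.
    for candidate in (current, *current.parents):
        if (candidate / "scrapy.cfg").is_file():
            return candidate
    return None


if __name__ == "__main__":
    print(find_project_root())
```

Called from inside the spiders folder, it should return the outer books_scrape directory, which is the one to open in the IDE.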

    7. In the same directory as the spiders folder, create a file named run.py with the following content:

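    from scrapy import cmdline

    # cmdline.execute() exits the process when the crawl finishes, so only one
    # call can run per script. Keep one line active and comment out the others.
    cmdline.execute('scrapy crawl books'.split())

    # Alternative: export items to a CSV file named after the spider and a timestamp.
    # cmdline.execute('scrapy crawl books -o %(name)s%(time)s.csv'.split())

    # Alternative: export items to books.csv.
    # cmdline.execute('scrapy crawl books -o books.csv'.split())

    # Alternative: export items to books.xml.
    # cmdline.execute('scrapy crawl books -o books.xml'.split())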
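
Two details of run.py can be seen in plain Python. First, `str.split()` turns the command string into the argument list that `cmdline.execute` receives. Second, the `%(name)s` and `%(time)s` placeholders in the output file name are named printf-style fields. The timestamp format below is an assumption made for illustration (Scrapy generates its own timestamp at crawl time), and `expand_feed_name` is a hypothetical helper.

```python
from datetime import datetime, timezone


def expand_feed_name(pattern, spider_name, now=None):
    """Fill %(name)s and %(time)s in an output file name pattern."""
    now = now or datetime.now(timezone.utc)
    # Assumed format: ISO-like timestamp with ':' replaced so it is valid in file names.
    stamp = now.replace(microsecond=0, tzinfo=None).isoformat().replace(":", "-")
    return pattern % {"name": spider_name, "time": stamp}


if __name__ == "__main__":
    argv = "scrapy crawl books -o %(name)s%(time)s.csv".split()
    print(argv)
    print(expand_feed_name(argv[-1], "books"))
```

Because the timestamp changes on every run, the `%(name)s%(time)s.csv` variant writes a new file each time instead of reusing books.csv.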


  • Original article: https://www.cnblogs.com/jeshy/p/11105766.html