  • Scrapy (1): Installation and Running

    1. Scrapy installation problems

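    At first I installed Scrapy directly with pip, following the official documentation, and creating the project raised no errors.

    Running scrapy crawl dmoz, however, produced one error after another /(ㄒoㄒ)/~~ for example: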

    ImportError: No module named _cffi_backend

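    Unhandled error in Deferred, and so on. It turned out that many of the dependency packages had not been installed, so I searched Baidu and installed them one by one.
    Plenty of experts have already written all of this up. Respect! ^_^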

     http://blog.csdn.net/niying/article/details/27103081

    http://blog.csdn.net/pleasecallmewhy/article/details/19354723
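    Before running the crawl again, it can help to check which dependencies actually import. The following is only a minimal sketch: _cffi_backend is taken from the traceback above, while the other module names are my assumption about what a Scrapy install of that era typically needs, so adjust the list to your own error messages. Note that import names can differ from pip package names (for example, _cffi_backend ships with cffi and OpenSSL with pyOpenSSL).

```python
import importlib

# "_cffi_backend" comes from the ImportError quoted above; the remaining
# names are assumptions, not an official Scrapy dependency list.
CANDIDATE_MODULES = [
    "_cffi_backend",
    "cryptography",
    "OpenSSL",
    "lxml",
    "twisted",
    "zope.interface",
]


def find_missing(modules):
    """Return the module names that cannot be imported in this interpreter."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in find_missing(CANDIDATE_MODULES):
        print("missing: %s" % name)
```

    Anything reported as missing still has to be installed separately; the script only tells you where to start.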

    2. No data was scraped; the cause turned out to be a spelling mistake.

    E:\tutorial>scrapy crawl dmoz
    2015-10-30 13:44:02 [scrapy] INFO: Scrapy 1.0.3 started (bot: tutorial)
    2015-10-30 13:44:02 [scrapy] INFO: Optional features available: ssl, http11
    2015-10-30 13:44:02 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'tu
    torial.spiders', 'SPIDER_MODULES': ['tutorial.spiders'], 'BOT_NAME': 'tutorial'}
    
    2015-10-30 13:44:02 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsol
    e, LogStats, CoreStats, SpiderState
    2015-10-30 13:44:03 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddl
    eware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultH
    eadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMidd
    leware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
    2015-10-30 13:44:03 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddlewa
    re, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
    2015-10-30 13:44:03 [scrapy] INFO: Enabled item pipelines:
    2015-10-30 13:44:03 [scrapy] INFO: Spider opened
    2015-10-30 13:44:03 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 i
    tems (at 0 items/min)
    2015-10-30 13:44:03 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
    2015-10-30 13:44:03 [scrapy] INFO: Closing spider (finished)
    2015-10-30 13:44:03 [scrapy] INFO: Dumping Scrapy stats:
    {'finish_reason': 'finished',
     'finish_time': datetime.datetime(2015, 10, 30, 5, 44, 3, 292000),
     'log_count/DEBUG': 1,
     'log_count/INFO': 7,
     'start_time': datetime.datetime(2015, 10, 30, 5, 44, 3, 282000)}
    2015-10-30 13:44:03 [scrapy] INFO: Spider closed (finished)

    In the dmoz_spiders.py file under the spiders directory, I had written start_urls as start_url. Sigh ╮(╯▽╰)╭ The correct attribute is:

    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]
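    Scrapy does not complain about an unknown class attribute such as start_url. As far as I understand, a spider without start_urls simply has nothing to request, which matches the "Crawled 0 pages" log above. A small check like the one below can catch this kind of typo early. It is only a sketch: find_near_misses, BrokenSpider and FixedSpider are hypothetical names, and the two classes are plain stand-ins rather than real scrapy.Spider subclasses, so the snippet runs without Scrapy installed.

```python
import difflib


def find_near_misses(cls, expected="start_urls"):
    """Return class attribute names that look like a misspelling of `expected`."""
    names = [
        name for name in vars(cls)
        if not name.startswith("_") and name != expected
    ]
    return difflib.get_close_matches(expected, names, n=3, cutoff=0.8)


class BrokenSpider(object):
    # Stand-in for the spider with the typo described above.
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_url = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
    ]


class FixedSpider(object):
    # Same stand-in with the attribute spelled correctly.
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
    ]


if __name__ == "__main__":
    for spider_cls in (BrokenSpider, FixedSpider):
        suspects = find_near_misses(spider_cls)
        if suspects:
            print("%s: did you mean start_urls? found %s"
                  % (spider_cls.__name__, suspects))
```

    The same function should work on a real spider class, since it only inspects the attributes defined on the class itself.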
  • Original post: https://www.cnblogs.com/RoundGirl/p/4920426.html