  • Scrapy notes

    1. HTTP status codes and redirects on requested URLs:

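    from scrapy import Request

    # Spider class attribute: non-2xx statuses that should still be passed to the callback
    handle_httpstatus_list = [404, 403, 500, 503, 521, 522, 524, 301, 302]

    # In the method that builds the request: dont_redirect stops Scrapy from following 3xx responses
    return Request(self.purl, headers=self.send_headers, meta={'dont_redirect': True}, callback=self.parse)

    # In parse(self, response):
    if response.status in self.handle_httpstatus_list:
        print(response.body)
        print(response.headers.get('Location'))  # redirect target (None unless the response is a 3xx)
        print(response.url)  # original URL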
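To see why `handle_httpstatus_list` matters for the snippet above, here is a simplified model of the decision, written as a plain function. It is only a sketch and not Scrapy's actual source; the name `reaches_callback` is hypothetical, and it assumes the redirect is not followed (`dont_redirect` is set), so the raw 3xx response is what gets checked.

```python
def reaches_callback(status, handle_httpstatus_list=()):
    """Simplified model: does a response with this status reach the spider callback?

    Assumes redirects are disabled, so a 301/302 arrives here as-is.
    """
    if 200 <= status < 300:
        return True  # successful responses are always passed on
    # Any other status is dropped unless the spider explicitly lists it.
    return status in handle_httpstatus_list


if __name__ == "__main__":
    print(reaches_callback(302))              # False: parse() is never called
    print(reaches_callback(302, [301, 302]))  # True: parse() receives the 302
```

In this model a 302 that is not listed is dropped before the callback runs, which matches the symptom of a `parse` function that never fires.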

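    With redirects disabled, Scrapy's built-in Request does not follow a 302, and unless the 302 status code is accepted via handle_httpstatus_list, execution never reaches the parse function. Without the logging options in settings to show what happened, this is very frustrating to track down: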

    LOG_ENABLED = True
    LOG_ENCODING = 'utf-8'
    LOG_FILE = 'logging.log'
    LOG_LEVEL = 'DEBUG'
    # LOG_LEVEL = 'WARNING'
    LOG_STDOUT = False

    logging.log:
    2017-05-17 17:25:55 [scrapy] INFO: Spider opened
    2017-05-17 17:25:55 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
    2017-05-17 17:25:55 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
    2017-05-17 17:26:00 [scrapy] DEBUG: Crawled (302) <GET http://app.58.com/api/list/ershoufang/?tabkey=allcity&action=getListInfo&curVer=7.5.1&isNeedAd=0&ct=filter&os=ios&filterparams=%7B%22param1077%22:%221%22,%22filterLocal%22:%22rongchengqu%22%7D&appId=1&page=1&localname=jy> (referer: None)
    2017-05-17 17:26:00 [site58_sale] DEBUG: Read 1 requests from 'site58_sale'
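Scanning a long log by eye for lines like the `Crawled (302)` entry above gets tedious. A minimal sketch of a helper that does it is shown here; the function name `non_2xx_crawls` and the regular expression are assumptions based only on the line format visible in this excerpt, so other Scrapy versions or log formats may need adjustments.

```python
import re

# Matches debug lines of the form: Crawled (<status>) <METHOD url> ...
# (pattern inferred from the log excerpt above, not from Scrapy's source)
CRAWLED = re.compile(r"Crawled \((\d{3})\) <(\w+) (\S+)>")


def non_2xx_crawls(lines):
    """Return (status, url) for each crawled response whose status is not 2xx."""
    hits = []
    for line in lines:
        match = CRAWLED.search(line)
        if match and not match.group(1).startswith("2"):
            hits.append((int(match.group(1)), match.group(3)))
    return hits
```

It accepts any iterable of lines, so an open file object for the configured LOG_FILE (`logging.log` here) can be passed in directly. Periodic stats lines such as `Crawled 0 pages` are ignored because they carry no status code in parentheses.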




  • Original post: https://www.cnblogs.com/haoxr/p/6868667.html