  • Using the Scrapy shell

    Note: fetching a page in the shell often returns a 403 error that does not appear during an actual crawl.
    response - a Response object containing the last fetched page
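    >>> response.xpath('//title/text()').extract()
    xpath() returns a list of selectors, and extract() turns them into a list of strings. A list of selectors can also be looped over: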
    >>> # links is a list of selectors returned by an earlier response.xpath() call
    >>> for index, link in enumerate(links):
    ...     args = (index, link.xpath('@href').extract(), link.xpath('img/@src').extract())
    ...     print('Link number %d points to url %s and image %s' % args)
    ...
    Link number 0 points to url ['image1.html'] and image ['image1_thumb.jpg']
    Link number 1 points to url ['image2.html'] and image ['image2_thumb.jpg']
    Link number 2 points to url ['image3.html'] and image ['image3_thumb.jpg']
    Link number 3 points to url ['image4.html'] and image ['image4_thumb.jpg']
    Link number 4 points to url ['image5.html'] and image ['image5_thumb.jpg']
    enumerate() is typically used in for loops; it yields each item together with its index.
    A plain for loop with a manual counter:
    >>> i = 0
    >>> seq = ['one', 'two', 'three']
    >>> for element in seq:
    ...     print(i, seq[i])
    ...     i += 1
    ...
    0 one
    1 two
    2 three
    The same loop using enumerate:
    >>> seq = ['one', 'two', 'three']
    >>> for i, element in enumerate(seq):
    ...     print(i, element)
    ...
    0 one
    1 two
    2 three
    Suppose you want to extract all <p> elements inside <div> elements. First, you would get all <div> elements:
    >>> divs = response.xpath('//div')
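    Then select the <p> elements with a relative XPath (note the dot prefixing .//p). A plain //p would match every <p> in the document, not only those inside the <div> elements:
    >>> for p in divs.xpath('.//p'):  # extracts all <p> inside the <div> elements
    ...     print(p.extract())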
    Another common case would be to extract all direct <p> children:
    >>> for p in divs.xpath('p'):
    ...     print(p.extract())
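    The difference between './/p' and 'p' can be tried without fetching a page. The following is only a sketch under stated assumptions: it swaps Scrapy selectors for the standard library's xml.etree.ElementTree, which supports a limited subset of XPath, and the sample markup is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, not taken from any real page.
SAMPLE = """
<html>
  <body>
    <p>outside any div</p>
    <div>
      <p>direct child</p>
      <section><p>nested deeper</p></section>
    </div>
  </body>
</html>
""".strip()

root = ET.fromstring(SAMPLE)
div = root.find('.//div')

# './/p' matches every <p> below the <div>, at any depth.
descendants = [p.text for p in div.findall('.//p')]

# 'p' matches only the <p> elements that are direct children of the <div>.
children = [p.text for p in div.findall('p')]

print(descendants)
print(children)
```

    Neither query returns the <p> that sits outside the <div>. With Scrapy, the same two expressions would be passed to divs.xpath() as in the shell session.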
    Using the shell from within a spider:
    from scrapy.shell import inspect_response
    # call this inside a spider callback such as parse(self, response)
    inspect_response(response, self)
    Press Ctrl-D (or Ctrl-Z on Windows) to exit the shell and resume crawling.
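    Tip: use single quotes as the outermost quotes of an XPath string, which leaves double quotes free for literals inside the expression.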
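    A small illustration of the quoting tip, assuming the motivation is to avoid escaping inner quotes. It uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors, and the markup is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for illustration only.
root = ET.fromstring(
    '<body><a href="image1.html">one</a><a href="about.html">two</a></body>'
)

# Single quotes outside, double quotes inside: no escaping needed.
single_outer = './/a[@href="image1.html"]'

# Double quotes outside force a backslash before every inner double quote.
double_outer = ".//a[@href=\"image1.html\"]"

# Both literals are the same string; the first is simply easier to read.
same_string = single_outer == double_outer

matches = [a.text for a in root.findall(single_outer)]
print(same_string, matches)
```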
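    Open a local HTML file in the shell for easier debugging:
    scrapy shell ./path/to/file.html
    scrapy shell ../other/path/to/file.html
    scrapy shell /absolute/path/to/file.html
    A relative path must start with ./ (or ../), even for a file in the current directory. A bare name such as "scrapy shell file.html" or "scrapy shell index.html" does not work, because the shell treats it as a domain name rather than a file.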
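    One way to avoid the ambiguity around relative file names is to hand the shell a file:// URI, a form that the Scrapy documentation also accepts. The helper below is a hypothetical sketch (the name shell_target and the file name are assumptions) that builds such a URI with pathlib.

```python
from pathlib import Path


def shell_target(path_str):
    """Return a file:// URI for a local HTML file (hypothetical helper)."""
    # resolve() makes the path absolute; as_uri() requires an absolute path.
    return Path(path_str).resolve().as_uri()


# 'file.html' is a placeholder name; the file does not need to exist
# for the URI to be built.
target = shell_target('file.html')
print('scrapy shell ' + target)
```

    The printed command can then be pasted into a terminal.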
  • Original post: https://www.cnblogs.com/elesos/p/7885474.html