  • Python, Week 2 (Day 10): My Python Growth Log. Master Python Data Mining in One Month! (18) - mongodb

    1. First, import the Selector class
    from scrapy.selector import Selector

    2. Using selectors
    Example: response.selector.xpath('//span/text()').extract()

    (1) Select the text content of the title tag
    response.selector.xpath('//title/text()')
    Two shorter equivalents are available directly on the response:
    response.xpath('//title/text()')
    response.css('title::text')
    Examples:
    response.css('img').xpath('@src').extract()
    response.xpath('//div[@id="images"]/a/text()').extract_first()
    response.xpath('//div[@id="not-exists"]/text()').extract_first(default='not-found')
    (2) Matching with regular expressions
    response.xpath('//a[contains(@href, "image")]/text()').re(r'Name:\s*(.*)')
    response.xpath('//a[contains(@href, "image")]/text()').re_first(r'Name:\s*(.*)')
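The argument to .re() is an ordinary Python regular expression, and the whitespace escape \s in r'Name:\s*(.*)' matters. The effect can be checked with the standard re module alone. This is a small sketch: the sample string is taken from the link text in the official example, and the mapping to .re() and .re_first() in the comments is an approximation of how those methods behave, not a description of Scrapy's internals.

```python
import re

# Sample text node, as it appears inside the links of the official example.
text = 'Name: My image 1 '

# \s* skips the whitespace after the colon; (.*) captures the rest of the text.
pattern = re.compile(r'Name:\s*(.*)')

# Roughly comparable to .re(): a list with every captured group.
all_matches = pattern.findall(text)

# Roughly comparable to .re_first(): a single value, or None when nothing matches.
m = pattern.search(text)
first_match = m.group(1) if m else None

# Without the backslash, "s*" means "zero or more letters s",
# so the space after the colon ends up inside the captured group.
without_backslash = re.findall(r'Name:s*(.*)', text)

print(all_matches, first_match, without_backslash)
```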
    (3)Working with relative XPaths
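    divs = response.xpath('//div')
    for p in divs.xpath('.//p'):  # every <p> anywhere inside the selected divs
        print(p.extract())
    for p in divs.xpath('p'):  # only <p> elements that are direct children of the divs
        print(p.extract())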
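The difference between the two loops can also be seen without Scrapy. The sketch below uses xml.etree.ElementTree from the standard library, which understands a small subset of XPath that includes both './/p' and 'p'. It is a stand-in for illustration, not Scrapy's Selector, and the sample markup is invented.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed markup: one <p> is a direct child of the <div>,
# the other is nested one level deeper inside a <section>.
doc = ET.fromstring(
    '<body>'
    '<div>'
    '<p>direct child</p>'
    '<section><p>nested descendant</p></section>'
    '</div>'
    '</body>'
)

div = doc.find('div')

# './/p' matches every <p> anywhere below the <div>.
all_descendants = [p.text for p in div.findall('.//p')]

# 'p' matches only <p> elements that are direct children of the <div>.
direct_children = [p.text for p in div.findall('p')]

print(all_descendants)
print(direct_children)
```

In both cases the path is evaluated relative to the element it is called on, which is why the leading dot in './/p' is needed to mean "below this element".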
    (4)
    (5)

    Official example:
    >>> links = response.xpath('//a[contains(@href, "image")]')
    >>> links.extract()
    [u'<a href="image1.html">Name: My image 1 <br><img src="image1_thumb.jpg"></a>',
    u'<a href="image2.html">Name: My image 2 <br><img src="image2_thumb.jpg"></a>',
    u'<a href="image3.html">Name: My image 3 <br><img src="image3_thumb.jpg"></a>',
    u'<a href="image4.html">Name: My image 4 <br><img src="image4_thumb.jpg"></a>',
    u'<a href="image5.html">Name: My image 5 <br><img src="image5_thumb.jpg"></a>']

    >>> for index, link in enumerate(links):
    ...     args = (index, link.xpath('@href').extract(), link.xpath('img/@src').extract())
    ...     print('Link number %d points to url %s and image %s' % args)

    Link number 0 points to url [u'image1.html'] and image [u'image1_thumb.jpg']
    Link number 1 points to url [u'image2.html'] and image [u'image2_thumb.jpg']
    Link number 2 points to url [u'image3.html'] and image [u'image3_thumb.jpg']
    Link number 3 points to url [u'image4.html'] and image [u'image4_thumb.jpg']
    Link number 4 points to url [u'image5.html'] and image [u'image5_thumb.jpg']
  • Original article: https://www.cnblogs.com/yugengde/p/7277406.html