Using Scrapy Selectors

1. Constructing a selector:

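>>> from scrapy.http import HtmlResponse
>>> from scrapy.selector import Selector
>>> body = '<html><body><span>good</span></body></html>'
>>> response = HtmlResponse(url='http://example.com', body=body, encoding='utf-8')
>>> Selector(response=response).xpath('//span/text()').extract()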
[u'good']

2. Using selectors (querying the response with XPath or CSS):

The .xpath() and .css() methods return a SelectorList instance, which is a list of new selectors.

>>> response.xpath('//title/text()')
[<Selector (text) xpath=//title/text()>]
>>> response.css('title::text')
[<Selector (text) xpath=//title/text()>]

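In XPath, // selects matching nodes at any depth in the document, / moves one step down to a direct child, @ selects an attribute, and text() selects text nodes. In CSS, Scrapy adds the ::text and ::attr(name) pseudo-elements to select text and attributes.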

Call extract() to get the matched content as a list of strings, or extract_first() to get only the first match.

>>> response.css('title::text').extract()
[u'Example website']

Use @ (XPath) or ::attr() (CSS) to get attributes.

>>> response.xpath('//base/@href').extract()
[u'http://example.com/']

>>> response.css('base::attr(href)').extract()
[u'http://example.com/']

Select elements that match a condition, for example links whose href contains "image".

>>> response.xpath('//a[contains(@href, "image")]/@href').extract()
[u'image1.html',
 u'image2.html',
 u'image3.html',
 u'image4.html',
 u'image5.html']

>>> response.css('a[href*=image]::attr(href)').extract()
[u'image1.html',
 u'image2.html',
 u'image3.html',
 u'image4.html',
 u'image5.html']

Combining selectors with regular expressions.

>>> response.xpath('//a[contains(@href, "image")]/text()').re(r'Name:\s*(.*)')
[u'My image 1',
 u'My image 2',
 u'My image 3',
 u'My image 4',
 u'My image 5']
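
What the pattern captures can be checked with Python's standard `re` module, which roughly mirrors what .re() applies to each selected text node. The link texts below are assumed samples shaped like "Name: My image 1"; they are not taken from a real response.

```python
import re

# Hypothetical link texts, shaped like those the example page is assumed to contain.
link_texts = ['Name: My image 1', 'Name: My image 2']

# \s* skips any whitespace after the colon; (.*) captures the rest of the text.
pattern = re.compile(r'Name:\s*(.*)')

names = []
for text in link_texts:
    names.extend(pattern.findall(text))

# Without the backslash, 's*' means "zero or more letters s",
# so the space after the colon ends up inside the captured group.
names_without_backslash = re.findall(r'Name:s*(.*)', link_texts[0])

print(names, names_without_backslash)
```

The second result shows why the backslash in `\s*` matters: dropping it still matches, but the captured names keep a leading space.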


Original article: https://www.cnblogs.com/weixuqin/p/8434958.html
