Python Week 2 (Day 9): My Python Growth Journal, Mastering Python Data Mining in One Month! (16) - The Scrapy Framework

The Scrapy framework

Parsing the response

>>> response.css('title::text').extract()
['Quotes to Scrape']

There are two things to note here:
(1) First, we've added ::text to the CSS query, meaning we want to select only the text nodes directly inside the <title> element. If we don't specify ::text, we get the full title element, including its tags:

>>> response.css('title').extract()
['<title>Quotes to Scrape</title>']

(2) Second, the result of calling .extract() is a list, because we're dealing with an instance of SelectorList. When you know you just want the first result, as in this case, you can do:

>>> response.css('title::text').extract_first()
'Quotes to Scrape'

Besides the extract() and extract_first() methods, you can also use the re() method to extract using regular expressions:

>>> response.css('title::text').re(r'Quotes.*')
['Quotes to Scrape']
>>> response.css('title::text').re(r'Q\w+')
['Quotes']
>>> response.css('title::text').re(r'(\w+) to (\w+)')
['Quotes', 'Scrape']
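The backslash in \w matters: without it, w+ matches only literal "w" characters. The standard-library sketch below illustrates this, assuming that .re() behaves roughly like re.findall applied to each extracted string with group tuples flattened into one list. The helper re_like is hypothetical and only approximates Scrapy's behaviour; it is not the library's code.

```python
import re

# Text that response.css('title::text').extract() returns on the demo page
extracted = ["Quotes to Scrape"]

def re_like(pattern, strings):
    """Hypothetical approximation of SelectorList.re(): findall per string, groups flattened."""
    results = []
    for s in strings:
        for match in re.findall(pattern, s):
            # findall yields tuples when the pattern has more than one group
            if isinstance(match, tuple):
                results.extend(match)
            else:
                results.append(match)
    return results

print(re_like(r"Quotes.*", extracted))        # ['Quotes to Scrape']
print(re_like(r"Q\w+", extracted))            # ['Quotes']
print(re_like(r"(\w+) to (\w+)", extracted))  # ['Quotes', 'Scrape']
print(re_like(r"Qw+", extracted))             # [] (looks for a literal "Qw")
```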

Original article: https://www.cnblogs.com/yugengde/p/7270696.html