The Scrapy framework
Parsing the response
>>> response.css('title::text').extract()
['Quotes to Scrape']
There are two things to note here:
(1) One is that we’ve added ::text to the CSS query, to mean we want to select only the text elements directly inside the <title> element. If we don’t specify ::text, we’d get the full title element, including its tags.
(2) The other is that the result of calling .extract() is a list, because we’re dealing with an instance of SelectorList.
When you know you just want the first result, as in this case, you can do:
>>> response.css('title::text').extract_first()
'Quotes to Scrape'
Besides the extract() and extract_first() methods, you can also use the re() method to extract using regular expressions:
>>> response.css('title::text').re(r'Quotes.*')
['Quotes to Scrape']
>>> response.css('title::text').re(r'Q\w+')
['Quotes']
>>> response.css('title::text').re(r'(\w+) to (\w+)')
['Quotes', 'Scrape']
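To see why a pattern with two groups yields a flat list of two strings, the behaviour of re() can be approximated with the standard library alone. The helper name selector_re_sketch below is hypothetical and is not part of Scrapy. It is a rough sketch that assumes re() applies the pattern to each extracted string and flattens the captured groups, and the real method may handle details not covered here.

```python
import re


def selector_re_sketch(texts, pattern):
    """Rough approximation of SelectorList.re(): run the pattern over each
    extracted string and collect every match into one flat list."""
    results = []
    for text in texts:
        for match in re.findall(pattern, text):
            # findall yields tuples when the pattern has several groups;
            # flatten them so the caller always gets a list of strings.
            if isinstance(match, tuple):
                results.extend(match)
            else:
                results.append(match)
    return results


# Stand-in for what response.css('title::text').extract() returned above.
titles = ["Quotes to Scrape"]

whole = selector_re_sketch(titles, r"Quotes.*")
word = selector_re_sketch(titles, r"Q\w+")
groups = selector_re_sketch(titles, r"(\w+) to (\w+)")

print(whole)
print(word)
print(groups)
```

With no groups the whole match is returned. With groups only the captured parts are returned, which is why the third pattern drops the literal " to " from its result.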