  • Scrapy selectors

    Scrapy's Selector

    Selectors
    #!/usr/bin/env python
    # -*- coding:utf-8 -*-
    # HtmlXPathSelector is deprecated and no longer importable in recent Scrapy releases; Selector replaces it
    from scrapy.selector import Selector
    from scrapy.http import HtmlResponse
    html = """<!DOCTYPE html>
    <html>
        <head lang="en">
            <meta charset="UTF-8">
            <title></title>
        </head>
        <body>
            <ul>
                <li class="item-"><a id='i1' href="link.html">first item</a></li>
                <li class="item-0"><a id='i2' href="llink.html">first item</a></li>
                <li class="item-1"><a href="llink2.html">second item<span>vv</span></a></li>
            </ul>
            <div><a href="llink2.html">second item</a></div>
        </body>
    </html>
    """
    response = HtmlResponse(url='http://example.com', body=html,encoding='utf-8')
    # Older Scrapy versions used HtmlXPathSelector(response); Selector is the current equivalent
    # hxs = Selector(response=response)
    # print(hxs)
    # hxs = Selector(response=response).xpath('//a')
    # print(hxs)
    # hxs = Selector(response=response).xpath('//a[2]')
    # print(hxs)
    # hxs = Selector(response=response).xpath('//a[@id]')
    # print(hxs)
    # hxs = Selector(response=response).xpath('//a[@id="i1"]')
    # print(hxs)
    # hxs = Selector(response=response).xpath('//a[@href="link.html"][@id="i1"]')
    # print(hxs)
    # hxs = Selector(response=response).xpath('//a[contains(@href, "link")]')
    # print(hxs)
    # hxs = Selector(response=response).xpath('//a[starts-with(@href, "link")]')
    # print(hxs)
    # The pattern "id+" would never match ids such as "i1"; "i\d+" (in a raw string) is what is meant
    # hxs = Selector(response=response).xpath(r'//a[re:test(@id, "i\d+")]')
    # print(hxs)
    # hxs = Selector(response=response).xpath(r'//a[re:test(@id, "i\d+")]/text()').extract()
    # print(hxs)
    # hxs = Selector(response=response).xpath(r'//a[re:test(@id, "i\d+")]/@href').extract()
    # print(hxs)
    # hxs = Selector(response=response).xpath('/html/body/ul/li/a/@href').extract()
    # print(hxs)
    # hxs = Selector(response=response).xpath('//body/ul/li/a/@href').extract_first()
    # print(hxs)
     
    # ul_list = Selector(response=response).xpath('//body/ul/li')
    # for item in ul_list:
    #     v = item.xpath('./a/span')
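    #     # or, equivalently (a relative path needs no leading "./"):
    #     # v = item.xpath('a/span')
    #     # not equivalent: '*/a/span' looks for <a> one level deeper, under any child of <li>
    #     # v = item.xpath('*/a/span')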
    #     print(v)
    
  • Original article: https://www.cnblogs.com/wailaifeike/p/10204854.html