  • Web Scraping

    Worked example


    import lxml.html

    test_data = """
    <div>
        <ul>
            <li class="item-0"><a href="link1.html" id="places_neighbours__row">9,596,960first item</a></li>
            <li class="item-1"><a href="link2.html">second item</a></li>
            <li class="item-inactive"><a href="link3.html">third item</a></li>
            <li class="item-1"><a href="link4.html" id="places_neighbours__row">fourth item</a></li>
            <li class="item-0"><a href="link5.html">fifth item</a></li>
            <li class="good-0"><a href="link5.html">fifth item</a></li>
        </ul>
        <book>
            <title lang="aaengbb">111111</title>
            <price id="places_neighbours__row">29.99</price>
        </book>
        <book>
            <title lang="zh">222222</title>
            <price>39.95</price>
        </book>
        <book>
            <title>33333</title>
            <price>40</price>
        </book>
    </div>
    <a>
        <book>
            <title>123</title>
        </book>
    </a>
    """

    # XPath notes
    #
    # /    starts at the root; every step must be a direct parent-child relation
    # //   matches nodes at any depth below the current node
    # *    wildcard, selects every element
    #
    # //div/book[1]/title
    #     the title of the first book under div
    # //div/book/title[@lang="zh"]
    #     title elements whose lang attribute equals "zh"
    # //div/book/title   //book/title   //title   //div//title
    #     all of these end at title elements; they match the same nodes only
    #     when every title sits under div/book. In test_data the last title
    #     sits under a/book, so the looser paths can return it as well.
    # //book/title/@*
    #     every attribute value of the title elements
    # //book/title/text()
    #     the text content of title, via the text() node test
    # //a[@href="link1.html" and @id="places_neighbours__row"]
    #     a elements that satisfy both conditions
    # //a[@href="link1.html" or @id="places_neighbours__row"]
    #     a elements that satisfy either condition
    # //div/book[last()]/title/text()
    #     the title text of the last book under div
    # //div/book[price > 39]/title
    #     titles of books whose price child is greater than 39
    # //li[starts-with(@class,'item')]
    #     li elements whose class attribute starts with "item"
    # //title[contains(@lang,'eng')]
    #     title elements whose lang attribute contains "eng"

    html = lxml.html.fromstring(test_data)

    # Uncomment one query at a time to try it; the last assignment wins.
    # html_data = html.xpath('//div/book/title/text()')
    # html_data = html.xpath('//div/book[1]/title/text()')
    # html_data = html.xpath('//div/book/title[@lang="zh"]/text()')
    # html_data = html.xpath('//book/title/text()')
    # html_data = html.xpath('//title/text()')
    # html_data = html.xpath('//div//title/text()')
    # html_data = html.xpath('//book/title/@*')
    # html_data = html.xpath('//a[@href="link1.html" and @id="places_neighbours__row"]/text()')
    # html_data = html.xpath('//a[@href="link2.html"]/text()')
    # html_data = html.xpath('//div/ul/li/a[@id]/text()')
    # html_data = html.xpath('//a[@href="link1.html" and @id="places_neighbours__row"]/@*')
    # html_data = html.xpath('//a[@href="link1.html" and @id="places_neighbours__row"]/@href')
    # html_data = html.xpath('//a[@href="link1.html" or @id="places_neighbours__row"]/text()')
    # html_data = html.xpath('//div/book[last()]/title/text()')
    # html_data = html.xpath('//div/book[price > 39]/title/text()')
    # html_data = html.xpath('//li[starts-with(@class,"item")]/a/text()')
    html_data = html.xpath('//title[contains(@lang,"eng")]/text()')

    for i in html_data:
        print(i)
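    To see what several of these expressions return side by side, here is a minimal sketch. It makes two assumptions that are not in the original post: it parses with lxml.etree rather than lxml.html, so the tree is exactly what the markup says, and it uses a smaller, well-formed sample. The names SAMPLE, root and select, and the id value "row", are hypothetical and exist only for illustration.

```python
from lxml import etree

# Hypothetical, well-formed sample (a trimmed-down stand-in for test_data).
SAMPLE = """
<shop>
  <ul>
    <li class="item-0"><a href="link1.html" id="row">first</a></li>
    <li class="item-1"><a href="link2.html">second</a></li>
    <li class="good-0"><a href="link3.html" id="row">third</a></li>
  </ul>
  <book><title lang="aaengbb">111111</title><price>29.99</price></book>
  <book><title lang="zh">222222</title><price>39.95</price></book>
  <book><title>33333</title><price>40</price></book>
</shop>
"""

# Parse once; etree.fromstring returns the root element (<shop>).
root = etree.fromstring(SAMPLE)


def select(expr):
    """Evaluate an XPath 1.0 expression against the sample tree."""
    return root.xpath(expr)


# Positional predicate: title of the first book.
first_title = select('/shop/book[1]/title/text()')

# Attribute equality versus substring match on the same attribute.
zh_titles = select('//title[@lang="zh"]/text()')
eng_titles = select('//title[contains(@lang, "eng")]/text()')

# last() picks the final book; a numeric comparison filters on a child value.
last_title = select('//book[last()]/title/text()')
pricey_titles = select('//book[price > 39]/title/text()')

# Attribute values themselves, not the elements.
lang_values = select('//title/@lang')

# Combining conditions with and / or, and a prefix test with starts-with().
both = select('//a[@href="link1.html" and @id="row"]/text()')
either = select('//a[@href="link1.html" or @id="row"]/text()')
item_links = select('//li[starts-with(@class, "item")]/a/text()')

if __name__ == "__main__":
    for name, value in [
        ("first_title", first_title),
        ("zh_titles", zh_titles),
        ("eng_titles", eng_titles),
        ("last_title", last_title),
        ("pricey_titles", pricey_titles),
        ("lang_values", lang_values),
        ("both", both),
        ("either", either),
        ("item_links", item_links),
    ]:
        print(name, value)
```

    Parsing as XML keeps the comparison simple. With lxml.html the parser may wrap or rearrange a fragment, which is why results for paths that start with //div can differ from what the raw markup suggests.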
  • Original article: https://www.cnblogs.com/chengxubo/p/10152415.html