  • Using lxml

    from urllib import request
    from lxml import etree
    # url = '''http://bangumi.tv/anime/browser?sort=rank'''
    # response = request.urlopen(url)
    # html = response.read()
    html = '''
    <li id="item_1728" class="item even clearit">
        <a href="/subject/1728" class="subjectCover cover ll">
                <span class="image">
                <img src="//lain.bgm.tv/pic/cover/s/71/37/1728_HLsCr.jpg" class="cover">
            </span>
            <span class="overlay"></span>
            </a>
        <div class="inner">
                                    <h3>
                    
                    
                        <a href="/subject/1728" class="l">浪客剑心 追忆篇</a> <small class="grey">るろうに剣心 -明治剣客浪漫譚- 追憶編</small>
                    </h3>
                    
            <span class="rank"><small>Rank </small>12</span>        
            <p class="info tip">
                             4话 /  1999年2月20日                    </p>
                    <p class="rateInfo">
                            <span class="sstars9 starsinfo"></span> <small class="fade">8.8</small> <span class="tip_j">(2165人评分)</span>
                        </p>
                    
                    
                    
                            
        </div>
    </li>
    '''
    
    html = etree.HTML(html)
    result = etree.tostring(html)
    print(result)
    li_all = html.xpath('//a')
    print(li_all)  # e.g. [<Element a at 0x2ebe198>, <Element a at 0x2ebe170>]
    # li_all = html.xpath('//a/@href')  # ['/subject/1728', '/subject/1728']
    # print(li_all)
    li_all = html.xpath('//a/@class')  # ['subjectCover cover ll', 'l']
    print(li_all)
    li_all = html.xpath('//a[@href="/subject/1728"]')  # all <a> elements whose href equals this value
    print(li_all)
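    li_all = html.xpath('//div/a')  # <a> elements that are direct children of a <div> (none here: the <a> sits inside <h3>)
    print(li_all)
    li_all = html.xpath('//div//a')  # <a> elements at any depth below a <div>
    print(li_all)
    li_all = html.xpath('//div//a//@class')  # class attributes of those <a> elements and of any of their descendants
    print(li_all)
    li_all = html.xpath('//div//p[last()]/span')  # <span> children of the last <p> under its parent
    print(li_all)
    li_all = html.xpath('//div//p[last()-1]')  # the second-to-last <p> element
    print(li_all[0].text)  # its text: episode count and air date, with surrounding whitespace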
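    li_all = html.xpath('string()')  # strip the tags and return all text as one string
    print(li_all)
    li_all = html.xpath('//text()')  # strip the tags and return every text node as a list of strings
    print(li_all)
    li_all = html.xpath('//text()')
    print(li_all[0].getparent().tag)  # tag name of the element this text belongs to
    print(li_all[1].is_text)  # True if the string is an element's .text
    print(li_all[1].is_tail)  # True if the string is an element's .tail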
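The queries above are easier to check on a smaller document. The following is a minimal sketch, assuming a hypothetical, trimmed-down list item (the markup, the `SNIPPET` name and the values in it are made up for illustration and are not taken from bangumi.tv). It contrasts `/` with `//`, shows `last()`, and shows how the strings returned by `text()` report whether they are an element's text or its tail.

```python
from lxml import etree

# Hypothetical markup, loosely modelled on the list item used above.
SNIPPET = '''
<ul>
<li id="item_1">
  <a href="/subject/1" class="cover">cover</a>
  <div class="inner">
    <h3><a href="/subject/1" class="l">Title</a> / <small>Subtitle</small></h3>
    <p class="info">4 eps</p>
    <p class="rateInfo"><small>8.8</small> <span>(10 votes)</span></p>
  </div>
</li>
</ul>
'''

root = etree.HTML(SNIPPET)

# "/" matches direct children only; the <a> under <div> is wrapped in <h3>.
direct = root.xpath('//div/a')
# "//" matches descendants at any depth.
nested = root.xpath('//div//a')
# Attribute queries return plain strings, in document order.
classes = root.xpath('//a/@class')

# last() counts among the matching siblings of the same parent.
last_p = root.xpath('//div/p[last()]')[0]
prev_p = root.xpath('//div/p[last()-1]')[0]

# string() flattens an element to its text content.
heading = root.xpath('string(//h3)')

# text() results are "smart" strings that know where they came from.
parts = root.xpath('//h3//text()')
origins = [(str(t), t.getparent().tag, t.is_text, t.is_tail) for t in parts]

print(direct, [a.text for a in nested], classes)
print(last_p.get('class'), prev_p.text, heading)
for row in origins:
    print(row)
```

Note that for tail text, `getparent()` returns the element the tail follows (here the `<a>`), not the enclosing `<h3>`.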
  • Original article: https://www.cnblogs.com/ldq1996/p/8269954.html