  • Traversing XML with Python's lxml library and querying it with XPath (tags, attribute names, attribute values, tag text)

    XML example:

    Version one:

    <?xml version="1.0" encoding="UTF-8"?><country name="chain"><provinces><heilongjiang name="citys"><haerbin/><daqing/></heilongjiang><guangdong name="citys"><guangzhou/><shenzhen/><huhai/></guangdong><taiwan name="citys"><taibei/><gaoxiong/></taiwan><xinjiang name="citys"><wulumuqi waith="tianqi">晴</wulumuqi></xinjiang></provinces></country>

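    This version contains no spaces or line breaks between tags.

    Python example (Python 2 syntax), reading the XML above from info.xml: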

    from lxml import etree
    class r_xpath_xml(object):
        def __init__(self):
            self.xmetrpa=etree.parse('info.xml') # parse the XML file
        def xpxm(self):
            xpxlm=self.xmetrpa
            print etree.tostring(xpxlm) # print the XML data
            root=xpxlm.getroot() # get the root element of the tree
            print root.tag,' ',  # print the root tag name
            print root.items() # attribute names and values of the root
            for a in root:  # iterate over the direct children of the root
                print a.tag,a.items(),a.text,' printed type: ',type(a)  # tag name, attributes, text
                for b in a:  # each deeper level is nested inside its parent's loop
                    print b.tag,b.items(),b.text#,b
                    for c in b:
                        print c.tag,c.items(),c.text#,c
                        for d in c:
                            print d.tag,d.items(),d.text,d
            print xpxlm.xpath('//node()')#.items()#.tag  # every node: elements and text
            print '====================================================================================================='
            xa=xpxlm.xpath('//heilongjiang/*')  # all children of <heilongjiang>
            print xa
            for xb in xa:
                print xb.tag,xb.items(),xb.text
            xc=xpxlm.xpath('//xinjiang/*')  # all children of <xinjiang>
            print xc
            for xd in xc:
                print xd.tag,xd.items(),xd.text
    if __name__ == '__main__':
        xpx=r_xpath_xml()
        xpx.xpxm()
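    A for loop walks the tag hierarchy level by level. tag returns the tag name, items() returns the attributes as a list of ('name', 'value') tuples, just like a dictionary's items() method, and text returns the data between the opening and closing tags. tag, items() and text apply to objects of type <type 'lxml.etree._Element'>.
    Output: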
    <country name="chain"><provinces><heilongjiang name="citys"><haerbin/><daqing/></heilongjiang><guangdong name="citys"><guangzhou/><shenzhen/><huhai/></guangdong><taiwan name="citys"><taibei/><gaoxiong/></taiwan><xinjiang name="citys"><wulumuqi waith="tianqi">&#26228;</wulumuqi></xinjiang></provinces></country>
    country   [('name', 'chain')]
    provinces [] None  printed type:  <type 'lxml.etree._Element'>
    heilongjiang [('name', 'citys')] None
    haerbin [] None
    daqing [] None
    guangdong [('name', 'citys')] None
    guangzhou [] None
    shenzhen [] None
    huhai [] None
    taiwan [('name', 'citys')] None
    taibei [] None
    gaoxiong [] None
    xinjiang [('name', 'citys')] None
    wulumuqi [('waith', 'tianqi')] 晴
    [<Element country at 0x2d47b20>, <Element provinces at 0x2d47990>, <Element heilongjiang at 0x2d479b8>, <Element haerbin at 0x2d47558>, <Element daqing at 0x2d47328>, <Element guangdong at 0x2d47300>, <Element guangzhou at 0x2d476e8>, <Element shenzhen at 0x2d47530>, <Element huhai at 0x2d472d8>, <Element taiwan at 0x2d47260>, <Element taibei at 0x2d47238>, <Element gaoxiong at 0x2d47080>, <Element xinjiang at 0x2d47710>, <Element wulumuqi at 0x2d47968>, u'\u6674']
    =====================================================================================================
    [<Element haerbin at 0x2d479b8>, <Element daqing at 0x2d47148>]
    haerbin [] None
    daqing [] None
    [<Element wulumuqi at 0x2d47968>] (the type of this result is <type 'list'>)
    wulumuqi [('waith', 'tianqi')] 晴
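
    The nested loops above only reach as deep as they are written. As a minimal sketch of the same tag / items() / text traversal for any depth, a recursive helper can be used. This is Python 3 rather than the Python 2 of the listing; the name walk and the shortened XML string are made up for illustration; and the try/except import is an assumption for portability that falls back to the standard library's xml.etree.ElementTree, whose Element API lxml mirrors.

```python
# Python 3 sketch; falls back to the standard library if lxml is not installed.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Shortened copy of the version-one document (\u6674 is the character for "sunny").
XML = (
    '<country name="chain"><provinces>'
    '<heilongjiang name="citys"><haerbin/><daqing/></heilongjiang>'
    '<xinjiang name="citys"><wulumuqi waith="tianqi">\u6674</wulumuqi></xinjiang>'
    '</provinces></country>'
)


def walk(element, depth=0):
    """Return (depth, tag, attributes, text) for element and all its descendants."""
    rows = [(depth, element.tag, dict(element.items()), element.text)]
    for child in element:  # iterating an element yields its direct children
        rows.extend(walk(child, depth + 1))
    return rows


root = etree.fromstring(XML)
rows = walk(root)
for depth, tag, attrs, text in rows:
    # Indent each line by its depth in the tree.
    print('  ' * depth + tag, attrs, text)
```

    Returning the rows instead of printing inside the recursion keeps the helper easy to reuse or to check in a test.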

    XML example:

    Version two:

    <?xml version="1.0" encoding="UTF-8"?>
    <country name="chain">
        <provinces>
            <city:table xmlns:city="http://www.w3school.com.cn/furniture">
            <heilongjiang name="citys"><city:haerbin/><city:daqing/></heilongjiang>
            <guangdong name="citys"><city:guangzhou/><city:shenzhen/><city:zhuhai/></guangdong>
            <taiwan name="citys"><city:taibei/><city:gaoxiong/></taiwan>
            <xinjiang name="citys"><city:wulumuqi>晴</city:wulumuqi></xinjiang>
            </city:table>    
        </provinces>
    </country>

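    Example:
    print xpxlm.xpath('//node()')

    Output:
    The result now contains the whitespace and newline text nodes between the tags, and the namespaced tags are shown with their namespace URI.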
    [<Element country at 0x2e79b20>, '\n    ', <Element provinces at 0x2e79990>, '\n        ', <Element {http://www.w3school.com.cn/furniture}table at 0x2e79710>, '\n        ', <Element heilongjiang at 0x2e799b8>, <Element {http://www.w3school.com.cn/furniture}haerbin at 0x2e79328>, <Element {http://www.w3school.com.cn/furniture}daqing at 0x2e79968>, '\n        ', <Element guangdong at 0x2e79530>, <Element {http://www.w3school.com.cn/furniture}guangzhou at 0x2e79300>, <Element {http://www.w3school.com.cn/furniture}shenzhen at 0x2e792d8>, <Element {http://www.w3school.com.cn/furniture}zhuhai at 0x2e79260>, '\n        ', <Element taiwan at 0x2e79238>, <Element {http://www.w3school.com.cn/furniture}taibei at 0x2e79080>, <Element {http://www.w3school.com.cn/furniture}gaoxiong at 0x2e79058>, '\n        ', <Element xinjiang at 0x2e796e8>, <Element {http://www.w3school.com.cn/furniture}wulumuqi at 0x2e79558>, u'\u6674', '\n        ', '    \n    ', '\n']

    Removing the whitespace:

            xp=xpxlm.xpath('//node()')
            print xp,           #.items()#.tag
            for i in xp:
                if isinstance(i, basestring):  # text nodes (whitespace, newlines, element text) have no .tag
                    continue
                else: 
                    print i.tag
    The check filters out the whitespace and newline text nodes, so only the elements are printed.
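
    An alternative to testing every //node() result is to not collect the text nodes in the first place. The following is a minimal Python 3 sketch under stated assumptions: the XML string is a cut-down copy of the version-two layout without the namespace, and the try/except import (added for portability, not part of the original post) falls back to xml.etree.ElementTree. iter('*') yields elements only, and the indentation that //node() reported as separate strings shows up in text, where strip() can test it.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Indented document, so whitespace sits between the tags (\u6674 is the element text).
XML = """<country name="chain">
    <provinces>
        <heilongjiang name="citys"><haerbin/><daqing/></heilongjiang>
        <xinjiang name="citys"><wulumuqi>\u6674</wulumuqi></xinjiang>
    </provinces>
</country>"""

root = etree.fromstring(XML)

# iter('*') walks the tree in document order and yields elements only,
# so the indentation between tags never reaches the loop.
tags = [el.tag for el in root.iter('*')]

# Keep only text that is more than whitespace.
texts = [el.text.strip() for el in root.iter('*')
         if el.text is not None and el.text.strip()]

print(tags)
print(texts)
```

    With lxml specifically, the XPath expression //* should serve the same purpose as iter('*') here, since it selects element nodes only.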

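    Output (because of the trailing comma in "print xp," the country tag is printed on the same line as the list; {city} in this listing appears to abbreviate the {http://www.w3school.com.cn/furniture} prefix that lxml prints):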

    provinces
    {city}table
    heilongjiang
    {city}haerbin
    {city}daqing
    guangdong
    {city}guangzhou
    {city}shenzhen
    {city}zhuhai
    taiwan
    {city}taibei
    {city}gaoxiong
    xinjiang
    {city}wulumuqi
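
    The braces in these tags are how lxml writes a namespaced name: {namespace-uri}local-name. As a hedged Python 3 sketch of working with such tags, the example below maps the city prefix to its URI when searching and strips the prefix for display. The helper local_name and the shortened XML string are made up for illustration, and the try/except import is an assumption for portability that falls back to xml.etree.ElementTree, which uses the same notation.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

CITY_NS = 'http://www.w3school.com.cn/furniture'  # URI declared in the version-two document

# Shortened copy of the version-two document.
XML = (
    '<provinces><city:table xmlns:city="%s">'
    '<heilongjiang name="citys"><city:haerbin/><city:daqing/></heilongjiang>'
    '</city:table></provinces>' % CITY_NS
)


def local_name(tag):
    """Strip the '{namespace-uri}' prefix from a tag, if there is one."""
    return tag.rsplit('}', 1)[-1]


root = etree.fromstring(XML)
ns = {'city': CITY_NS}  # prefix used in the path expressions -> namespace URI

table = root.find('city:table', namespaces=ns)
haerbin = root.find('.//heilongjiang/city:haerbin', namespaces=ns)
tags = [local_name(el.tag) for el in root.iter('*')]

print(table.tag)
print(tags)
```

    The prefix in the path expression only has to match the key in the ns mapping; what identifies the element is the URI, not the prefix chosen in the document.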




  • Original article: https://www.cnblogs.com/liuliu-word/p/7498019.html