XML example:
Version 1:
<?xml version="1.0" encoding="UTF-8"?><country name="chain"><provinces><heilongjiang name="citys"><haerbin/><daqing/></heilongjiang><guangdong name="citys"><guangzhou/><shenzhen/><huhai/></guangdong><taiwan name="citys"><taibei/><gaoxiong/></taiwan><xinjiang name="citys"><wulumuqi waith="tianqi">晴</wulumuqi></xinjiang></provinces></country>
This version contains no spaces or line breaks.
Python example:
from lxml import etree


class r_xpath_xml(object):
    def __init__(self):
        self.xmetrpa = etree.parse('info.xml')  # read the XML data

    def xpxm(self):
        xpxlm = self.xmetrpa
        print etree.tostring(xpxlm)  # print the XML data
        root = xpxlm.getroot()  # get the root of the tree
        print root.tag, ' ',  # print the root tag name
        print root.items()  # attribute names and values of the root
        for a in root:  # iterate over the first level of tags below the root
            # print the tag name, the attributes and the text
            print a.tag, a.items(), a.text, ' type of the printed object: ', type(a)
            for b in a:
                print b.tag, b.items(), b.text
                for c in b:
                    print c.tag, c.items(), c.text
                    for d in c:
                        print d.tag, d.items(), d.text, d
        print xpxlm.xpath('//node()')
        print '====================================================================================================='
        xa = xpxlm.xpath('//heilongjiang/*')
        print xa
        for xb in xa:
            print xb.tag, xb.items(), xb.text
        xc = xpxlm.xpath('//xinjiang/*')
        print xc, ' type: ', type(xc)
        for xd in xc:
            print xd.tag, xd.items(), xd.text


if __name__ == '__main__':
    xpx = r_xpath_xml()
    xpx.xpxm()
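Nested for loops walk the tag hierarchy. tag returns the tag name, items() returns the attributes in the same form as a dictionary's items(), that is [('attribute name', 'attribute value')], and text returns the data between the opening tag and the closing tag (None if there is none). tag, items() and text apply to objects of type <type 'lxml.etree._Element'>.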
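The nested loops above stop at a fixed depth. As a minimal sketch of the same traversal for any depth, the recursive generator below yields the tag, attributes and text of every element. It is written in Python 3 syntax, unlike the Python 2 script in this post; the walk helper and the shortened inline document are assumptions made for illustration, not part of the original script.

```python
from lxml import etree

# Shortened copy of version 1, inlined so the sketch does not need info.xml.
XML = (
    '<country name="chain"><provinces>'
    '<heilongjiang name="citys"><haerbin/><daqing/></heilongjiang>'
    '<xinjiang name="citys"><wulumuqi waith="tianqi">晴</wulumuqi></xinjiang>'
    '</provinces></country>'
)


def walk(element, depth=0):
    """Yield (depth, tag, attributes, text) for element and all its descendants."""
    yield depth, element.tag, element.items(), element.text
    for child in element:  # iterating an element visits its direct children
        yield from walk(child, depth + 1)


root = etree.fromstring(XML)
rows = list(walk(root))
for depth, tag, attrs, text in rows:
    # indent each line by its depth in the tree
    print('  ' * depth + tag, attrs, text)
```

Because the recursion follows the tree itself, no extra loop is needed when the document gains another level.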
Output:
<country name="chain"><provinces><heilongjiang name="citys"><haerbin/><daqing/></heilongjiang><guangdong name="citys"><guangzhou/><shenzhen/><huhai/></guangdong><taiwan name="citys"><taibei/><gaoxiong/></taiwan><xinjiang name="citys"><wulumuqi waith="tianqi">晴</wulumuqi></xinjiang></provinces></country>
country   [('name', 'chain')]
provinces [] None  type of the printed object:  <type 'lxml.etree._Element'>
heilongjiang [('name', 'citys')] None
haerbin [] None
daqing [] None
guangdong [('name', 'citys')] None
guangzhou [] None
shenzhen [] None
huhai [] None
taiwan [('name', 'citys')] None
taibei [] None
gaoxiong [] None
xinjiang [('name', 'citys')] None
wulumuqi [('waith', 'tianqi')] 晴
[<Element country at 0x2d47b20>, <Element provinces at 0x2d47990>, <Element heilongjiang at 0x2d479b8>, <Element haerbin at 0x2d47558>, <Element daqing at 0x2d47328>, <Element guangdong at 0x2d47300>, <Element guangzhou at 0x2d476e8>, <Element shenzhen at 0x2d47530>, <Element huhai at 0x2d472d8>, <Element taiwan at 0x2d47260>, <Element taibei at 0x2d47238>, <Element gaoxiong at 0x2d47080>, <Element xinjiang at 0x2d47710>, <Element wulumuqi at 0x2d47968>, u'\u6674']
=====================================================================================================
[<Element haerbin at 0x2d479b8>, <Element daqing at 0x2d47148>]
haerbin [] None
daqing [] None
[<Element wulumuqi at 0x2d47968>]  type:  <type 'list'>
wulumuqi [('waith', 'tianqi')] 晴
XML example:
Version 2:
<?xml version="1.0" encoding="UTF-8"?>
<country name="chain">
  <provinces>
    <city:table xmlns:city="http://www.w3school.com.cn/furniture">
      <heilongjiang name="citys"><city:haerbin/><city:daqing/></heilongjiang>
      <guangdong name="citys"><city:guangzhou/><city:shenzhen/><city:zhuhai/></guangdong>
      <taiwan name="citys"><city:taibei/><city:gaoxiong/></taiwan>
      <xinjiang name="citys"><city:wulumuqi>晴</city:wulumuqi></xinjiang>
    </city:table>
  </provinces>
</country>
Example:
print xpxlm.xpath('//node()')
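Output:
The list now also contains the whitespace text nodes (spaces and line breaks), and the prefixed tags appear with their namespace URI.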
[<Element country at 0x2e79b20>, ' ', <Element provinces at 0x2e79990>, ' ', <Element {http://www.w3school.com.cn/furniture}table at 0x2e79710>, ' ', <Element heilongjiang at 0x2e799b8>, <Element {http://www.w3school.com.cn/furniture}haerbin at 0x2e79328>, <Element {http://www.w3school.com.cn/furniture}daqing at 0x2e79968>, ' ', <Element guangdong at 0x2e79530>, <Element {http://www.w3school.com.cn/furniture}guangzhou at 0x2e79300>, <Element {http://www.w3school.com.cn/furniture}shenzhen at 0x2e792d8>, <Element {http://www.w3school.com.cn/furniture}zhuhai at 0x2e79260>, ' ', <Element taiwan at 0x2e79238>, <Element {http://www.w3school.com.cn/furniture}taibei at 0x2e79080>, <Element {http://www.w3school.com.cn/furniture}gaoxiong at 0x2e79058>, ' ', <Element xinjiang at 0x2e796e8>, <Element {http://www.w3school.com.cn/furniture}wulumuqi at 0x2e79558>, u'\u6674', ' ', ' ', ' ']
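Once tags carry a namespace, an unprefixed XPath step such as //haerbin no longer matches them. A minimal sketch of one way to query them is to pass a prefix-to-URI mapping through the namespaces argument of xpath(). The code uses Python 3 syntax and a shortened inline copy of version 2; the prefix c and the variable names are arbitrary choices for this illustration.

```python
from lxml import etree

# Shortened copy of version 2, inlined so the sketch does not need info.xml.
XML = (
    '<country name="chain"><provinces>'
    '<city:table xmlns:city="http://www.w3school.com.cn/furniture">'
    '<heilongjiang name="citys"><city:haerbin/><city:daqing/></heilongjiang>'
    '<xinjiang name="citys"><city:wulumuqi>晴</city:wulumuqi></xinjiang>'
    '</city:table></provinces></country>'
)
root = etree.fromstring(XML)

# The prefix is local to this mapping; only the URI has to match the document.
NS = {'c': 'http://www.w3school.com.cn/furniture'}

# heilongjiang itself is not in a namespace, its children are.
cities = root.xpath('//heilongjiang/c:*', namespaces=NS)
# QName splits the '{uri}name' tag into namespace and local name.
city_names = [etree.QName(el).localname for el in cities]
print(city_names)

weather = root.xpath('//c:wulumuqi/text()', namespaces=NS)
print(weather)
```

The query prefix does not have to be the city prefix used in the file, since XPath compares namespace URIs rather than prefixes.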
Removing the whitespace:
xp = xpxlm.xpath('//node()')
print xp,
for i in xp:
    # text nodes come back as strings and have no tag, so skip them
    if not etree.iselement(i):
        continue
    print i.tag
The check skips the text nodes (the spaces and line breaks), so only elements are printed.
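Output (the printed list is omitted; country does not appear below because the trailing comma in print xp, leaves it on the same line as the list, and {city} abbreviates the full {http://www.w3school.com.cn/furniture} that lxml prints):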
provinces
{city}table
heilongjiang
{city}haerbin
{city}daqing
guangdong
{city}guangzhou
{city}shenzhen
{city}zhuhai
taiwan
{city}taibei
{city}gaoxiong
xinjiang
{city}wulumuqi
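The filtering above can also be sketched in two other ways: ask XPath for elements only with //*, or keep //node() and drop the strings that contain nothing but whitespace, which preserves real text such as 晴. The code uses Python 3 syntax and a shortened inline copy of version 2; the variable names are assumptions made for illustration.

```python
from lxml import etree

# Shortened copy of version 2, with the line breaks and indentation kept.
XML = """<country name="chain">
  <provinces>
    <city:table xmlns:city="http://www.w3school.com.cn/furniture">
      <xinjiang name="citys"><city:wulumuqi>晴</city:wulumuqi></xinjiang>
    </city:table>
  </provinces>
</country>"""

root = etree.fromstring(XML)

# Option 1: '//*' matches element nodes only, so no text nodes are returned.
tags = [el.tag for el in root.xpath('//*')]
print(tags)

# Option 2: keep '//node()' but drop strings that are empty after strip().
nodes = root.xpath('//node()')
kept = [n for n in nodes if etree.iselement(n) or n.strip()]
texts = [n for n in kept if not etree.iselement(n)]
print(texts)
```

Option 1 is the shorter choice when only tags are needed, and option 2 fits when the text content matters as well. lxml's XMLParser also accepts remove_blank_text=True, which may be worth trying when the whitespace should be discarded at parse time.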