python中的lxml模块

zoukankan html css js c++ java

python中的lxml模块

Python中自带了XML的模块，但是性能不太好，相比之下，LXML增加了很多实用的功能。

lxml中主要有两部分，

1) etree，主要可以用来解析XML字符串，

　　内部有两个对象，etree._ElementTree和etree._Element

　etree.Element对象中包含的属性和方法：

属性：1)tag，返回该节点的名称：

　　　　　　print 'root.tag' 输出tag

　　　2)text，设置该节点的文本：

　　　　　　root.text = 'hello world' 输出<root>hello world</root>

　　　3)tail，在标签后边追加文本：

　　　　　　root.tail = 'hym' 输出<root>hello world</root>hym

方法：1)Element(string)，创建一个Element对象：

　　　　　　root = etree.Element('root') 返回一个XML的节点，名称root

　　　　　　root = etree.Element('root', interesting='totally') 返回一个root节点，属性interesting = 'totally'

　　　2) set(name,value)，为已有的节点，添加属性，

　　　　　　root.set('hello', 'huhu') 增加一个属性hello = 'huhu'

　　 3) get(string)，返回属性值，

　　　　　　root.get('intersting') 返回‘totally’

　　 4) keys()，返回所有的属性名，

　　　　　　root.keys() 返回interesting，hello

　　 5) items()，返回字典，其中包含所有的属性，及其 value

　　　　　　for name,value in sorted(root.items()) 返回两对属性

　　 6) 为该节点，添加子节点，

　　　　　　child1 = etree.SubElement(root, 'child')

　　 7) 为该节点，删除子节点，

　　　　　　root.remove(child1)

　　 8) getparent()，拿到父节点

　　　　　　child1.getparent().tag 返回root

etree允许，节点内部的子节点，认为是一个list，

　　print "len(root)" 返回root节点及其子节点的个数；

　　root.index(child2) 返回child2的索引值

　　child = root[0] 返回child1，允许索引访问

　　for child in root: 允许遍历

　　　　...

　　root.insert(0, etree.Element('child0')) 允许插入

　　root.append(etree.Element('child4')) 允许append

etree._Element对象是一颗xml的树，内部包含很多element的对象，

　　1)root.getroottree()，返回一个节点对应的树，root表示当前节点的tag，返回的是Tree类型的对象

　　2)getroot()，返回根节点，返回的是Element类型的对象

　　3)etree.ELementTree(root)，从一个节点构建一颗tree，该节点，也就是根节点，

etree，Element和Tree类型的对象，都支持xpath的方法：

　　foo.xpath('//root')[0].tag

2) html，主要用来解析html，

　　etree.html(HTML)来解析html，并得到Element对象，也可以调用xpath来分析xml

xpath，可以实现节点和属性的快速查找：

　　xpath('//div[@attr = value]/text()') 返回该div节点，满足attr属性要求的，节点的文本

　　xpath('//div[@attr]/@attr')　　返回该div节点，含有attr属性，的值

　　

　　

example：

import xml.etree.ElementTree as ET

<bookstore author="frank">

　　<book id="1">

　　　　<name>python</name>

　　　　<price>12.33</price>

　　<book id = "2">

　　　　......

tree = ET.ElementTree(file='./../text.xml')

root = tree.getroot();

for child in root:

　　print(child.tag, child.attrib, child.text)

查看全文

相关阅读:
JMeter的JavaRequest探究
 记一次生产压测遇到的"坑"
JMeter之If Controller深究二
 JMeter之SteppingShape
那些年拿过的shell之adminer
Spring Boot Actuator H2 RCE复现
 使用sqlmap结合dnslog快速注入
 一次稍显曲折的爆破经历
 无回显、不出网命令执行测试方式
 实战绕过某waf后缀检测内容检测

原文地址：https://www.cnblogs.com/-9-8/p/8251466.html