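XML stands for Extensible Markup Language. It is mostly used to transmit and store data, and it is a meta-language that lets users define their own markup languages. Most programming languages provide programming interfaces for XML.
In Python, the available interfaces are DOM, SAX, and ElementTree.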
Before working with these interfaces, we need an XML file to parse. The file people.xml looks like this:
<?xml version="1.0" encoding="UTF-8" ?>
<persons>
    <person>
        <name>李四</name>
        <age>23</age>
        <sex>男</sex>
    </person>
    <person>
        <name>王五</name>
        <age>23</age>
        <sex>女</sex>
    </person>
    <person>
        <name>赵丽</name>
        <age>13</age>
        <sex>女</sex>
    </person>
</persons>
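1. DOM: DOM stands for Document Object Model. As the name suggests, it reads the entire XML file at once and parses it into a tree; you then work with the XML by operating on that tree. The code is as follows: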
from xml.dom.minidom import parse

# Read the whole file and build the DOM tree in memory
domTree = parse("people.xml")
print(domTree)

# The root element is <persons>
root = domTree.documentElement
print(root)

# Each <person> contains <name>, <age> and <sex>; the text is the first child node
ps = root.getElementsByTagName("person")
for p in ps:
    print(p.getElementsByTagName("name")[0].childNodes[0].data)
    print(p.getElementsByTagName("age")[0].childNodes[0].data)
    print(p.getElementsByTagName("sex")[0].childNodes[0].data)
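The same DOM traversal can be tried without a file on disk. The following is only a minimal sketch: it assumes the XML is held in a string (a shortened copy of people.xml, without the XML declaration) and uses xml.dom.minidom.parseString. The helper names child_text and read_people are made up for illustration.

```python
from xml.dom.minidom import parseString

# Shortened inline copy of people.xml so the sketch runs on its own
XML = """<persons>
  <person><name>李四</name><age>23</age><sex>男</sex></person>
  <person><name>王五</name><age>23</age><sex>女</sex></person>
</persons>"""


def child_text(node, tag):
    """Return the text of the first <tag> descendant of node (hypothetical helper)."""
    return node.getElementsByTagName(tag)[0].childNodes[0].data


def read_people(xml_text):
    """Parse the whole document into a DOM tree, then walk the tree."""
    root = parseString(xml_text).documentElement
    return [
        {
            "name": child_text(p, "name"),
            "age": int(child_text(p, "age")),
            "sex": child_text(p, "sex"),
        }
        for p in root.getElementsByTagName("person")
    ]


if __name__ == "__main__":
    for person in read_people(XML):
        print(person)
```

Collecting the values into dictionaries rather than printing them directly makes the result easier to reuse, at the cost of still holding the full tree in memory while parsing.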
2. SAX: to parse with SAX, you subclass the ContentHandler class. The methods used are:
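startDocument(self): called when the document starts
endDocument(self): called when the document ends
startElement(self, name, attrs): called when an opening tag is encountered
endElement(self, name): called when a closing tag is encountered
characters(self, content): called with the text content of an element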
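To see when each of these callbacks fires, the sketch below records every event in order. It is illustrative only: the class name EventLogger and the function log_events are assumptions, and the input is a small inline string passed to xml.sax.parseString instead of a file.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EventLogger(ContentHandler):
    """Records each SAX callback in the order the parser fires it (illustrative name)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        self.events.append(f"startElement:{name}")

    def endElement(self, name):
        self.events.append(f"endElement:{name}")

    def characters(self, content):
        # Skip whitespace-only chunks that appear between tags
        if content.strip():
            self.events.append(f"characters:{content}")


def log_events(xml_text):
    handler = EventLogger()
    # Encode to bytes before handing the text to the parser
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


if __name__ == "__main__":
    for event in log_events("<person><name>李四</name></person>"):
        print(event)
```

The log starts with startDocument and ends with endDocument, with a startElement and endElement pair around each element. Note that a parser may deliver one element's text in more than one characters call, so a handler should not rely on receiving it in a single chunk.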
With these methods in mind, here is how to use SAX:
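import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

people = []


class Person:
    """Plain data holder for one <person> element."""

    def __init__(self):
        self.name = None
        self.age = None
        self.sex = None

    def __repr__(self):
        return f"Person(name={self.name!r}, age={self.age!r}, sex={self.sex!r})"


class PersonHandler(ContentHandler):
    def __init__(self):
        super().__init__()
        self.person = None
        self.tag = None

    def startElement(self, name, attrs):
        self.tag = name
        if name == "person":
            self.person = Person()

    def endElement(self, name):
        if name == "person":
            people.append(self.person)
            self.person = None
        # Reset on every closing tag so whitespace between elements is ignored
        self.tag = None

    def characters(self, content):
        if self.tag == "name":
            self.person.name = content
        elif self.tag == "age":
            self.person.age = int(content)
        elif self.tag == "sex":
            self.person.sex = content


parser = xml.sax.make_parser()
# Turn off namespace processing so startElement/endElement are used
parser.setFeature(feature_namespaces, False)
parser.setContentHandler(PersonHandler())
parser.parse("people.xml")
for p in people:
    print(p)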
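The introduction also lists ElementTree among Python's XML interfaces, but it is not demonstrated above. As a rough sketch only, the same data could be read with xml.etree.ElementTree from the standard library. The inline XML string (a shortened copy of people.xml) and the function name read_people_etree are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of people.xml so the sketch runs on its own
XML = """<persons>
  <person><name>李四</name><age>23</age><sex>男</sex></person>
  <person><name>赵丽</name><age>13</age><sex>女</sex></person>
</persons>"""


def read_people_etree(xml_text):
    """Parse the XML text and return one dict per <person> element."""
    root = ET.fromstring(xml_text)  # root is the <persons> element
    people = []
    for person in root.findall("person"):
        people.append({
            "name": person.findtext("name"),
            "age": int(person.findtext("age")),
            "sex": person.findtext("sex"),
        })
    return people


if __name__ == "__main__":
    for person in read_people_etree(XML):
        print(person)
```

To read from the file instead of a string, ET.parse("people.xml").getroot() returns the same kind of root element.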