  • Python: Parsing XML files with xml.sax

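    What is SAX?

    SAX is an event-driven API.

    Parsing an XML document with SAX involves two parts: the parser and the event handler.

    The parser reads the XML document and sends events to the event handler, such as element-start and element-end events;

    the event handler responds to those events and processes the XML data passed to it.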
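
    The division of labor described above can be seen by recording the events as they arrive. This is only a minimal sketch for Python 3: the EventLogger class and the one-line inline document are made up for illustration and are not part of the original script.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler that records every event the parser sends."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # Text may arrive in several chunks; whitespace-only chunks are skipped.
        if content.strip():
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        self.events.append(("end", name))


logger = EventLogger()
# parseString feeds an in-memory document to the parser (bytes input).
xml.sax.parseString(b"<movie><year>2003</year></movie>", logger)
for event in logger.events:
    print(event)
```

    The parser drives the whole process; the handler only reacts, which is why it has to keep any state it needs between calls.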

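    SAX is well suited to the following situations:

    • 1. Processing large files;
    • 2. Needing only part of a file's content, or only specific information from it;
    • 3. Wanting to build your own object model.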
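
    As a rough illustration of the second case, a handler can pick out one piece of information and ignore everything else. The sketch below (Python 3) assumes a shortened inline copy of the movie data and a hypothetical TitleCollector class; it keeps only the title attribute of each movie element.

```python
import xml.sax

# Shortened, made-up sample in the same shape as movies.xml.
SAMPLE = b"""<collection shelf="New Arrivals">
<movie title="Enemy Behind"><format>DVD</format></movie>
<movie title="Ishtar"><format>VHS</format></movie>
</collection>"""


class TitleCollector(xml.sax.ContentHandler):
    """Hypothetical handler that keeps only movie titles."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def startElement(self, name, attrs):
        # Every other element and all text content is simply ignored.
        if name == "movie":
            self.titles.append(attrs.get("title", ""))


collector = TitleCollector()
xml.sax.parseString(SAMPLE, collector)
print(collector.titles)
```

    Nothing except the list of titles is held in memory, which is what makes this style attractive for large files.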

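    To process XML with SAX in Python, first import the parse function from xml.sax and ContentHandler from xml.sax.handler. (The script below builds its parser with xml.sax.make_parser() instead, and refers to the handler base class as xml.sax.ContentHandler.)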
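
    A minimal sketch of that import style, for Python 3. The ShelfHandler class is hypothetical, and an in-memory stream stands in for a real file so the example is self-contained; parse also accepts a file name.

```python
import io
from xml.sax import parse
from xml.sax.handler import ContentHandler


class ShelfHandler(ContentHandler):
    """Hypothetical handler that reads the shelf attribute of <collection>."""

    def __init__(self):
        super().__init__()
        self.shelf = None

    def startElement(self, name, attrs):
        if name == "collection":
            self.shelf = attrs.get("shelf")


handler = ShelfHandler()
# parse() takes a file name or a file-like object, plus the handler.
parse(io.BytesIO(b'<collection shelf="New Arrivals"></collection>'), handler)
print(handler.shelf)
```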

    movies.xml: the XML file to be parsed, the same one used for DOM parsing in the previous post

    <collection shelf="New Arrivals">
    <movie title="Enemy Behind">
       <type>War, Thriller</type>
       <format>DVD</format>
       <year>2003</year>
       <rating>PG</rating>
       <stars>10</stars>
       <description>Talk about a US-Japan war</description>
    </movie>
    <movie title="Transformers">
       <type>Anime, Science Fiction</type>
       <format>DVD</format>
       <year>1989</year>
       <rating>R</rating>
       <stars>8</stars>
       <description>A schientific fiction</description>
    </movie>
    <movie title="Trigun">
       <type>Anime, Action</type>
       <format>DVD</format>
       <episodes>4</episodes>
       <rating>PG</rating>
       <stars>10</stars>
       <description>Vash the Stampede!</description>
    </movie>
    <movie title="Ishtar">
       <type>Comedy</type>
       <format>VHS</format>
       <rating>PG</rating>
       <stars>2</stars>
       <description>Viewable boredom</description>
    </movie>
    </collection>

    xmltest.py: the parsing code is as follows

    # -*- coding: UTF-8 -*-

    '''
    Created on September 10, 2015

    @author: xiaowenhui
    '''

    import xml.sax

    # Second approach: SAX parsing
    class MovieHandler(xml.sax.ContentHandler):  # subclass of xml.sax.ContentHandler

        def __init__(self):
            super().__init__()
            self.CurrentData = ""
            self.type = ""
            self.format = ""
            self.year = ""
            self.episodes = ""
            self.rating = ""
            self.stars = ""
            self.description = ""
            self.title = ""

        # Called at the start of each element
        def startElement(self, tag, attributes):
            self.CurrentData = tag
            if tag == "movie":
                print("*****Movie*****")
                self.title = attributes["title"]
                print("Title:", self.title)

        # Called for character data inside an element
        def characters(self, content):
            if self.CurrentData == "type":
                self.type = content
            elif self.CurrentData == "format":
                self.format = content
            elif self.CurrentData == "year":
                self.year = content
            elif self.CurrentData == "episodes":
                self.episodes = content
            elif self.CurrentData == "rating":
                self.rating = content
            elif self.CurrentData == "stars":
                self.stars = content
            elif self.CurrentData == "description":
                self.description = content

        # Called at the end of each element
        def endElement(self, tag):
            if self.CurrentData == "type":
                print("Type:", self.type)
            elif self.CurrentData == "format":
                print("Format:", self.format)
            elif self.CurrentData == "year":
                print("Year:", self.year)
            elif self.CurrentData == "episodes":
                print("Episodes:", self.episodes)
            elif self.CurrentData == "rating":
                print("Rating:", self.rating)
            elif self.CurrentData == "stars":
                print("Stars:", self.stars)
            elif self.CurrentData == "description":
                print("Description:", self.description)


    # Create an XMLReader
    parser = xml.sax.make_parser()
    # Turn off namespaces
    parser.setFeature(xml.sax.handler.feature_namespaces, 0)

    # Install the custom ContentHandler
    Handler = MovieHandler()
    parser.setContentHandler(Handler)

    parser.parse("movies.xml")

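    The output is as follows:

    Question: I don't know why an extra "description" line is printed. Something in my SAX handling is probably wrong, but I haven't found the cause yet. When I changed

     elif self.CurrentData == "description":
         print("Description:", self.description)

    to

     elif self.CurrentData == "description":
         print(self.description)

    the "description" label was no longer printed; only the value of self.description was output.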

    *****Movie*****
    Title: Enemy Behind
    Type: War, Thriller
    Format: DVD
    Year: 2003
    Rating: PG
    Stars: 10
    description: Talk about a US-Japan war
    description: 
    
    *****Movie*****
    Title: Transformers
    Type: Anime, Science Fiction
    Format: DVD
    Year: 1989
    Rating: R
    Stars: 8
    description: A schientific fiction
    description: 
    
    *****Movie*****
    Title: Trigun
    Type: Anime, Action
    Format: DVD
    Episodes: 4
    Rating: PG
    Stars: 10
    description: Vash the Stampede!
    description: 
    
    *****Movie*****
    Title: Ishtar
    Type: Comedy
    Format: VHS
    Rating: PG
    Stars: 2
    description: Viewable boredom
    description: 
    
    description: 
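
    A likely explanation for the extra line: endElement tests self.CurrentData rather than the tag that is actually closing, and CurrentData is never reset. After </description>, the parser reports the line break before </movie> through characters(), which overwrites self.description with whitespace; the end events for </movie> and </collection> then still see CurrentData == "description" and print the label again. The following is only a sketch of one possible fix for Python 3, not the original author's code: FixedMovieHandler, FIELDS, the buffer and lines attributes, and the shortened inline SAMPLE are all assumptions made for illustration.

```python
import xml.sax

# Shortened, made-up sample in the same shape as movies.xml.
SAMPLE = b"""<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <year>2003</year>
   <description>Talk about a US-Japan war</description>
</movie>
</collection>"""

FIELDS = {"type", "format", "year", "episodes", "rating", "stars", "description"}


class FixedMovieHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.current = ""   # name of the field element being read, if any
        self.buffer = []    # text chunks received for that element
        self.lines = []     # every line that was printed, kept for inspection

    def _emit(self, line):
        self.lines.append(line)
        print(line)

    def startElement(self, tag, attributes):
        if tag == "movie":
            self._emit("*****Movie*****")
            self._emit("Title: " + attributes["title"])
        elif tag in FIELDS:
            self.current = tag
            self.buffer = []

    def characters(self, content):
        # characters() may be called several times per element, so append
        # instead of overwriting; text outside a field element is ignored.
        if self.current:
            self.buffer.append(content)

    def endElement(self, tag):
        # Compare against the tag that is closing, then reset the state so
        # later whitespace and end events cannot trigger another print.
        if tag == self.current:
            self._emit(tag.capitalize() + ": " + "".join(self.buffer).strip())
            self.current = ""
            self.buffer = []


handler = FixedMovieHandler()
xml.sax.parseString(SAMPLE, handler)
```

    With the state cleared in endElement, each field is printed once, and the closing </movie> and </collection> tags produce no output.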
  • Original article: https://www.cnblogs.com/xiaowenhui/p/4807924.html