  • Parsing XML with SAX in Python

    # books.xml
    <catalog>
      <book isbn="0-596-00128-2">
        <title>Python &amp; XML</title>
        <title>Python &amp; HTML</title>
        <date>December 2001</date>
        <author>Jones, Drake</author>
      </book>
      <book isbn="0-596-15810-6">
        <title>Programming Python, 4th Edition</title>
        <date>October 2010</date>
        <author>Lutz</author>
      </book>
      <book isbn="0-596-15806-8">
        <title>Learning Python, 4th Edition</title>
        <date>September 2009</date>
        <author>Lutz</author>
      </book>
      <book isbn="0-596-15808-4">
        <title>Python Pocket Reference, 4th Edition</title>
        <date>October 2009</date>
        <author>Lutz</author>
      </book>
      <book isbn="0-596-00797-3">
        <title>Python Cookbook, 2nd Edition</title>
        <date>March 2005</date>
        <author>Martelli, Ravenscroft, Ascher</author>
      </book>
      <book isbn="0-596-10046-9">
        <title>Python in a Nutshell, 2nd Edition</title>
        <date>July 2006</date>
        <author>Martelli</author>
      </book>
      <!-- plus many more Python books that should appear here -->
    </catalog>

    # -*- coding: utf-8 -*-
    __author__ = 'hdfs'
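    '''
    In short, SAX parses XML in three phases, matching the three handler
    methods below: element start, character data, element end.
    SAX parses linearly, so it is efficient for large XML files.
    '''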
    import pprint
    import xml.sax
    import xml.sax.handler

    class BookHandler(xml.sax.handler.ContentHandler):
        def __init__(self):
            xml.sax.handler.ContentHandler.__init__(self)
            self.inTitle = False
            self.mapping = {}
    
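        def startElement(self, name, attrs):
            # Start of a <book> element: reset the buffer and remember its ISBN
            if name == "book":
                self.buffer = ""
                self.isbn = attrs["isbn"]
            # Start of a <title> element
            elif name == "title":
                self.inTitle = True

        def characters(self, data):
            # Inside a <title>: append the text to the buffer, so the text of
            # several <title> children of one book is concatenated
            if self.inTitle:
                self.buffer += data

        # End of an element
        def endElement(self, name):
            if name == "title":
                self.inTitle = False
                self.mapping[self.isbn] = self.buffer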
    
    parser = xml.sax.make_parser()
    handler = BookHandler()
    parser.setContentHandler(handler)
    parser.parse('books.xml')
    pprint.pprint(handler.mapping)

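    Result (Python 2 output, hence the u'' prefixes; the first book has two <title> elements, so their text is concatenated):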

    {u'0-596-00128-2': u'Python & XMLPython & HTML',
     u'0-596-00797-3': u'Python Cookbook, 2nd Edition',
     u'0-596-10046-9': u'Python in a Nutshell, 2nd Edition',
     u'0-596-15806-8': u'Learning Python, 4th Edition',
     u'0-596-15808-4': u'Python Pocket Reference, 4th Edition',
     u'0-596-15810-6': u'Programming Python, 4th Edition'}
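
The handler above joins the text of every <title> in a book into one string, which is why the first entry reads 'Python & XMLPython & HTML'. If separate titles are wanted, one possible variant is sketched below. It assumes Python 3 and parses an inline, shortened copy of books.xml with xml.sax.parseString; the names SAMPLE_XML, TitleListHandler and parse_titles are made up for this illustration and are not part of the original post.

```python
import xml.sax
import xml.sax.handler
from collections import defaultdict

# Shortened, inline copy of books.xml (assumption: only used for this sketch)
SAMPLE_XML = b"""<catalog>
  <book isbn="0-596-00128-2">
    <title>Python &amp; XML</title>
    <title>Python &amp; HTML</title>
  </book>
  <book isbn="0-596-15810-6">
    <title>Programming Python, 4th Edition</title>
  </book>
</catalog>"""


class TitleListHandler(xml.sax.handler.ContentHandler):
    """Hypothetical handler: collects a list of titles per ISBN."""

    def __init__(self):
        super().__init__()
        self.titles = defaultdict(list)
        self._isbn = None
        self._chunks = None  # None means "not inside a <title>"

    def startElement(self, name, attrs):
        if name == "book":
            self._isbn = attrs["isbn"]
        elif name == "title":
            # Start a fresh buffer for each <title>, not for each <book>
            self._chunks = []

    def characters(self, content):
        # The parser may deliver one text node in several pieces,
        # so the pieces are collected and joined at the end tag
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles[self._isbn].append("".join(self._chunks))
            self._chunks = None


def parse_titles(xml_bytes):
    """Parse XML bytes and return a dict mapping ISBN to a list of titles."""
    handler = TitleListHandler()
    xml.sax.parseString(xml_bytes, handler)
    return dict(handler.titles)


if __name__ == "__main__":
    import pprint
    pprint.pprint(parse_titles(SAMPLE_XML))
```

Resetting the buffer at the start of each <title> instead of each <book> is the only real change; the three callbacks play the same roles as in the original handler.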
  • Original post: https://www.cnblogs.com/similarface/p/5135161.html