  • Python: parsing XML files with xml.dom

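    What is DOM?

    The Document Object Model (DOM) is the standard programming interface recommended by the W3C for processing Extensible Markup Language (XML) documents.

    When a DOM parser parses an XML document, it reads the entire document at once and stores all of its elements in a tree structure in memory. You can then use the functions DOM provides to read or modify the document's content and structure, and write the modified content back to an XML file.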
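A minimal Python 3 sketch of that read, modify, and write-back cycle with `xml.dom.minidom` follows (the article's own script further down is Python 2). The inline XML string, the added `<year>` element, and the output file `movies_modified.xml` in the system temp directory are assumptions made for illustration.

```python
# Python 3 sketch: build a DOM tree in memory, read it, modify it, write it back.
import os
import tempfile
import xml.dom.minidom

# Hypothetical one-movie document in the style of movies.xml
SOURCE = ('<collection shelf="New Arrivals"><movie title="Ishtar">'
          '<format>VHS</format><stars>2</stars></movie></collection>')

# parseString reads the whole document at once and builds the full tree
dom = xml.dom.minidom.parseString(SOURCE)
root = dom.documentElement

# Read: an attribute value and an element's text
movie = root.getElementsByTagName("movie")[0]
title = movie.getAttribute("title")
stars = movie.getElementsByTagName("stars")[0]
stars_text = stars.firstChild.data

# Modify: change a text node and an attribute, then append a new element
stars.firstChild.data = "3"
root.setAttribute("shelf", "Classics")
year = dom.createElement("year")
year.appendChild(dom.createTextNode("1987"))
movie.appendChild(year)

# Write: serialize the modified tree to a file (hypothetical path)
out_path = os.path.join(tempfile.gettempdir(), "movies_modified.xml")
with open(out_path, "w", encoding="utf-8") as f:
    dom.writexml(f, encoding="utf-8")

print(title, stars_text)  # Ishtar 2
print(root.toxml())
```

Because every change is applied to the in-memory tree, nothing reaches the disk until `writexml` is called.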

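    Pros: simple to use and easy to understand.

    Cons: because DOM maps the whole XML document into an in-memory tree, parsing is relatively slow and uses a relatively large amount of memory, which matters most for large files.

    movies.xml, the XML file to be parsed, is shown below: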

    <collection shelf="New Arrivals">
    <movie title="Enemy Behind">
       <type>War, Thriller</type>
       <format>DVD</format>
       <year>2003</year>
       <rating>PG</rating>
       <stars>10</stars>
       <description>Talk about a US-Japan war</description>
    </movie>
    <movie title="Transformers">
       <type>Anime, Science Fiction</type>
       <format>DVD</format>
       <year>1989</year>
       <rating>R</rating>
       <stars>8</stars>
       <description>A schientific fiction</description>
    </movie>
    <movie title="Trigun">
       <type>Anime, Action</type>
       <format>DVD</format>
       <episodes>4</episodes>
       <rating>PG</rating>
       <stars>10</stars>
       <description>Vash the Stampede!</description>
    </movie>
    <movie title="Ishtar">
       <type>Comedy</type>
       <format>VHS</format>
       <rating>PG</rating>
       <stars>2</stars>
       <description>Viewable boredom</description>
    </movie>
    </collection>
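Because movies.xml is indented, the DOM tree built from it contains whitespace-only text nodes between the elements, and each leaf element such as `<type>` holds its value in a single text child. The Python 3 sketch below parses a hypothetical two-field excerpt of the file to show both points; they are why xmltest.py looks elements up with `getElementsByTagName` and reads values through `childNodes[0].data` instead of walking `childNodes` directly.

```python
# Python 3 sketch: inspect the node types under one <movie> element.
import xml.dom.minidom
from xml.dom import Node

# Hypothetical excerpt of movies.xml, indented like the real file
EXCERPT = """<collection shelf="New Arrivals">
<movie title="Trigun">
   <type>Anime, Action</type>
   <episodes>4</episodes>
</movie>
</collection>"""

dom = xml.dom.minidom.parseString(EXCERPT)
movie = dom.documentElement.getElementsByTagName("movie")[0]

# childNodes mixes Element nodes with whitespace-only Text nodes
for node in movie.childNodes:
    if node.nodeType == Node.ELEMENT_NODE:
        print("element:", node.tagName)
    elif node.nodeType == Node.TEXT_NODE:
        print("text:", repr(node.data))

# The value of <type> is the data of its single Text child
type_text = movie.getElementsByTagName("type")[0].childNodes[0].data
print(type_text)  # Anime, Action
```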

    xmltest.py, the Python 2 code that parses movies.xml, is as follows:

    # -*- coding:UTF-8 -*-
    
    '''
    Created on 2015-09-10
    
    @author: xiaowenhui
    '''
    
    import xml.dom.minidom
    
    
    # Method 1: DOM parsing
    
    # Open the XML document with the minidom parser
    DOMTree = xml.dom.minidom.parse("movies.xml")
    collection = DOMTree.documentElement
    
    # Get all movies in the collection
    movies = collection.getElementsByTagName("movie")
    
    # Print the details of each movie and collect them in a dict keyed by title
    dict_movies = {}
    
    for movie in movies:
        dict_movie = {}
        title = ""
        print "*****Movie*****"
        if movie.hasAttribute("title"): # the element has a title attribute
            print "Title:%s" % movie.getAttribute("title") # read the attribute value
            title = movie.getAttribute("title")
               
        try:
            type = movie.getElementsByTagName("type")[0] 
            print "Type :%s" % type.childNodes[0].data
            dict_movie["type"] = type.childNodes[0].data
        
            format = movie.getElementsByTagName("format")[0] # first <format> element under this movie
            print "format:%s" % format.childNodes[0].data
            dict_movie["format"] = format.childNodes[0].data
        
            try:
                year = movie.getElementsByTagName("year")[0]
                print "year :%s" % year.childNodes[0].data  
                dict_movie["year"] = year.childNodes[0].data 
            except:
                pass
            
            try:
                episodes = movie.getElementsByTagName("episodes")[0]
                print "episodes:%s" % episodes.childNodes[0].data
                dict_movie["episodes"] = episodes.childNodes[0].data
            except:
                pass
    
            rating = movie.getElementsByTagName('rating')[0]
            print "Rating: %s" % rating.childNodes[0].data
            dict_movie["rating"] = rating.childNodes[0].data
        
            stars = movie.getElementsByTagName('stars')[0]
            print "stars: %s" % stars.childNodes[0].data
            dict_movie["stars"] = stars.childNodes[0].data
        
            description = movie.getElementsByTagName('description')[0]
            print "Description: %s" % description.childNodes[0].data
            dict_movie["description"] = description.childNodes[0].data
        except:
            print "error:" + title + "\n"
            continue
        
        dict_movies[title] = dict_movie
    
    print dict_movies
     

    The output after parsing is as follows:

    *****Movie*****
    Title:Enemy Behind
    Type :War, Thriller
    format:DVD
    year :2003
    Rating: PG
    stars: 10
    Description: Talk about a US-Japan war
    *****Movie*****
    Title:Transformers
    Type :Anime, Science Fiction
    format:DVD
    year :1989
    Rating: R
    stars: 8
    Description: A schientific fiction
    *****Movie*****
    Title:Trigun
    Type :Anime, Action
    format:DVD
    episodes:4
    Rating: PG
    stars: 10
    Description: Vash the Stampede!
    *****Movie*****
    Title:Ishtar
    Type :Comedy
    format:VHS
    Rating: PG
    stars: 2
    Description: Viewable boredom
    {u'Transformers': {'rating': u'R', 'description': u'A schientific fiction', 'format': u'DVD', 'stars': u'8', 'year': u'1989', 'type': u'Anime, Science Fiction'}, u'Ishtar': {'rating': u'PG', 'type': u'Comedy', 'description': u'Viewable boredom', 'stars': u'2', 'format': u'VHS'}, u'Enemy Behind': {'rating': u'PG', 'description': u'Talk about a US-Japan war', 'format': u'DVD', 'stars': u'10', 'year': u'2003', 'type': u'War, Thriller'}, u'Trigun': {'rating': u'PG', 'description': u'Vash the Stampede!', 'format': u'DVD', 'episodes': u'4', 'stars': u'10', 'type': u'Anime, Action'}}
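The script above relies on nested bare try/except blocks to cope with optional tags (year, episodes) and prints an error for any movie that lacks a required tag. One alternative is to look each tag up once and treat an empty result as "absent". The Python 3 sketch below does that; the helper names `child_text` and `movies_to_dict`, the `FIELDS` tuple, and the inline `SAMPLE` string are hypothetical. Unlike the original, it keeps a movie even when a required tag is missing and simply leaves that key out.

```python
# Python 3 sketch: collect movie fields without try/except around each tag.
import xml.dom.minidom

# Tags to extract from each <movie>, in output order (hypothetical list)
FIELDS = ("type", "format", "year", "episodes", "rating", "stars", "description")


def child_text(parent, tag):
    """Text of the first <tag> element under parent, or None if absent or empty."""
    nodes = parent.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return None
    return nodes[0].firstChild.data


def movies_to_dict(dom):
    """Map each movie title to a dict containing only the fields present."""
    result = {}
    for movie in dom.documentElement.getElementsByTagName("movie"):
        title = movie.getAttribute("title")  # "" if the attribute is missing
        fields = {}
        for tag in FIELDS:
            value = child_text(movie, tag)
            if value is not None:
                fields[tag] = value
        result[title] = fields
    return result


# Hypothetical one-movie sample; note it has no <year> or <episodes>
SAMPLE = """<collection shelf="New Arrivals">
<movie title="Ishtar">
   <type>Comedy</type>
   <format>VHS</format>
   <rating>PG</rating>
   <stars>2</stars>
   <description>Viewable boredom</description>
</movie>
</collection>"""

demo = movies_to_dict(xml.dom.minidom.parseString(SAMPLE))
print(demo)
```

To run it against the real file, pass `xml.dom.minidom.parse("movies.xml")` to `movies_to_dict` instead of the parsed `SAMPLE`.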
  • Original article: https://www.cnblogs.com/xiaowenhui/p/4807814.html