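  • Reading and writing XML files in Python with lxml

    While converting a dataset from one format to another, I needed to turn JSON into XML files. The lxml package makes this very convenient.

    1. Writing an XML file

    a) Using etree and objectify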

    from lxml import etree, objectify
    
    E = objectify.ElementMaker(annotate=False)
    anno_tree = E.annotation(
        E.folder('VOC2014_instance'),
        E.filename("test.jpg"),
        E.source(
            E.database('COCO'),
            E.annotation('COCO'),
            E.image('COCO'),
            E.url("http://test.jpg")
        ),
        E.size(
            E.width(800),
            E.height(600),
            E.depth(3)
        ),
        E.segmented(0),
    )
    
    etree.ElementTree(anno_tree).write("test.xml", pretty_print=True)
    
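    The resulting test.xml file looks like this:
    
    <annotation>
      <folder>VOC2014_instance</folder>
      <filename>test.jpg</filename>
      <source>
        <database>COCO</database>
        <annotation>COCO</annotation>
        <image>COCO</image>
        <url>http://test.jpg</url>
      </source>
      <size>
        <width>800</width>
        <height>600</height>
        <depth>3</depth>
      </size>
      <segmented>0</segmented>
    </annotation>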
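
    To check the generated XML without writing a file, the tree can also be serialized in memory. The following is a minimal sketch using etree.tostring; the shortened tree is only for illustration.

```python
from lxml import etree, objectify

E = objectify.ElementMaker(annotate=False)
anno_tree = E.annotation(
    E.folder("VOC2014_instance"),
    E.filename("test.jpg"),
    E.size(E.width(800), E.height(600), E.depth(3)),
)

# encoding="unicode" makes tostring return a str instead of bytes.
xml_text = etree.tostring(anno_tree, pretty_print=True, encoding="unicode")
print(xml_text)
```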

    To add further tags to anno_tree, simply use append:

    E2 = objectify.ElementMaker(annotate=False)
    anno_tree2 = E2.object(
        E.name("person"),
        E.bndbox(
            E.xmin(100),
            E.ymin(200),
            E.xmax(300),
            E.ymax(400)
        ),
        E.difficult(0)
    )
    anno_tree.append(anno_tree2)
    

    After writing the tree again, the output above becomes:

    <annotation>
      <folder>VOC2014_instance</folder>
      <filename>test.jpg</filename>
      <source>
        <database>COCO</database>
        <annotation>COCO</annotation>
        <image>COCO</image>
        <url>http://test.jpg</url>
      </source>
      <size>
        <width>800</width>
        <height>600</height>
        <depth>3</depth>
      </size>
      <segmented>0</segmented>
      <object>
        <name>person</name>
        <bndbox>
          <xmin>100</xmin>
          <ymin>200</ymin>
          <xmax>300</xmax>
          <ymax>400</ymax>
        </bndbox>
        <difficult>0</difficult>
      </object>
    </annotation>
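
    When an image contains several objects, the same append call can be used in a loop. The sketch below assumes a hypothetical list of (name, xmin, ymin, xmax, ymax) tuples; the second box is made up for illustration.

```python
from lxml import etree, objectify

E = objectify.ElementMaker(annotate=False)
anno_tree = E.annotation(E.filename("test.jpg"))

# Hypothetical detections: (class name, xmin, ymin, xmax, ymax).
boxes = [("person", 100, 200, 300, 400), ("dog", 10, 20, 30, 40)]

for name, xmin, ymin, xmax, ymax in boxes:
    # Each iteration appends one <object> element to the root.
    anno_tree.append(
        E.object(
            E.name(name),
            E.bndbox(E.xmin(xmin), E.ymin(ymin), E.xmax(xmax), E.ymax(ymax)),
            E.difficult(0),
        )
    )

xml_text = etree.tostring(anno_tree, pretty_print=True, encoding="unicode")
print(xml_text)
```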
    

    b) Using etree and SubElement

    annotation = etree.Element("annotation")
    etree.SubElement(annotation, "folder").text = "VOC2014_instance"
    etree.SubElement(annotation, "filename").text = "test.jpg"
    source = etree.SubElement(annotation, "source")
    etree.SubElement(source, "database").text = "COCO"
    etree.SubElement(source, "annotation").text = "COCO"
    etree.SubElement(source, "image").text = "COCO"
    etree.SubElement(source, "url").text = "http://test.jpg"
    size = etree.SubElement(annotation, "size")
    etree.SubElement(size, "width").text = '800'  # must be a string
    etree.SubElement(size, "height").text = '600'
    etree.SubElement(size, "depth").text = '3'
    etree.SubElement(annotation, "segmented").text = '0'
    key_object = etree.SubElement(annotation, "object")
    etree.SubElement(key_object, "name").text = "person"
    bndbox = etree.SubElement(key_object, "bndbox")
    etree.SubElement(bndbox, "xmin").text = str(100)
    etree.SubElement(bndbox, "ymin").text = str(200)
    etree.SubElement(bndbox, "xmax").text = str(300)
    etree.SubElement(bndbox, "ymax").text = str(400)
    etree.SubElement(key_object, "difficult").text = '0'
    doc = etree.ElementTree(annotation)
    doc.write("test.xml", pretty_print=True)  # pass the path so lxml opens the file in binary mode
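
    Since the starting point was JSON, the SubElement approach can be wrapped in a small helper. This is only a sketch: the JSON field names (file_name, width, height, boxes, label, bbox) and the helper record_to_xml are hypothetical, not a real COCO schema, and bbox is assumed to hold [xmin, ymin, xmax, ymax].

```python
import json
from lxml import etree

# Hypothetical JSON record; the field names are assumptions for illustration.
record_json = """
{"file_name": "test.jpg", "width": 800, "height": 600,
 "boxes": [{"label": "person", "bbox": [100, 200, 300, 400]}]}
"""

def record_to_xml(record):
    """Build a VOC-style <annotation> element from one decoded JSON record."""
    annotation = etree.Element("annotation")
    etree.SubElement(annotation, "filename").text = record["file_name"]
    size = etree.SubElement(annotation, "size")
    etree.SubElement(size, "width").text = str(record["width"])
    etree.SubElement(size, "height").text = str(record["height"])
    for box in record["boxes"]:
        obj = etree.SubElement(annotation, "object")
        etree.SubElement(obj, "name").text = box["label"]
        bndbox = etree.SubElement(obj, "bndbox")
        # bbox is assumed to be [xmin, ymin, xmax, ymax].
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), box["bbox"]):
            etree.SubElement(bndbox, tag).text = str(value)
    return annotation

annotation = record_to_xml(json.loads(record_json))
xml_text = etree.tostring(annotation, pretty_print=True, encoding="unicode")
print(xml_text)
```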
    

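    2. Reading XML

    Here XPath can be used to pull out the values of the elements you need directly. For example, to get the x and y coordinates from the test.xml file above: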

    tree = etree.parse("test.xml")
    # get bbox
    for bbox in tree.xpath('//bndbox'):   # select every bndbox element
        for corner in bbox:   # iterate over the child elements of bndbox
            print(corner.text)   # text is a str
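
    Because the element text is always a string, the coordinates usually need converting before use. A minimal sketch, assuming the same structure as test.xml but parsed from an inline sample so it runs without the file on disk:

```python
from lxml import etree

# Inline sample with the same layout as the <object> part of test.xml.
xml_bytes = b"""<annotation>
  <object>
    <name>person</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>300</xmax><ymax>400</ymax></bndbox>
  </object>
</annotation>"""

root = etree.fromstring(xml_bytes)

boxes = []
for obj in root.xpath("//object"):
    bndbox = obj.find("bndbox")
    # Map each child tag to its integer value.
    coords = {corner.tag: int(corner.text) for corner in bndbox}
    boxes.append((obj.findtext("name"), coords))

print(boxes)
```

    Keeping the class name next to the coordinates makes it easier to tell boxes apart when a file contains more than one object.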
    

    References

    1. http://lxml.de/tutorial.html
    2. https://stackoverflow.com/questions/12657043/parse-xml-with-lxml-extract-element-value
  • Original post: https://www.cnblogs.com/arkenstone/p/7338978.html