  • Parsing XML files with Python

    https://www.cnblogs.com/handsome1013/p/10058838.html
    ET.Parser usage
    https://www.cnblogs.com/yezuhui/p/6853323.html

    https://blog.csdn.net/gz153016/article/details/90216737

    Introduction to xml.etree.ElementTree, the Python 3 XML parsing module

    https://blog.csdn.net/asty9000/article/details/93627226?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase

    Removing duplicate XML nodes

    https://blog.csdn.net/u014203484/article/details/74332815

    import xml.etree.ElementTree as ET   # import the XML module

    tree = ET.parse('GHO.xml')           # parse the given XML file
    root = tree.getroot()                # get the root element
    data = root.find('Data')             # find the first 'Data' child of the root
    for obs in data:                     # iterate over every child of 'Data'
        for item in obs:                 # iterate over every child of each of those elements
            key = item.attrib            # attrib is a dict of attributes, not a method
            print(list(key))             # print the attribute names

    How to read attributes and node content.

    How do you extract id and name, and their values, from data?

    Explanation

    Two approaches (shown here with the Java DOM API):
    1. Get the node first:
    String strID = node.getAttributes().getNamedItem("id").getNodeValue();
    String strName = node.getAttributes().getNamedItem("name").getNodeValue();
    2. Get the element first:
    String strID = element.getAttribute("id");
    String strName = element.getAttribute("name");
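
    The two Java approaches have a direct counterpart in Python's xml.etree.ElementTree: attributes can be read with Element.get() or through the Element.attrib dict, and node content with Element.text. A minimal sketch, assuming a hypothetical <data> element that carries id and name attributes (the sample XML is invented for illustration and may not match the real file):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real file's layout may differ.
SAMPLE = '<root><data id="42" name="example">some text</data></root>'

root = ET.fromstring(SAMPLE)
data = root.find('data')          # first <data> child of the root

str_id = data.get('id')           # returns None if the attribute is absent
str_name = data.attrib['name']    # dict lookup; raises KeyError if absent
content = data.text               # text directly inside <data>

print(str_id, str_name, content)
```

    get() is usually the safer choice when an attribute may be missing, since it returns None instead of raising.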

    Small exercise

    #!/usr/bin/env python
    import xml.etree.ElementTree as ET

    tree = ET.parse('abcdefg.xml')
    root = tree.getroot()

    # Collect every descendant of the root element.
    iter_elem = root.findall('.//*')
    print(len(iter_elem))
    for element in iter_elem:
        # Skip elements that have no text before their first child.
        if element.text is None:
            continue
        print("hello")  # debug marker: this element has text
        context = []
        # Look for a direct <source> child; skip elements without one.
        src_elem = element.find("source")
        if src_elem is None:
            continue
        context.append(src_elem.text)

        print("attrib: %s" % src_elem.attrib)
        print("tag: %s" % src_elem.tag)
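
    The exercise relies on findall('.//*') to collect every descendant of the root. A minimal sketch of the same traversal on an inline document, so it runs without abcdefg.xml (the sample XML and variable names are made up for illustration). It also shows that root.iter() visits the same elements, plus the root itself:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, written without whitespace between tags.
SAMPLE = (
    '<messages>'
    '<message id="1"><source lang="en">Hello</source></message>'
    '<message id="2"><note>no source here</note></message>'
    '</messages>'
)

root = ET.fromstring(SAMPLE)

# All descendants of the root, in document order.
descendants = root.findall('.//*')
tags = [e.tag for e in descendants]

# iter() yields the root first, then the same descendants.
iter_tags = [e.tag for e in root.iter()]

# Collect text and attributes of every direct <source> child.
sources = []
for element in descendants:
    src = element.find('source')
    if src is not None:
        sources.append((src.text, src.attrib))

print(tags)
print(sources)
```

    Note that in this compact sample each <message> has text set to None, because nothing precedes its first child, so a check such as "if element.text is None: continue" would skip it.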


    Deleting duplicate nodes:

    import xml.etree.ElementTree as ET
    path = 'in.xml'
    tree = ET.parse(path)
    root = tree.getroot()
    prev = None
    
    def elements_equal(e1, e2):
        if type(e1) != type(e2):
            return False
        if e1.tag != e2.tag: return False
        if e1.text != e2.text: return False
        if e1.tail != e2.tail: return False
        if e1.attrib != e2.attrib: return False
        if len(e1) != len(e2): return False
        return all([elements_equal(c1, c2) for c1, c2 in zip(e1, e2)])
    
    for page in root:                     # iterate over pages
        elems_to_remove = []
        for elem in page:
            if elements_equal(elem, prev):
                print("found duplicate: %s" % elem.text)   # equal function works well
                elems_to_remove.append(elem)
                continue
            prev = elem
        for elem_to_remove in elems_to_remove:
            page.remove(elem_to_remove)
    tree.write("out.xml")
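
    The loop above compares each element only with the one before it, so it catches consecutive duplicates. A minimal sketch of one way to also catch duplicates that are not adjacent, assuming the serialized form of an element is an acceptable equality key (the helper name remove_duplicate_children and the sample XML are hypothetical):

```python
import xml.etree.ElementTree as ET

def remove_duplicate_children(parent):
    """Remove children of `parent` whose serialized form was already seen."""
    seen = set()
    for child in list(parent):        # iterate over a copy; parent is mutated
        key = ET.tostring(child)      # bytes covering the child's whole subtree
        if key in seen:
            parent.remove(child)
        else:
            seen.add(key)

# Hypothetical sample: the second <a>1</a> is a non-adjacent duplicate.
root = ET.fromstring('<pages><page><a>1</a><b>2</b><a>1</a></page></pages>')
for page in root:
    remove_duplicate_children(page)

print(ET.tostring(root, encoding='unicode'))
```

    Because the key is the serialized subtree, differences in attribute order or surrounding whitespace make two elements count as different; a field-by-field comparison like elements_equal is stricter about what it checks but can be adapted more easily.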
    

      

      

  • Original article: https://www.cnblogs.com/7star/p/13307658.html