  • Parsing XML documents with BeautifulSoup

    I have more than 200 XML documents, each of which looks like this:

    <?xml version="1.0"?>
    <VehicleInfo>
      <FileHeader>
        <ScaleInfo>
          <SN>H00120030101081526</SN>
          <UserName>盛隆钢铁</UserName>
          <ScaleName>2#</ScaleName>
          <ScaleID>H001</ScaleID>
          <ScaleType>铁水秤开关</ScaleType>
          <WeighingType>铁水秤开关</WeighingType>
          <MeasureTime>2003-01-01 08:15:26</MeasureTime>
          <NodeNumber>2</NodeNumber>
          <WaveFile>20030101081424.wave</WaveFile>
          <VideoFile>20030101081424.wave</VideoFile>
          <Orientation>右方向来车&lt;&lt;&lt;&lt;&lt;&lt;</Orientation>
          <OperatorName>Admin</OperatorName>
          <SUMWeight>0</SUMWeight>
        </ScaleInfo>
      </FileHeader>
      <FileBody>
        <Node>
          <ID>1</ID>
          <_DateTime>2003-1-1 8:14:25</_DateTime>
          <VehicleType />
          <VehicleCardID />
          <Speed>17.5</Speed>
          <Weight>3.12</Weight>
          <FrontAxisWeight>.00</FrontAxisWeight>
          <BackAxisWeight>.00</BackAxisWeight>
          <InsideWheel1>.00</InsideWheel1>
          <OutsideWheel1>.00</OutsideWheel1>
          <InsideWheel2>.00</InsideWheel2>
          <OutsideWheel2>.00</OutsideWheel2>
          <InsideWheel3>.00</InsideWheel3>
          <OutsideWheel3>.00</OutsideWheel3>
          <InsideWheel4>.00</InsideWheel4>
          <OutsideWheel4>.00</OutsideWheel4>
          <Temperature>0123</Temperature>
          <Humidity>0123</Humidity>
          <PIC1>_1.bmp</PIC1>
          <PIC2>_2.bmp</PIC2>
          <PIC3>_3.bmp</PIC3>
          <PIC4>_4.bmp</PIC4>
        </Node>
        <Node>
          <ID>2</ID>
          <_DateTime>2003-1-1 8:14:26</_DateTime>
          <VehicleType />
          <VehicleCardID />
          <Speed>15.8</Speed>
          <Weight>4.77</Weight>
          <FrontAxisWeight>.00</FrontAxisWeight>
          <BackAxisWeight>.00</BackAxisWeight>
          <InsideWheel1>.00</InsideWheel1>
          <OutsideWheel1>.00</OutsideWheel1>
          <InsideWheel2>.00</InsideWheel2>
          <OutsideWheel2>.00</OutsideWheel2>
          <InsideWheel3>.00</InsideWheel3>
          <OutsideWheel3>.00</OutsideWheel3>
          <InsideWheel4>.00</InsideWheel4>
          <OutsideWheel4>.00</OutsideWheel4>
          <Temperature>0123</Temperature>
          <Humidity>0123</Humidity>
          <PIC1>_1.bmp</PIC1>
          <PIC2>_2.bmp</PIC2>
          <PIC3>_3.bmp</PIC3>
          <PIC4>_4.bmp</PIC4>
        </Node>
      </FileBody>
    </VehicleInfo>

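    I need to extract MeasureTime, NodeNumber, Orientation, and the Weight under each Node, then compute the total number of passes and the total number of nodes for the left and right directions, plus the total weight for each direction and the difference between them. In C# the code would run to who knows how many lines, so let's use Python.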

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    __author__ = 'liulixiang'

    import glob

    from bs4 import BeautifulSoup

    # Orientation values exactly as they appear in the data files.
    LEFT = '左方向来车>>>>>>'
    RIGHT = '右方向来车<<<<<<'

    left, left_times, left_weight = 0, 0, 0.0
    right, right_times, right_weight = 0, 0, 0.0
    # Path as published; its backslash separators appear to have been lost,
    # so point this at your own data directory.
    files = sorted(glob.glob(r'E:工作work-documents2013凤矿计量系统DebugWY.WeightBridge.Data*.xml'))
    for index, filename in enumerate(files, 1):
        with open(filename, encoding='utf-8') as f:
            soup = BeautifulSoup(f.read(), 'xml')  # 'xml' selects the lxml-based XML parser
        orientation = soup.Orientation.string
        node_number = int(soup.NodeNumber.string)
        print(index, 'Time:', soup.MeasureTime.string, 'Nodes:', node_number, 'Direction:', orientation)
        for node in soup.FileBody.findChildren('Node'):
            print('\tNo.:', node.ID.string, 'Weight:', node.Weight.string)
            if orientation == LEFT:
                left_weight += float(node.Weight.string)
            elif orientation == RIGHT:
                right_weight += float(node.Weight.string)
        if orientation == LEFT:
            left += node_number
            left_times += 1
        elif orientation == RIGHT:
            right += node_number
            right_times += 1
            print('\n')

    print('From the left: {} passes, {} nodes, total tare weight {:.2f}'.format(left_times, left, left_weight))
    print('From the right: {} passes, {} nodes, total gross weight {:.2f}'.format(right_times, right, right_weight))
    print('Total net weight: %.2f' % (right_weight - left_weight))
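
    For comparison, the same per-file extraction and tally can be sketched with only the standard library's xml.etree.ElementTree, which avoids the lxml dependency. This is not the script above but a swapped-in alternative; the function name summarize, the totals dictionary, and the trimmed inline sample are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Orientation values exactly as they appear in the data files.
LEFT = '左方向来车>>>>>>'
RIGHT = '右方向来车<<<<<<'

# Trimmed-down stand-in for one data file (hypothetical sample).
SAMPLE = """<VehicleInfo>
  <FileHeader>
    <ScaleInfo>
      <MeasureTime>2003-01-01 08:15:26</MeasureTime>
      <NodeNumber>2</NodeNumber>
      <Orientation>右方向来车&lt;&lt;&lt;&lt;&lt;&lt;</Orientation>
    </ScaleInfo>
  </FileHeader>
  <FileBody>
    <Node><ID>1</ID><Weight>3.12</Weight></Node>
    <Node><ID>2</ID><Weight>4.77</Weight></Node>
  </FileBody>
</VehicleInfo>"""


def summarize(xml_text):
    """Return (orientation, node_number, total_weight) for one document."""
    root = ET.fromstring(xml_text)
    info = root.find('FileHeader/ScaleInfo')
    orientation = info.findtext('Orientation')  # entities such as &lt; are decoded
    node_number = int(info.findtext('NodeNumber'))
    total_weight = sum(float(node.findtext('Weight'))
                       for node in root.iterfind('FileBody/Node'))
    return orientation, node_number, total_weight


# Per-direction totals: [passes, nodes, weight].
totals = {LEFT: [0, 0, 0.0], RIGHT: [0, 0, 0.0]}
for text in [SAMPLE]:  # in practice, the contents of each file found by glob
    orientation, node_number, weight = summarize(text)
    if orientation in totals:
        entry = totals[orientation]
        entry[0] += 1
        entry[1] += node_number
        entry[2] += weight

for direction, (passes, nodes, weight) in totals.items():
    print('{}: {} passes, {} nodes, total weight {:.2f}'.format(direction, passes, nodes, weight))
```

    Keeping the parsing in a function that takes a string makes it easy to try on a single document before running it over all 200 files.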

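    Notes:

    1. soup = BeautifulSoup(file, 'xml'): BeautifulSoup parses HTML by default, so the XML parser has to be requested explicitly when parsing XML.

    2. BS depends on lxml to parse XML; on Windows, a prebuilt binary build of the lxml library can be downloaded.

    3. Iterating over a tag's .children yields every direct child, including the NavigableString whitespace between elements, whereas findChildren returns only Tag objects.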
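
    Notes 1 and 3 can be checked on a small inline document. The sketch below assumes bs4 and lxml are installed; the sample string and variable names are made up for illustration. It uses find_all with recursive=False to fetch direct children; findChildren, as used in the script, is an older alias of find_all in bs4.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical two-node sample, cut down from the data files.
SAMPLE = """<FileBody>
  <Node><ID>1</ID><Weight>3.12</Weight></Node>
  <Node><ID>2</ID><Weight>4.77</Weight></Node>
</FileBody>"""

# Note 1: the HTML parser lowercases tag names, so a lookup by the
# original mixed-case name fails; the XML parser keeps names as written.
html_soup = BeautifulSoup(SAMPLE, 'html.parser')
xml_soup = BeautifulSoup(SAMPLE, 'xml')  # needs lxml
found_in_html = html_soup.find('FileBody') is not None
found_in_xml = xml_soup.find('FileBody') is not None

# Note 3: .children mixes whitespace strings with tags,
# while find_all returns tags only.
body = xml_soup.FileBody
child_kinds = [type(child).__name__ for child in body.children]
nodes = body.find_all('Node', recursive=False)
weights = [float(node.Weight.string) for node in nodes]

print(found_in_html, found_in_xml)
print(child_kinds)
print(weights)
```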

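    This world is full of temptations (programming languages of every kind). Resist them: don't pick up one language today because it is in fashion (Go, I'm looking at you) and another tomorrow because that one is. A person should master the language, not let the language master the person. A reminder to myself!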

  • Original post: https://www.cnblogs.com/liulixiang/p/3530888.html