  • Python: handling XML parse errors caused by special characters

    global:[2017-06-05 10:27:48.662313] [DEBUG] 输出fmsg_content <msg fromusername="li2571" encryptusername="v1_d6685823c361fcaabb8f8bdde7b1c69831047cc4012ca73af16a452688fcb1ec@stranger" fromnickname="돌아갈 수 있는 지난จุ๊บðŸ’" content="我是" fullpy="?????????????" shortpy="?????????????" imagestatus="3" scene="3" country="CN" province="Hubei" city="Jingzhou" sign="〢铷裹爱,请深爱.、.|‖|‖▍.*" percard="1" sex="2" alias="LiLizouli713" weibo="" weibonickname="" albumflag="0" albumstyle="0" albumbgimgid="" snsflag="17" snsbgimgid="http://szmmsns.qpic.cn/mmsns/aicQlel8roa2oJPnj8q8Gf1ibVnDX1x5HD23xde644eAP8x0E5qtm69hGQ5e6GOquAkiaku39cAte8/0" snsbgobjectid="12548372440867024976" mhash="3247e9c6ea7921d63e672c5ede4e206e" mfullhash="3247e9c6ea7921d63e672c5ede4e206e" bigheadimgurl="http://wx.qlogo.cn/mmhead/ver_1/kjW4HogEYibLpboXT4mUDTUV9BhRnXAt0C4DW7JvQUY3Tia8yF8ibBBGF7wRv9vaaFdFcLne8GybjLlsVaTrrKNrP2Zjjlxtp9vGKEdcgCiaB44/0" smallheadimgurl="http://wx.qlogo.cn/mmhead/ver_1/kjW4HogEYibLpboXT4mUDTUV9BhRnXAt0C4DW7JvQUY3Tia8yF8ibBBGF7wRv9vaaFdFcLne8GybjLlsVaTrrKNrP2Zjjlxtp9vGKEdcgCiaB44/96" ticket="v2_8655444fac8ef7e3a277aeee973c6038a97e83593c53559e1d362e7488fda8c65724aa1310015415fd681faa284b5b18@stranger" opcode="2" googlecontact="" qrticket="" chatroomusername="" sourceusername="" sourcenickname=""><brandlist count="0" ver="652744432"></brandlist></msg>
    global:[2017-06-05 10:27:48.662493] [DEBUG] shuchuadezhi .、.     # this is fmsg_content[300:305]
    global:[2017-06-05 10:27:48.662711] [ERROR] process_wechat_msg not well-formed (invalid token): line 1, column 301

    import xml.etree.cElementTree as ET  # deprecated alias, removed in Python 3.9; use xml.etree.ElementTree there

    xml_tree = ET.fromstring(fmsg_content)

    Running this raises an error:

    global:[2017-06-05 10:27:48.662711] [ERROR] process_wechat_msg not well-formed (invalid token): line 1, column 301

    Logging the content around the reported column shows:

     fmsg_content[300:305] yields the special characters .、.
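Slicing around the reported column works, but non-printing characters are easy to miss in a log line. As a minimal sketch (the helper name `find_illegal_xml_chars` and the `sample` string are hypothetical, not taken from the original code), the positions of control characters that XML 1.0 forbids can be listed directly:

```python
import re

# Control characters that XML 1.0 forbids (tab, LF and CR are allowed).
ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_illegal_xml_chars(text):
    """Return (index, repr) pairs for every forbidden control character."""
    return [(m.start(), repr(m.group())) for m in ILLEGAL_XML_CHARS.finditer(text)]


# Hypothetical sample standing in for fmsg_content.
sample = '<msg sign="a.\x02、\x1f.b"></msg>'
for index, shown in find_illegal_xml_chars(sample):
    print(index, shown)
```

Using `repr` makes the otherwise invisible characters show up as escapes such as `'\x02'`.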

    First attempt, passing a parser with an explicit encoding:

    # parser = ET.XMLParser(encoding='utf-8')

    # xml_tree = ET.fromstring(fmsg_content, parser=parser)

    Result:

    *** Error in `/usr/bin/python3': double free or corruption (!prev): 0x00000000012ae500 ***

    A nasty pitfall: the interpreter itself crashes.

    The final fix:

    import re

    # Strip the control characters that XML 1.0 forbids (tab, LF and CR are kept)
    fmsg_content = re.sub(r"[\x00-\x08\x0b-\x0c\x0e-\x1f]+", "", fmsg_content)

    xml_tree = ET.fromstring(fmsg_content)

    Once the illegal characters are stripped out, the parse no longer fails.
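The whole workaround can be put together as a small self-contained sketch. The function name `parse_xml_lenient` and the `raw` sample are assumptions made for illustration; the character class is the one used in the fix, and the example imports `xml.etree.ElementTree` rather than the older `cElementTree` alias:

```python
import re
import xml.etree.ElementTree as ET

# Control characters that XML 1.0 forbids (tab, LF and CR are allowed).
ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]+")


def parse_xml_lenient(text):
    """Parse text as XML, retrying once with control characters removed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        # The first parse failed; drop forbidden control characters and retry.
        cleaned = ILLEGAL_XML_CHARS.sub("", text)
        return ET.fromstring(cleaned)


# Hypothetical message with control characters hidden in the sign attribute.
raw = '<msg fromusername="demo" sign="a.\x02、\x1f.b"></msg>'
msg = parse_xml_lenient(raw)
print(msg.get("fromusername"), msg.get("sign"))
```

Retrying only after a `ParseError` leaves well-formed input untouched. If the document is broken for some other reason, the second `ET.fromstring` call still raises, so real errors are not hidden.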

  • Original article: https://www.cnblogs.com/bevis-blog/p/6944248.html