  • ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes

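    This error looks like an encoding problem at first, but as the message says, it is raised when a string contains characters that XML does not allow, such as NULL bytes or control characters.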

    Traceback (most recent call last):
      File "/tmp/a.py", line 4, in <module>
        html5lib.parse('<p>&#1;', treebuilder='lxml')
      File "/home/simon/.virtualenvs/weasyprint/lib/python3.3/site-packages/html5lib/html5parser.py", line 28, in parse
        return p.parse(doc, encoding=encoding)
      File "/home/simon/.virtualenvs/weasyprint/lib/python3.3/site-packages/html5lib/html5parser.py", line 224, in parse
        parseMeta=parseMeta, useChardet=useChardet)
      File "/home/simon/.virtualenvs/weasyprint/lib/python3.3/site-packages/html5lib/html5parser.py", line 93, in _parse
        self.mainLoop()
      File "/home/simon/.virtualenvs/weasyprint/lib/python3.3/site-packages/html5lib/html5parser.py", line 183, in mainLoop
        new_token = phase.processCharacters(new_token)
      File "/home/simon/.virtualenvs/weasyprint/lib/python3.3/site-packages/html5lib/html5parser.py", line 991, in processCharacters
        self.tree.insertText(token["data"])
      File "/home/simon/.virtualenvs/weasyprint/lib/python3.3/site-packages/html5lib/treebuilders/_base.py", line 320, in insertText
        parent.insertText(data)
      File "/home/simon/.virtualenvs/weasyprint/lib/python3.3/site-packages/html5lib/treebuilders/etree_lxml.py", line 240, in insertText
        builder.Element.insertText(self, data, insertBefore)
      File "/home/simon/.virtualenvs/weasyprint/lib/python3.3/site-packages/html5lib/treebuilders/etree.py", line 108, in insertText
        self._element.text += data
      File "lxml.etree.pyx", line 921, in lxml.etree._Element.text.__set__ (src/lxml/lxml.etree.c:41467)
      File "apihelpers.pxi", line 652, in lxml.etree._setNodeText (src/lxml/lxml.etree.c:18888)
      File "apihelpers.pxi", line 1335, in lxml.etree._utf8 (src/lxml/lxml.etree.c:24701)
    ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
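In the traceback, the character reference `&#1;` apparently ends up as U+0001, a control character, which lxml refuses to store as node text. A minimal sketch for locating such characters in a string before handing it to an XML-based library follows. The helper name `find_xml_incompatible` is hypothetical, and the ranges follow the XML 1.0 `Char` production, which is an assumption about what matters here; lxml's internal check may differ in detail.

```python
import re

# Anything outside the XML 1.0 "Char" production: tab, LF, CR,
# U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF are allowed.
# Assumption: this approximates the rule lxml enforces.
_XML_ILLEGAL = re.compile(
    "[^\x09\x0a\x0d\x20-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)


def find_xml_incompatible(text):
    """Return (index, code point) pairs for characters XML 1.0 does not allow."""
    return [
        (m.start(), "U+%04X" % ord(m.group()))
        for m in _XML_ILLEGAL.finditer(text)
    ]


print(find_xml_incompatible("ok\x01text\x0c"))  # [(2, 'U+0001'), (7, 'U+000C')]
```

An empty list means the string should be safe to pass on as far as this rule is concerned.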

    I ran into this error unexpectedly while generating a document. My first idea was to fix it by changing the encoding, as shown below:

    p = document.add_paragraph(u"哈哈 ")
    or:
    p = document.add_paragraph(s.encode('utf-8').decode('utf-8'))

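    Neither of these two approaches made the error go away, because re-encoding does not remove the offending characters. In the end I stripped them with a regular-expression substitution, which resolved the immediate problem (this is a compromise for now; if I find a better solution later, I will update this post):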

    import re
    # Strip control characters that XML 1.0 does not allow (form feed \x0c added)
    s = re.sub(u"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]", "", s)
    p = self.doc.add_paragraph(s)
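The substitution above can be wrapped in a small reusable helper. This is only a sketch, not the original author's code: the name `strip_xml_incompatible` and its `replacement` parameter are hypothetical. The pattern removes everything outside the XML 1.0 `Char` production, so it also covers form feed, lone surrogates, U+FFFE and U+FFFF, while keeping U+007F, which XML 1.0 permits.

```python
import re

# Characters outside the XML 1.0 "Char" production.
# Assumption: removing these is enough for lxml-backed libraries.
_XML_ILLEGAL = re.compile(
    "[^\x09\x0a\x0d\x20-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)


def strip_xml_incompatible(text, replacement=""):
    """Return text with every XML-incompatible character replaced."""
    return _XML_ILLEGAL.sub(replacement, text)


s = strip_xml_incompatible("Hello\x00 wor\x0bld\x1f!")
print(s)  # Hello world!
# The cleaned string can then be passed on as in the post:
# p = document.add_paragraph(s)
```

Passing a visible `replacement` such as `"?"` instead of the empty string makes it easier to spot where characters were dropped.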
  • Original post: https://www.cnblogs.com/lxz123/p/15015557.html