  • PyPDF2 encoding problem: PyPDF2.utils.PdfReadError: Illegal character in Name Object


    Reference: https://github.com/mstamy2/PyPDF2/issues/438

    Merging PDF files with PyPDF2 fails with the following error:

    Traceback (most recent call last):
      File "D:projectsmyprojectvenvlibsite-packagesPyPDF2generic.py", line 484, in readFromStream
        return NameObject(name.decode('utf-8'))
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcb in position 8: invalid continuation byte
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
      File "D:projectsmyprojectappsackstageviewsusi_contract_manage_view.py", line 703, in post
        merge_pdf_result = merge_pdf(final_files, pdf_path)
      File "D:projectsmyprojectappsutilsdoc_convert_util.py", line 86, in merge_pdf
        pdf_writer.write(new_file)
      File "D:projectsmyprojectvenvlibsite-packagesPyPDF2pdf.py", line 482, in write
        self._sweepIndirectReferences(externalReferenceMap, self._root)
      File "D:projectsmyprojectvenvlibsite-packagesPyPDF2pdf.py", line 571, in _sweepIndirectReferences
        self._sweepIndirectReferences(externMap, realdata)
      File "D:projectsmyprojectvenvlibsite-packagesPyPDF2pdf.py", line 547, in _sweepIndirectReferences
        value = self._sweepIndirectReferences(externMap, value)
      File "D:projectsmyprojectvenvlibsite-packagesPyPDF2pdf.py", line 571, in _sweepIndirectReferences
        self._sweepIndirectReferences(externMap, realdata)
      File "D:projectsmyprojectvenvlibsite-packagesPyPDF2pdf.py", line 547, in _sweepIndirectReferences
        value = self._sweepIndirectReferences(externMap, value)
      File "D:projectsmyprojectvenvlibsite-packagesPyPDF2pdf.py", line 556, in _sweepIndirectReferences
        value = self._sweepIndirectReferences(externMap, data[i])
      File "D:projectsmyprojectvenvlibsite-packagesPyPDF2pdf.py", line 571, in _sweepIndirectReferences
        self._sweepIndirectReferences(externMap, realdata)
      File "D:projectsmyprojectvenvlibsite-packagesPyPDF2pdf.py", line 547, in _sweepIndirectReferences
        value = self._sweepIndirectReferences(externMap, value)
      File "D:projectsmyprojectvenvlibsite-packagesPyPDF2pdf.py", line 547, in _sweepIndirectReferences
        value = self._sweepIndirectReferences(externMap, value)
      File "D:projectsmyprojectvenvlibsite-packagesPyPDF2pdf.py", line 547, in _sweepIndirectReferences
        value = self._sweepIndirectReferences(externMap, value)
      File "D:projectsmyprojectvenvlibsite-packagesPyPDF2pdf.py", line 577, in _sweepIndirectReferences
        newobj = data.pdf.getObject(data)
      File "D:projectsmyprojectvenvlibsite-packagesPyPDF2pdf.py", line 1611, in getObject
        retval = readObject(self.stream, self)
      File "D:projectsmyprojectvenvlibsite-packagesPyPDF2generic.py", line 66, in readObject
        return DictionaryObject.readFromStream(stream, pdf)
      File "D:projectsmyprojectvenvlibsite-packagesPyPDF2generic.py", line 579, in readFromStream
        value = readObject(stream, pdf)
      File "D:projectsmyprojectvenvlibsite-packagesPyPDF2generic.py", line 60, in readObject
        return NameObject.readFromStream(stream, pdf)
      File "D:projectsmyprojectvenvlibsite-packagesPyPDF2generic.py", line 492, in readFromStream
        raise utils.PdfReadError("Illegal character in Name Object")
    PyPDF2.utils.PdfReadError: Illegal character in Name Object
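The first exception in the traceback can be reproduced without PyPDF2. The sketch below assumes, purely for illustration, that the offending name is a subset font name such as `/ABCDEF+宋体` stored as GBK bytes; the actual name in the failing PDF is not known. Under that assumption the byte 0xcb sits at position 8, and the UTF-8 decoder rejects it because the byte that follows is not a valid continuation byte.

```python
# Hypothetical name object: the real one in the failing PDF is unknown.
raw_name = b"/ABCDEF+" + "宋体".encode("gbk")  # b'/ABCDEF+\xcb\xce\xcc\xe5'

try:
    raw_name.decode("utf-8")
except UnicodeDecodeError as exc:
    failed_at = exc.start              # index of the first undecodable byte
    failed_byte = raw_name[failed_at]  # integer value of that byte
    print(f"utf-8 failed at position {failed_at}, byte {failed_byte:#x}")

# The same bytes decode cleanly as GBK.
decoded = raw_name.decode("gbk")
print(ascii(decoded))
```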
    

    Locate the file named in the traceback:

    Lib/site-packages/PyPDF2/generic.py, line 484

    Original code at line 484:

    try:
        return NameObject(name.decode('utf-8'))
    except (UnicodeEncodeError, UnicodeDecodeError) as e:
        # Name objects should represent irregular characters
        # with a '#' followed by the symbol's hex number
        if not pdf.strict:
            warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
            return NameObject(name)
        else:
            raise utils.PdfReadError("Illegal character in Name Object")
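The comment in this block refers to the PDF convention that bytes outside the printable ASCII range are written in a name as `#` followed by two hex digits. A PDF whose names contain raw GBK bytes does not follow that convention, which is why the decode fails. A minimal sketch of the escaping follows; `escape_pdf_name` is a hypothetical helper, not part of PyPDF2, and the sample input is an assumed GBK-encoded font name.

```python
def escape_pdf_name(raw: bytes) -> str:
    """Return a PDF name for raw (given without the leading '/'),
    writing irregular bytes as '#' plus two hex digits."""
    # Delimiter characters and '#' itself must also be escaped.
    special = b"()<>[]{}/%#"
    parts = []
    for byte in raw:
        if 0x21 <= byte <= 0x7E and byte not in special:
            parts.append(chr(byte))       # regular printable ASCII: keep as is
        else:
            parts.append(f"#{byte:02X}")  # everything else: hex escape
    return "/" + "".join(parts)


escaped = escape_pdf_name("ABCDEF+宋体".encode("gbk"))
print(escaped)
```

A name written this way is pure ASCII, so it decodes as UTF-8 without error.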

    Add a GBK fallback inside the except block:

     return NameObject(name.decode('gbk')) 

    After the change:

    try:
        return NameObject(name.decode('utf-8'))
    except (UnicodeEncodeError, UnicodeDecodeError) as e:
        try:
            return NameObject(name.decode('gbk'))
        except (UnicodeEncodeError, UnicodeDecodeError) as e:
            # Name objects should represent irregular characters
            # with a '#' followed by the symbol's hex number
            if not pdf.strict:
                warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
                return NameObject(name)
            else:
                raise utils.PdfReadError("Illegal character in Name Object")
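The same fallback idea can be tried outside the library before editing files under site-packages. The sketch below is a stand-alone approximation: `decode_name` and its `encodings` parameter are hypothetical names, and it returns the raw bytes when every encoding fails, which only loosely mirrors the non-strict branch above.

```python
import warnings


def decode_name(raw: bytes, encodings=("utf-8", "gbk")):
    """Try each encoding in order; return a str on the first success,
    otherwise warn and return the raw bytes unchanged."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    warnings.warn("Illegal character in Name Object")
    return raw


print(decode_name(b"/Font"))                             # plain ASCII name
print(ascii(decode_name(b"/" + "宋体".encode("gbk"))))    # GBK-encoded name
```

UTF-8 is tried first because it is strict about byte sequences, while GBK accepts many byte pairs. A successful GBK decode is therefore a guess about the original encoding, not proof of it.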

    The error still occurs after this change, so a second location must be patched as well:

    Lib/site-packages/PyPDF2/utils.py, line 238

    Original code:

    r = s.encode('latin-1')
    if len(s) < 2:
        bc[s] = r
    return r

    Modified code:

    try:
        r = s.encode('latin-1')
    except UnicodeEncodeError:
        r = s.encode('utf-8')
    if len(s) < 2:
        bc[s] = r
    return r
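A likely reason the second patch is needed: once a name has been decoded from GBK it contains Chinese characters, and Latin-1 cannot encode them when the merged file is written. The sketch below reproduces that behaviour in isolation; `to_bytes` and `_cache` are hypothetical stand-ins for the library helper and its `bc` cache, not PyPDF2 code.

```python
_cache = {}  # stand-in for the small cache called `bc` in the original


def to_bytes(s):
    """Encode s as Latin-1, falling back to UTF-8 for characters
    that Latin-1 cannot represent."""
    if isinstance(s, bytes):
        return s
    if s in _cache:
        return _cache[s]
    try:
        r = s.encode("latin-1")
    except UnicodeEncodeError:
        r = s.encode("utf-8")
    if len(s) < 2:
        _cache[s] = r  # only very short strings are cached
    return r


print(to_bytes("/Font"))   # Latin-1 path
print(to_bytes("/宋体"))    # UTF-8 fallback path
```

A name read as GBK and written as UTF-8 no longer has the same bytes in the output file, so it is worth checking that the merged PDF still renders its fonts correctly.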

    Source: https://blog.csdn.net/kmesky/article/details/102695520

  • Original article: https://www.cnblogs.com/mysick/p/12726582.html