  • XML read error: Invalid byte 1 of 1-byte UTF-8 sequence

    The root cause of the problem is:

    The cause of this is a file that is not UTF-8 is being parsed as UTF-8. It is likely that the parser is encountering a byte value in the range FE-FF. These values are invalid in the UTF-8 encoding.

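    Put simply, if this error appears while you are parsing someone else's XML, the most likely cause is that the file was not actually saved in UTF-8 when it was generated.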
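
    One way to confirm this diagnosis is to test whether the file's bytes are well-formed UTF-8 before handing them to a parser. The following is a minimal sketch using only the JDK; the class name `Utf8Check`, the method `isValidUtf8`, and the sample character are illustrative assumptions, not part of the original article.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: checks whether a byte array is well-formed UTF-8.
class Utf8Check {
    /** Returns true only if the bytes decode as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // \u554a is a sample Chinese character; its GBK bytes are not valid UTF-8.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a>\u554a</a>";
        byte[] asGbk = xml.getBytes(Charset.forName("GBK"));
        byte[] asUtf8 = xml.getBytes(StandardCharsets.UTF_8);
        System.out.println("GBK bytes valid UTF-8?   " + isValidUtf8(asGbk));
        System.out.println("UTF-8 bytes valid UTF-8? " + isValidUtf8(asUtf8));
    }
}
```

    In practice the byte array would come from something like `Files.readAllBytes` on the suspect file. A `false` result means the file cannot be UTF-8, whatever its declaration says.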

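    On Chinese-language Windows, Java's default encoding is GBK (this applies to JDK versions before 18, which take the default charset from the OS locale). So even though we declare that the XML is UTF-8, the file is actually written in GBK. That is why XML files generated with a GBK or GB2312 declaration parse correctly, while files generated with a UTF-8 declaration cannot be parsed by the XML parser.

    The encoding exception raised while parsing the XML: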
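
    The mismatch described above can be reproduced without touching the platform default. The sketch below is an illustration under stated assumptions: it uses the JDK's built-in JAXP parser rather than dom4j, and the class name `EncodingMismatchDemo`, the helper `tryParse`, and the sample document are hypothetical. It encodes the same text once as GBK and once as UTF-8, both with a UTF-8 declaration, and tries to parse each.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical demo: a UTF-8 declaration on top of GBK-encoded bytes.
class EncodingMismatchDemo {
    static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>\u554a</root>";

    /** Parses the bytes; returns null on success, otherwise the exception text. */
    static String tryParse(byte[] bytes) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return null;
        } catch (Exception e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // What a writer relying on a GBK default produces:
        // the declaration says UTF-8, the bytes are GBK.
        byte[] mislabeled = XML.getBytes(Charset.forName("GBK"));
        // What the declaration actually promises.
        byte[] correct = XML.getBytes(StandardCharsets.UTF_8);

        System.out.println("mislabeled: " + tryParse(mislabeled));
        System.out.println("correct:    " + tryParse(correct));
    }
}
```

    The mislabeled bytes should be rejected by the parser, while the correctly encoded bytes parse cleanly.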

    org.dom4j.DocumentException: Invalid byte 1 of 1-byte UTF-8 sequence. Nested exception: Invalid byte 1 of 1-byte UTF-8 sequence.
    	at org.dom4j.io.SAXReader.read(SAXReader.java:484)
    	at org.dom4j.io.SAXReader.read(SAXReader.java:321)
    	at com.dataoperate.PaseXml.pXml(PaseXml.java:28)
    	at com.dataoperate.JdbcOp.insertDb(JdbcOp.java:30)
    	at com.dataoperate.JdbcOp.main(JdbcOp.java:89)
    Nested exception: 
    com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
    	at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:684)
    	at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:554)
    	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
    	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.peekChar(XMLEntityScanner.java:487)
    	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2687)
    	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
    	at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
    	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
    	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
    	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
    	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
    	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
    	at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
    	at org.dom4j.io.SAXReader.read(SAXReader.java:465)
    	at org.dom4j.io.SAXReader.read(SAXReader.java:321)
    	at com.dataoperate.PaseXml.pXml(PaseXml.java:28)
    	at com.dataoperate.JdbcOp.insertDb(JdbcOp.java:30)
    	at com.dataoperate.JdbcOp.main(JdbcOp.java:89)
    Nested exception: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
    	at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:684)
    	at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:554)
    	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
    	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.peekChar(XMLEntityScanner.java:487)
    	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2687)
    	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
    	at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
    	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
    	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
    	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
    	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
    	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
    	at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
    	at org.dom4j.io.SAXReader.read(SAXReader.java:465)
    	at org.dom4j.io.SAXReader.read(SAXReader.java:321)
    	at com.dataoperate.PaseXml.pXml(PaseXml.java:28)
    	at com.dataoperate.JdbcOp.insertDb(JdbcOp.java:30)
    	at com.dataoperate.JdbcOp.main(JdbcOp.java:89)

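    1. The simplest fix is to change <?xml version="1.0" encoding="UTF-8"?> to <?xml version="1.0" encoding="gbk"?> (a workaround)

    2. Or open the XML file and re-save it with the character set changed to UTF-8 (recommended)

    3. Rewrite the XML in code before parsing it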

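    SAXReader reader = new SAXReader();
    // Assumes this read succeeds; a mislabeled file must be read as in item 4
    org.dom4j.Document document = reader.read("D:\\ha.xml");
    OutputFormat of = new OutputFormat();
    of.setEncoding("UTF-8"); // change the output encoding
    // Pass an OutputStream so XMLWriter encodes with of's encoding;
    // a FileWriter would silently use the platform default (e.g. GBK)
    XMLWriter writer = new XMLWriter(new FileOutputStream("d:\\dom4j.xml"), of);
    writer.write(document);
    writer.close();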

    4. When reading with dom4j, open the file through an I/O stream and specify the character encoding yourself

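    SAXReader reader = new SAXReader();
    FileInputStream in = new FileInputStream(new File(fileName));
    // Decode as GBK here; the parser reads characters and ignores the declared encoding
    Reader read = new InputStreamReader(in, "gbk");
    Document document = reader.read(read);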
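
    For readers without dom4j on the classpath, options 2 and 4 above can be sketched with the JDK alone. This is a minimal illustration, not the article's own code: the class `XmlCharsetFixes`, its methods `parseAs` and `resaveAsUtf8`, and the temporary files are hypothetical, and it assumes you already know the file's real charset (GBK here).

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helpers mirroring options 2 and 4 with the JDK's JAXP parser.
class XmlCharsetFixes {
    /** Option 4: decode with the file's real charset and give the parser characters. */
    static Document parseAs(Path file, Charset actual) throws Exception {
        try (Reader in = Files.newBufferedReader(file, actual)) {
            // With a character stream, the parser disregards the encoding declaration.
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(in));
        }
    }

    /** Option 2: re-save the file so its bytes match a UTF-8 declaration. */
    static void resaveAsUtf8(Path src, Path dst, Charset actual) throws IOException {
        String text = new String(Files.readAllBytes(src), actual);
        Files.write(dst, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        Charset gbk = Charset.forName("GBK");
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>\u554a</root>";
        Path bad = Files.createTempFile("mislabeled", ".xml");
        Path good = Files.createTempFile("resaved", ".xml");
        Files.write(bad, xml.getBytes(gbk)); // declared UTF-8, stored as GBK

        // Read the mislabeled file by decoding it as GBK ourselves.
        System.out.println(parseAs(bad, gbk).getDocumentElement().getTextContent());

        // Re-save as real UTF-8, after which a plain UTF-8 read works.
        resaveAsUtf8(bad, good, gbk);
        System.out.println(parseAs(good, StandardCharsets.UTF_8)
                .getDocumentElement().getTextContent());
    }
}
```

    Re-saving fixes the file once for every consumer, which is why option 2 is the recommended one. Decoding at read time only helps the code that knows about the mismatch.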
  • Original article: https://www.cnblogs.com/yefengmeander/p/3617865.html