1. Reading XML with dom4j in Java fails with the following error:
org.dom4j.DocumentException: Error on line 2 of document : Invalid byte 2 of 2-byte UTF-8 sequence. Nested exception: Invalid byte 2 of 2-byte UTF-8 sequence
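2. Cause of the error:
The data returned by the API is GBK-encoded. Our code sets the encoding on the InputSource, but it does not specify a charset when converting the XML string to bytes. xml.getBytes() with no argument uses the platform default charset, so when that default is not UTF-8 (presumably GBK here), the parser is told to decode non-UTF-8 bytes as UTF-8 and fails.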
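The mismatch can be reproduced without dom4j. The sketch below is only an illustration under assumptions: it uses the JDK's built-in parser instead of SAXReader, the class name EncodingMismatchDemo and the sample XML are hypothetical, and it assumes the GBK charset is available in the running JDK. It encodes the same string once as GBK and once as UTF-8, then tells the parser in both cases that the bytes are UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class EncodingMismatchDemo {

    // Hypothetical sample: an XML document whose text content is two Chinese characters
    // (written as Unicode escapes so the source file stays ASCII-only).
    public static final String SAMPLE_XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<item>\u5E8A\u524D</item>";

    // Returns true if the bytes parse as XML when the parser is told they are UTF-8.
    public static boolean parsesAsUtf8(byte[] bytes) {
        try {
            InputSource source = new InputSource(new ByteArrayInputStream(bytes));
            source.setEncoding("UTF-8");
            DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
            return true;
        } catch (Exception e) {
            // Expected for GBK bytes: they are not a valid UTF-8 byte sequence.
            System.out.println("Parse failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulates xml.getBytes() on a platform whose default charset is GBK.
        byte[] gbkBytes = SAMPLE_XML.getBytes(Charset.forName("GBK"));
        // The explicit charset matches what the parser is told to expect.
        byte[] utf8Bytes = SAMPLE_XML.getBytes(StandardCharsets.UTF_8);

        System.out.println("GBK bytes parsed as UTF-8:   " + parsesAsUtf8(gbkBytes));
        System.out.println("UTF-8 bytes parsed as UTF-8: " + parsesAsUtf8(utf8Bytes));
    }
}
```

The first call should fail and the second should succeed. The exact error text depends on the parser and on the bytes involved, so it may differ from the message quoted in step 1.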
3. Original code:
SAXReader reader = new SAXReader(false);
InputSource source = new InputSource(new ByteArrayInputStream(xml.getBytes()));
source.setEncoding("UTF-8");
Document doc = reader.read(source);
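4. Fixed code:
SAXReader reader = new SAXReader(false);
InputSource source = new InputSource(new ByteArrayInputStream(xml.getBytes("UTF-8")));
source.setEncoding("UTF-8");
Document doc = reader.read(source);
The change specifies the charset explicitly when converting the XML string to bytes, so the bytes match the encoding set on the InputSource.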
Alternatively:
SAXReader reader = new SAXReader(false);
Document doc = reader.read(new ByteArrayInputStream(xml.getBytes("UTF-8")));
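Since the XML is already held in a String, another option is to skip the byte conversion and give the parser a character stream, so no byte encoding is involved at all. The sketch below shows the idea with the JDK's built-in parser as a stand-in, and the class name CharacterStreamParseDemo is hypothetical. In dom4j the equivalent should be reader.read(new StringReader(xml)) or DocumentHelper.parseText(xml), though that was not tested against the original code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CharacterStreamParseDemo {

    // Parses the XML from a character stream and returns the root element's text.
    public static String rootText(String xml) {
        try {
            // A Reader hands the parser characters, not bytes, so there is nothing to decode.
            InputSource source = new InputSource(new StringReader(xml));
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(source);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Failed to parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><item>\u5E8A\u524D</item>";
        // Compare against the expected characters instead of printing them,
        // to avoid depending on the console encoding.
        System.out.println("Text preserved: " + "\u5E8A\u524D".equals(rootText(xml)));
    }
}
```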