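Our client program calls the Elasticsearch RESTful API directly and runs queries by POSTing JSON. When the POST body contains Chinese text, some Chinese characters trigger an exception while others do not: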
{"error":{"root_cause":[{"type":"json_parse_exception","reason":"Invalid UTF-8 middle byte 0x5c at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@58cf272c; line: 1, column: 238]"}],"type":"json_parse_exception","reason":"Invalid UTF-8 middle byte 0x5c at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@58cf272c; line: 1, column: 238]"},"status":500}
POSTing the same JSON through the ES head plugin works fine, so our first guess was that the client writes the data incorrectly. Here is the code:
URL url = new URL(esURL);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setDoOutput(true);
connection.setDoInput(true);
connection.setRequestMethod("POST");
connection.setUseCaches(false);
//connection.setConnectTimeout(30000); // set the timeout to 30 seconds
connection.setInstanceFollowRedirects(true);
connection.setRequestProperty("Charsert", "UTF-8"); // note: "Charsert" is not a standard HTTP header name
connection.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
connection.setRequestProperty("Accept-Language", "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3");
connection.connect();
// POST request
DataOutputStream out = new DataOutputStream(connection.getOutputStream());
out.writeBytes(query);
The problem is in the writeBytes() method. Let's look at the JDK source:
public final void writeBytes(String s) throws IOException {
    int len = s.length();
    for (int i = 0 ; i < len ; i++) {
        out.write((byte)s.charAt(i));
    }
    incCount(len);
}
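In UTF-8, a Chinese character typically takes three bytes, but this method casts each char straight to a single byte and keeps only its low eight bits. The bytes sent are therefore not valid UTF-8 for the original text. This also fits the symptom that only some characters fail: whether the parser rejects the request depends on what the surviving low byte happens to be.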
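The truncation is easy to see in isolation. The following is a minimal sketch, not part of the original client: the class name WriteBytesDemo and its helper methods are made up for illustration, and the two sample characters are written as Unicode escapes so that the source file's own encoding does not matter.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares writeBytes() output with real UTF-8 bytes.
class WriteBytesDemo {

    // Bytes produced by DataOutputStream.writeBytes: one byte per char (low 8 bits only).
    static byte[] viaWriteBytes(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeBytes(s);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Bytes produced by a proper UTF-8 encoding of the string.
    static byte[] viaUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Render bytes as space-separated lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = "\u4e2d"; // U+4E2D: low byte 0x2d is plain ASCII
        String b = "\u67e5"; // U+67E5: low byte 0xe5 looks like a UTF-8 lead byte

        System.out.println(hex(viaWriteBytes(a))); // 2d
        System.out.println(hex(viaUtf8(a)));       // e4 b8 ad
        System.out.println(hex(viaWriteBytes(b))); // e5
        System.out.println(hex(viaUtf8(b)));       // e6 9f a5
    }
}
```

For U+4E2D the surviving byte is 0x2d, an ordinary ASCII character, so the text is silently corrupted without necessarily breaking the parse. For U+67E5 the surviving byte is 0xe5, which a UTF-8 decoder reads as the start of a three-byte sequence. Whatever byte follows is then checked as a continuation byte, which is the kind of situation an "Invalid UTF-8 middle byte" error describes.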
Change the code to:
out.write(query.getBytes("UTF-8"));
Problem solved.
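For context, here is one way the corrected request could look end to end. Treat it as a sketch under assumptions rather than the original client: the class name EsPostSketch, the writeBody and post helpers, the default URL http://localhost:9200/_search, and the "title" field in the sample query are all made up for illustration. It writes to the plain OutputStream, since DataOutputStream adds nothing once the body is already a byte array, and uses StandardCharsets.UTF_8 to avoid the checked UnsupportedEncodingException.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the corrected client; names and URL are assumptions.
class EsPostSketch {

    // Encode the JSON as UTF-8 and write the resulting bytes unchanged.
    static void writeBody(OutputStream out, String json) {
        try {
            out.write(json.getBytes(StandardCharsets.UTF_8));
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // POST the JSON body to the given URL and return the HTTP status code.
    static int post(String esUrl, String json) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(esUrl).openConnection();
        connection.setDoOutput(true);
        connection.setRequestMethod("POST");
        connection.setUseCaches(false);
        connection.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
        try (OutputStream out = connection.getOutputStream()) {
            writeBody(out, json);
        }
        return connection.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        // Assumed endpoint; pass your own cluster URL as the first argument.
        String esUrl = args.length > 0 ? args[0] : "http://localhost:9200/_search";
        // Sample query with two Chinese characters (U+67E5 U+8BE2); "title" is a made-up field.
        String query = "{\"query\":{\"match\":{\"title\":\"\u67e5\u8be2\"}}}";
        System.out.println(post(esUrl, query));
    }
}
```

Keeping the encoding step in a small method such as writeBody makes it possible to check the bytes against an in-memory stream without a running cluster.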