原文地址:http://www.blogjava.net/pengpenglin/archive/2010/02/22/313669.html
【GBK转UTF-8】
在很多论坛、网上经常有网友问“ 为什么我使用 new String(tmp.getBytes("ISO-8859-1"), "UTF-8") 或者 new String(tmp.getBytes("ISO-8859-1"), "GBK")可以得到正确的中文,但是使用 new String(tmp.getBytes("GBK"), "UTF-8") 却不能将GBK转换成UTF-8呢?”
参考前面的【Java基础专题】编码与乱码(03)----String的toCharArray()方法测试 一文,我们就知道原因了。因为如果客户端使用GBK、UTF-8编码,编码后的字节经过ISO-8859-1传输,再用原来相同的编码方式进行解码,这个过程是“无损的转换”---- 因为原始和最终的编码方式相同。
但是如果客户端使用GBK编码,到了服务器端要转换成UTF-8,或者相反的过程。想一想,字节还是那些字节,但是编码的规则变了。原来GBK编码后的4个字节要用UTF-8的每个字符3个字节的规则编码,怎么能不乱码呢?
所以从现在开始,不要再犯这种错误了。new String(tmp.getBytes("GBK"), "UTF-8") 这个过程,JVM内部是不会帮你自动对字节进行扩展以适应UTF-8的编码的。正确的方法应该是根据UTF-8的编码规则进行字节的扩充,即手动从2个字节变成3个字节,然后再转换成十六进制的UTF-8编码。
在这个专题的第一篇文章【Java基础专题】编码与乱码(01)---编码基础 开头,我们就已经介绍了这个规则:
①得到每个字符的2进制GBK编码
②将该16进制的GBK编码转换成2进制的字符串(2个字节)
③分别在字符串的首位插入110,在第9位插入10,在第17位插入10三个字符串,得到3个字节
④将这3个字节分别转换成16进制编码,得到最终的UTF-8编码。
下面给出一个从网络上得到的Java转码方法,原文链接见:http://jspengxue.javaeye.com/blog/40781。下面的代码做了小小的修改
![](http://www.blogjava.net/Images/OutliningIndicators/None.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/None.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/ExpandedBlockStart.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/ExpandedBlockEnd.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/ExpandedBlockStart.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/ExpandedSubBlockStart.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/ExpandedSubBlockEnd.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/ExpandedSubBlockStart.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/ExpandedSubBlockStart.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/ExpandedSubBlockStart.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/ExpandedSubBlockEnd.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/ExpandedSubBlockEnd.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/ExpandedSubBlockStart.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/ExpandedSubBlockEnd.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/ExpandedSubBlockStart.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/ExpandedSubBlockStart.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/ExpandedSubBlockStart.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/ExpandedSubBlockEnd.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/ExpandedSubBlockEnd.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/InBlock.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/ExpandedSubBlockEnd.gif)
![](http://www.blogjava.net/Images/OutliningIndicators/ExpandedBlockEnd.gif)
最终的测试结果是正确的:string from GBK to UTF-8 byte: 中文。
但是这个方法并不是完美的!要知道这个规则只对中文起作用,如果传入的字符串中包含有单字节字符,如a+3中文,那么解析的结果就变成:string from GBK to UTF-8 byte: ?????????中文了。为什么呢?道理很简单,这个方法对原本在UTF-8中应该用单字节表示的数字、英文字符、符号都变成3个字节了,所以这里有9个?,代表被转换后的a、+、3字符。
所以要让这个方法更加完美,最好的方法就是加入对字符Unicode区间的判断
UCS-2编码(16进制) | UTF-8 字节流(二进制) |
0000 - 007F | 0xxxxxxx |
0080 - 07FF | 110xxxxx 10xxxxxx |
0800 - FFFF | 1110xxxx 10xxxxxx 10xxxxxx |
汉字的Unicode编码范围为u4E00-u9FA5 uF900-uFA2D,如果不在这个范围内就不是汉字了。
【UTF-8转GBK】
道理和上面的相同,只是一个逆转的过程,不多说了
但是最终的建议还是:能够统一编码就统一编码吧!要知道编码的转换是相当的耗时的工作