  • Java character encoding issues

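    I looked into this today and am writing it down.

    Redis is used as the transport in between, but any other I/O channel could be substituted with the same results.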

    Test1

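            // redis is assumed to be a Jedis client, e.g. new Jedis("localhost", 6379)
            String s1 = "我要测试";
            String s2 = "I want to test";
            String s3 = "경쟁력, 네이버";

            redis.lpush("testencode", s1);
            redis.lpush("testencode", s2);
            redis.lpush("testencode", s3);

            // LPUSH inserts at the head of the list and LPOP removes from the head,
            // so the strings come back in reverse order: s3, s2, s1
            System.out.println(redis.lpop("testencode"));
            System.out.println(redis.lpop("testencode"));
            System.out.println(redis.lpop("testencode"));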

    Result: all three strings come back correctly.

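    Notes: Java strings are Unicode internally, so if both the sender and the receiver are written in Java, no manual transcoding is needed (provided both ends use the same default encoding).

            When Java writes strings to I/O or reads them back, it converts between characters and bytes implicitly, normally using the platform's default charset. It also seemed that the encoding of the source file itself takes priority; strictly speaking, though, a source file's encoding only controls how javac decodes string literals at compile time, while runtime conversions use the default charset (the `file.encoding` property, which an IDE may set from the project's encoding).

            So here the strings are encoded to UTF-8 when sent and decoded from UTF-8 back to Unicode when received, which is why nothing goes wrong.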
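    To see that Redis itself plays no role, the same round trip can be sketched with an in-memory byte stream standing in for the network. This is a sketch, not the original test: `RoundTripDemo` and `roundTrip` are hypothetical names, and the charset is passed explicitly (UTF-8) to both the Writer and the Reader instead of being left to the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Writes text through a Writer and reads it back through a Reader,
    // using the same charset on both sides. The in-memory byte array is a
    // hypothetical stand-in for Redis, a socket, or a file.
    static String roundTrip(String text, Charset cs) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            try (Writer w = new OutputStreamWriter(wire, cs)) {
                w.write(text); // chars -> bytes (encode)
            }
            StringBuilder sb = new StringBuilder();
            try (Reader r = new InputStreamReader(
                    new ByteArrayInputStream(wire.toByteArray()), cs)) {
                char[] buf = new char[256];
                int n;
                while ((n = r.read(buf)) != -1) {
                    sb.append(buf, 0, n); // bytes -> chars (decode)
                }
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The charset Java falls back to when none is given explicitly
        System.out.println("default charset: " + Charset.defaultCharset());
        String[] samples = {"我要测试", "I want to test", "경쟁력, 네이버"};
        for (String s : samples) {
            // Same charset on both ends, so this prints true for every sample
            System.out.println(roundTrip(s, StandardCharsets.UTF_8).equals(s));
        }
    }
}
```

    Because encoding and decoding use the same charset, the strings survive unchanged, which is exactly the situation in Test1.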

    Test2

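            String s1 = "我要测试";
            byte[] key = "testencode".getBytes();
            // Transcode explicitly instead of relying on the default conversion.
            // The enclosing method must declare or catch UnsupportedEncodingException.
            byte[] b1 = s1.getBytes("gb2312");
            redis.lpush(key, b1);
            System.out.println(new String(redis.lpop(key), "gb2312"));
            // Decoding with the default charset instead garbles the text:
            //System.out.println(new String(redis.lpop(key)));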

    Result: correct.

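    Notes: because the string was already encoded as GB2312 before it was sent, the receiver must decode it as GB2312 as well. Using the default conversion instead (the commented-out line) decodes the bytes with the default charset, UTF-8 here, and the text comes out garbled.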
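    The mismatch can be reproduced without Redis by encoding with GB2312 and decoding with either GB2312 or UTF-8. A minimal sketch; `MojibakeDemo` and `sendGbDecodeAs` are hypothetical names, and it assumes the running JDK provides the GB2312 charset (standard JDK builds do; trimmed runtime images may not).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Charset.forName throws UnsupportedCharsetException if GB2312 is unavailable
    static final Charset GB2312 = Charset.forName("GB2312");

    // Encode with GB2312 (what Test2 sends), then decode with the given charset
    static String sendGbDecodeAs(String text, Charset decodeWith) {
        byte[] wire = text.getBytes(GB2312);
        return new String(wire, decodeWith);
    }

    public static void main(String[] args) {
        String s1 = "我要测试";
        // Same charset on both ends: prints the original text
        System.out.println(sendGbDecodeAs(s1, GB2312));
        // GB2312 bytes read as UTF-8: prints garbled text, not s1
        System.out.println(sendGbDecodeAs(s1, StandardCharsets.UTF_8));
    }
}
```

    The four characters take 8 bytes in GB2312 but 12 bytes in UTF-8, so decoding the GB2312 bytes as UTF-8 cannot reproduce the original string.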

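    All of the transcoding above assumes the original encoding is known. Sometimes the receiver has no way of knowing it, and then the encoding has to be detected.

    I used JCharDet for this; its interface is poorly designed and rather awkward to use.

    Reference: http://blog.csdn.net/chenvsa/article/details/7445569

    My modified version: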

    import java.io.IOException;

    import org.mozilla.intl.chardet.nsDetector;
    import org.mozilla.intl.chardet.nsICharsetDetectionObserver;
    import org.mozilla.intl.chardet.nsPSMDetector;

    public class CharsetDetector {
        // Set by the observer when the detector settles on a single charset
        private boolean found = false;
        private String result;

        // Detect among all charsets jchardet knows about.
        // "throws IOException" is kept so callers that catch IOException still compile.
        public String[] detectCharset(byte[] bytes) throws IOException
        {
            return detect(bytes, nsPSMDetector.ALL);
        }

        // Restrict detection to Chinese charsets
        public String[] detectChineseCharset(byte[] bytes) throws IOException
        {
            return detect(bytes, nsPSMDetector.CHINESE);
        }

        private String[] detect(byte[] bytes, int lang)
        {
            // Reset state left over from a previous call on this instance
            found = false;
            result = null;

            // Initialize the nsDetector for the requested language group
            nsDetector det = new nsDetector(lang);
            // Register an observer; Notify() is called when a matching charset is found
            det.Init(
                new nsICharsetDetectionObserver() {
                    public void Notify(String charset)
                    {
                        found = true;
                        result = charset;
                    }
                });

            int len = bytes.length;
            boolean isAscii = det.isAscii(bytes, len);
            // Feed the data to the detector only if it contains non-ASCII bytes
            if (!isAscii) {
                det.DoIt(bytes, len, false);
            }
            det.DataEnd();

            if (isAscii) {
                return new String[] {"ASCII"};
            } else if (found) {
                return new String[] {result};
            } else {
                // No definite answer: fall back to the list of probable charsets
                return det.getProbableCharsets();
            }
        }
    }

    Usage:

    CharsetDetector cd = new CharsetDetector();
    String[] probableSet = {};

    try {
        // b1 is the GB2312-encoded byte array from Test2
        probableSet = cd.detectChineseCharset(b1);
    } catch (IOException e) {
        e.printStackTrace();
    }
    for (String charset : probableSet)
    {
        System.out.println(charset);
    }
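    When the sender is known to use either UTF-8 or GB2312 and nothing else, a lighter check than JCharDet may be enough: decode strictly as UTF-8 and fall back to GB2312 if the bytes are malformed. This is a different, simpler technique than JCharDet's detection, and only a heuristic: pure ASCII is reported as UTF-8 (harmless, since ASCII reads the same in both), and a short GB2312 byte sequence can occasionally also happen to be valid UTF-8. `Utf8OrGbGuess`, `isValidUtf8`, and `guess` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8OrGbGuess {
    // Returns true if the bytes form a well-formed UTF-8 sequence
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            dec.decode(ByteBuffer.wrap(bytes)); // throws on the first malformed byte
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Assumption: the sender used either UTF-8 or GB2312, nothing else
    static String guess(byte[] bytes) {
        return isValidUtf8(bytes) ? "UTF-8" : "GB2312";
    }

    public static void main(String[] args) {
        String s1 = "我要测试";
        System.out.println(guess(s1.getBytes(StandardCharsets.UTF_8)));      // UTF-8
        System.out.println(guess(s1.getBytes(Charset.forName("GB2312")))); // GB2312
    }
}
```

    For longer texts the strict UTF-8 check is fairly reliable, because GB2312 byte pairs rarely line up with UTF-8's lead/continuation byte pattern for long; JCharDet remains the more general option when the set of candidate encodings is open.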

  • Original post: https://www.cnblogs.com/fxjwind/p/3728283.html