  Detecting and Converting String Encodings

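    Because of my work, I often need to determine the encoding of a piece of text so that it does not end up displayed as garbled characters. Searching online turned up the following libraries: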

    1.C#

    https://code.google.com/p/ude/  (detection library)

     Ude is a C# port of the Mozilla Universal Charset Detector.
        The original source code is available at:
        http://mxr.mozilla.org/mozilla/source/extensions/universalchardet/src/
        http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html
        http://mxr.mozilla.org/mozilla-central/source/extensions/universalchardet/doc/UniversalCharsetDetection.doc

    2.Java

         http://code.google.com/p/juniversalchardet/

    3.Python
         http://chardet.feedparser.org/
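chardet, like Ude and juniversalchardet, is a port of Mozilla's detector, which combines per-encoding state machines with character-frequency statistics. For comparison only, here is a much cruder, standard-library-only Python sketch: it checks for a byte-order mark and otherwise returns the first candidate codec that decodes the bytes without error. The name `guess_encoding` and the candidate list (ASCII, UTF-8, GB18030) are illustrative assumptions, not chardet's API, and unlike a statistical detector this cannot tell apart encodings that all happen to decode successfully.

```python
import codecs

# (BOM, codec) pairs. The UTF-32-LE BOM begins with the UTF-16-LE BOM,
# so the 4-byte marks must be checked first. The codecs chosen here
# ("utf-32", "utf-16", "utf-8-sig") strip the BOM when decoding.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Illustrative candidate list (an assumption), tried in order with strict decoding.
_CANDIDATES = ["ascii", "utf-8", "gb18030"]


def guess_encoding(data: bytes) -> str:
    """Return a codec name that can decode `data` (naive heuristic, no statistics)."""
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return codec
    for codec in _CANDIDATES:
        try:
            data.decode(codec)  # strict: raises on any invalid byte sequence
        except UnicodeDecodeError:
            continue
        return codec
    return "latin-1"  # maps every byte to a character, so it never fails
```

Typical use is `text = data.decode(guess_encoding(data))`. UTF-8 is tried before GB18030 because its strict byte structure makes accidental matches unlikely, whereas GB18030 accepts many arbitrary byte sequences.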

    4.C++

    IBM has an open-source library, ICU: http://site.icu-project.org/  (conversion)
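ICU's converters translate between charsets by pivoting through Unicode (UTF-16 internally). ICU itself is a C/C++ and Java library; the Python sketch below only illustrates the same decode-then-encode idea with Python's built-in codecs and is not ICU's API. `convert` is a hypothetical helper name, and it assumes the source encoding is already known, for example from one of the detection libraries listed here.

```python
def convert(data: bytes, src: str, dst: str = "utf-8", errors: str = "strict") -> bytes:
    """Re-encode `data` from charset `src` to charset `dst` via a Unicode str.

    errors="strict" raises on undecodable or unencodable input; "replace"
    substitutes U+FFFD when decoding and "?" when encoding instead.
    """
    text = data.decode(src, errors=errors)  # bytes -> Unicode (the pivot)
    return text.encode(dst, errors=errors)  # Unicode -> target bytes
```

For example, `convert(gbk_bytes, "gbk")` yields UTF-8 bytes. Strict mode is the safer default, since silent replacement discards characters that a correct conversion could have kept.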

    Linux

    enca:         http://freecode.com/projects/enca  (detection and conversion library)

    A C++ version of Mozilla's charset detector

    http://code.google.com/p/uchardet/  (detection library)

    References

    http://blog.csdn.net/xian0617/article/details/6706107

    https://www.byvoid.com/blog/tag/mozilla

    http://www.linuxidc.com/Linux/2011-05/35769.htm

    http://blog.csdn.net/wangyonggang/article/details/927

    enca, uchardet, ICU, ude

    -------------------

    Import the library:

            using Ude;

        Then feed a stream or a byte array to the detector, and call DataEnd to tell it that
        no more data is coming so it can report the result:
             
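            // `stream` is any readable System.IO.Stream you already have open
            ICharsetDetector cdet = new CharsetDetector();
            byte[] buff = new byte[1024];
            int read;
            // Stop reading early once the detector reports it is done
            while (!cdet.IsDone() && (read = stream.Read(buff, 0, buff.Length)) > 0) {
                cdet.Feed(buff, 0, read);
            }
            cdet.DataEnd();
            if (cdet.Charset != null)
                Console.WriteLine("Charset: {0}, confidence: {1}", cdet.Charset, cdet.Confidence);
            else
                Console.WriteLine("Detection failed.");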


        Alternatively, you can feed a Stream to the detector:

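            // Requires: using System; using System.IO; using Ude;
            using (FileStream fs = File.OpenRead(filename)) {
                ICharsetDetector cdet = new CharsetDetector();
                cdet.Feed(fs);
                cdet.DataEnd();
                if (cdet.Charset != null)
                    Console.WriteLine("Charset: {0}, confidence: {1}", cdet.Charset, cdet.Confidence);
                else
                    Console.WriteLine("Detection failed.");
            }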
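The first loop follows a feed / is-done / end-of-data protocol: chunks are fed until the detector reports it is done or the input runs out, and DataEnd finalizes the answer. A minimal Python sketch of that same protocol follows, assuming a far simpler job than Ude's (only checking whether the bytes are valid UTF-8, whereas Ude, like Mozilla's detector, runs a set of probers for many encodings). `Utf8Prober` and `probe_chunks` are hypothetical names; the standard library's incremental decoder handles multi-byte characters split across chunk boundaries.

```python
import codecs


class Utf8Prober:
    """Hypothetical prober with a Feed / IsDone / DataEnd-style interface.

    It only answers "is the whole input valid UTF-8?".
    """

    def __init__(self) -> None:
        self._decoder = codecs.getincrementaldecoder("utf-8")()
        self._failed = False
        self._ended = False

    def feed(self, chunk: bytes) -> None:
        if self._failed:
            return
        try:
            # An incomplete trailing sequence is buffered until the next chunk.
            self._decoder.decode(chunk)
        except UnicodeDecodeError:
            self._failed = True

    def is_done(self) -> bool:
        # After an invalid byte, more input cannot change the answer.
        return self._failed

    def data_end(self) -> None:
        if not self._failed:
            try:
                # final=True turns a dangling partial character into an error.
                self._decoder.decode(b"", final=True)
            except UnicodeDecodeError:
                self._failed = True
        self._ended = True

    @property
    def charset(self):
        """Return "utf-8" after data_end() if every byte was valid, else None."""
        if self._ended and not self._failed:
            return "utf-8"
        return None


def probe_chunks(chunks):
    """Feed chunks until the prober is done or the input runs out."""
    prober = Utf8Prober()
    for chunk in chunks:
        if prober.is_done():
            break
        prober.feed(chunk)
    prober.data_end()
    return prober.charset
```

To read from a file opened in binary mode as `f`, pass `iter(lambda: f.read(1024), b"")` as `chunks`; reading stops as soon as the prober is done.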

  Original article: https://www.cnblogs.com/simfe/p/2891564.html