zoukankan      html  css  js  c++  java
  • java 检测字符串中文乱码

    1.检测是否为乱码

    public static boolean isMessyCode(String strName) {
            Pattern p = Pattern.compile("\s*|	*|
    *|
    *");
            Matcher m = p.matcher(strName);
            String after = m.replaceAll("");
            String temp = after.replaceAll("\p{P}", "");
            char[] ch = temp.trim().toCharArray();
            float chLength = 0 ;
            float count = 0;
            for (int i = 0; i < ch.length; i++) {
                char c = ch[i];
                if (!Character.isLetterOrDigit(c)) {
                    if (!isChinese(c)) {
                        count = count + 1;
                    }
                    chLength++;
                }
            }
            float result = count / chLength ;
            if (result > 0.4) {
                return true;
            } else {
                return false;
            }
        }

    2.检查字符是否为中文

    private static boolean isChinese(char c) {
            Character.UnicodeBlock ub = Character.UnicodeBlock.of(c);
            if (ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
                    || ub == Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS
                    || ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A
                    || ub == Character.UnicodeBlock.GENERAL_PUNCTUATION
                    || ub == Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION
                    || ub == Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS) {
                return true;
            }
            return false;
        }

    3.中文转换编码

    public static String toChinese(String msg){
            if(isMessyCode(msg)){
                try {
                    return new String(msg.getBytes("ISO8859-1"), "UTF-8");
                } catch (Exception e) {
                }
            }
            return msg ;
        }

    Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS : 4E00-9FBF:CJK 统一表意符号
    Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS :F900-FAFF:CJK 兼容象形文字 Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A :3400-4DBF:CJK 统一表意符号扩展 A
    CJK的意思是“Chinese,Japanese,Korea”的简写 ,实际上就是指中日韩三国的象形文字的Unicode编码
    Character.UnicodeBlock.GENERAL_PUNCTUATION :2000-206F:常用标点 Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION :3000-303F:CJK 符号和标点 Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS :FF00-FFEF:半角及全角形式

     

     

    Character.isLetter(c):判断字符是否是字母

    Character.isDigit(c):判断字符是否是数字

  • 相关阅读:
    李宏毅机器学习课程---1、机器学习介绍
    尚学python课程---15、python进阶语法
    尚学python课程---14、python中级语法
    尚学python课程---13、python基础语法
    Android4.2.2由于越来越多的物理按键(frameworks)
    ym——Android之ListView性能优化
    我学cocos2d-x (两) 采用Delegate(信托)
    mac提升yosemite后php 扩展修复
    JAVA学习课第五 — IO流程(九)文件分割器合成器
    第11周项目-2.2
  • 原文地址:https://www.cnblogs.com/sagech/p/5177112.html
Copyright © 2011-2022 走看看