  • Garbled text when reading a file with IO streams: file-encoding problems and how to fix them

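    With the code below, a UTF-8 encoded file comes out garbled when it is read, because FileReader decodes with the platform default charset rather than the file's actual encoding.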

    File myFile=new File("文件路径");

    BufferedReader in = new BufferedReader(new FileReader(myFile));  
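
    The effect can be reproduced without a file. The sketch below encodes a string as UTF-8 and decodes the bytes with a different charset, which is what happens when the platform default is GBK. The class and method names are made up for illustration. Note that on JDK 18 and later the default charset is UTF-8 unless it is overridden, so the original code may appear to work there.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encodes text as UTF-8, then decodes the bytes with readAs,
    // the way a reader configured with that charset would.
    static String decode(String text, Charset readAs) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, readAs);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters, written as escapes
        // Decoding with the matching charset gives the original text back.
        System.out.println(original.equals(decode(original, StandardCharsets.UTF_8)));
        // Decoding with GBK does not: the bytes are grouped into different characters.
        System.out.println(original.equals(decode(original, Charset.forName("GBK"))));
    }
}
```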
    

    Change it to:

    BufferedReader in = new BufferedReader( new InputStreamReader( new FileInputStream(myFile), "UTF-8") ); 
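
    On Java 7 and later, the same explicit-charset read can also be written with java.nio.file.Files. This is a minimal sketch, not the author's code: the class name, helper name, and temporary file are assumptions for the demo.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadUtf8 {
    // Writes text to a temporary file as UTF-8, then reads the first line
    // back with an explicitly specified charset.
    static String writeThenRead(String text) {
        try {
            Path tmp = Files.createTempFile("utf8-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                // try-with-resources closes the reader automatically
                try (BufferedReader in = Files.newBufferedReader(tmp, StandardCharsets.UTF_8)) {
                    return in.readLine();
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587";
        System.out.println(text.equals(writeThenRead(text)));
    }
}
```

    One difference worth knowing: a reader from Files.newBufferedReader throws an exception on malformed input, while InputStreamReader silently substitutes a replacement character.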
    

    If the file is ANSI encoded (on a Chinese-locale Windows system that means GBK/GB2312), read it like this:

    InputStreamReader reader = new InputStreamReader(   new FileInputStream(new File("文件路径")), "gb2312");  
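
    A note on the charset name: GBK is a superset of GB2312, so a file saved as ANSI on Chinese Windows may contain characters that a "gb2312" decoder cannot map. The sketch below illustrates this with U+9555, a character that is in GBK but not in GB2312. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

public class GbkVsGb2312 {
    // Encodes text as GBK bytes, then decodes those bytes with the named charset.
    static String decodeGbkBytes(String text, String charsetName) {
        byte[] gbkBytes = text.getBytes(Charset.forName("GBK"));
        return new String(gbkBytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String common = "\u4e2d";  // present in both GB2312 and GBK
        String gbkOnly = "\u9555"; // present in GBK only
        System.out.println(common.equals(decodeGbkBytes(common, "GB2312")));
        System.out.println(gbkOnly.equals(decodeGbkBytes(gbkOnly, "GBK")));
        // Decoding the GBK-only character as GB2312 loses it.
        System.out.println(gbkOnly.equals(decodeGbkBytes(gbkOnly, "GB2312")));
    }
}
```

    Passing "GBK" to InputStreamReader is therefore usually the safer choice when the exact variant is unknown.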
    

    Here is the method I use to detect a file's encoding:

    // Needs imports: java.io.BufferedInputStream, java.io.File, java.io.FileInputStream
    /**
     * Guesses the encoding of an uploaded file: checks for a BOM first, then
     * scans for a UTF-8 multi-byte sequence. Defaults to GBK.
     */
    public static String get_charset(File file) {
        String charset = "GBK";
        byte[] first3Bytes = new byte[3];
        // try-with-resources closes the stream on every path, including the early return
        try (BufferedInputStream bis = new BufferedInputStream(
                new FileInputStream(file))) {
            boolean checked = false;
            bis.mark(3); // the read limit must cover the 3 bytes read before reset()
            int read = bis.read(first3Bytes, 0, 3);
            if (read == -1)
                return charset; // empty file
            if (first3Bytes[0] == (byte) 0xFF && first3Bytes[1] == (byte) 0xFE) {
                charset = "UTF-16LE";
                checked = true;
            } else if (first3Bytes[0] == (byte) 0xFE
                    && first3Bytes[1] == (byte) 0xFF) {
                charset = "UTF-16BE";
                checked = true;
            } else if (first3Bytes[0] == (byte) 0xEF
                    && first3Bytes[1] == (byte) 0xBB
                    && first3Bytes[2] == (byte) 0xBF) {
                charset = "UTF-8";
                checked = true;
            }
            bis.reset();
            if (!checked) {
                // No BOM: scan from the start and decide on the first non-ASCII pattern
                while ((read = bis.read()) != -1) {
                    if (read >= 0xF0)
                        break;
                    if (0x80 <= read && read <= 0xBF) // a lone byte in 0x80-0xBF: treat as GBK
                        break;
                    if (0xC0 <= read && read <= 0xDF) {
                        read = bis.read();
                        // The two-byte pattern (0xC0-0xDF)(0x80-0xBF) can also occur
                        // in GB encodings, so keep scanning
                        if (0x80 <= read && read <= 0xBF)
                            continue;
                        else
                            break;
                    } else if (0xE0 <= read && read <= 0xEF) { // can misfire, but rarely
                        read = bis.read();
                        if (0x80 <= read && read <= 0xBF) {
                            read = bis.read();
                            if (0x80 <= read && read <= 0xBF) {
                                charset = "UTF-8";
                                break;
                            } else
                                break;
                        } else
                            break;
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }

        return charset;
    }
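
    The method above decides on the first multi-byte pattern it meets. A stricter alternative, sketched here as a different technique rather than as the author's approach, is to ask the JDK's UTF-8 decoder whether the whole content is well-formed. The class and method names are assumptions, and the sketch expects the bytes to be in memory already.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Returns true only if every byte sequence in data is well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                // REPORT makes the decoder throw instead of substituting U+FFFD
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        // The same two characters encoded as GBK
        byte[] gbk = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
        System.out.println(isValidUtf8(utf8));
        System.out.println(isValidUtf8(gbk));
    }
}
```

    Pure ASCII content passes this check as well, which is harmless because ASCII bytes decode identically under UTF-8 and GBK.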
    

    When calling it, check whether the detected encoding is UTF-8 or ANSI (GBK):

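    String charset = get_charset(new File(filepath));
    BufferedReader br = null;
    // Compare strings with equals(); == only compares object references
    if ("GBK".equals(charset)) {
        // GBK is a superset of GB2312, so it decodes everything "gb2312" would
        InputStreamReader reader = new InputStreamReader(
                new FileInputStream(new File(filepath)), "GBK");
        br = new BufferedReader(reader);
    }
    if ("UTF-8".equals(charset)) {
        br = new BufferedReader(new InputStreamReader(
                new FileInputStream(filepath), "UTF-8"));
    }
    // br is still null here if get_charset returned UTF-16LE or UTF-16BE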
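
    One more detail when the file was detected through its BOM: Java's UTF-8 decoder does not strip the byte order mark, so the first line read starts with the character U+FEFF. A minimal sketch of dropping it follows. The class and helper names are hypothetical, and the demo reads from an in-memory stream instead of a file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BomSkip {
    // Reads the first line of a UTF-8 stream and removes a leading BOM if present.
    static String firstLineWithoutBom(InputStream in) {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line = br.readLine();
            if (line != null && line.startsWith("\uFEFF")) {
                line = line.substring(1); // the BOM decodes to a single char
            }
            return line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // EF BB BF is the UTF-8 BOM, followed by the ASCII text "hi"
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(firstLineWithoutBom(new ByteArrayInputStream(withBom)));
    }
}
```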
    
  • Original article: https://www.cnblogs.com/ylzhang/p/7885265.html