  • Garbled text when reading files with IO streams: how to fix file-encoding problems

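    Reading a UTF-8 encoded file with the code below produces garbled text, because FileReader decodes with the platform default charset, which may not be UTF-8: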

    File myFile = new File("file path");

    BufferedReader in = new BufferedReader(new FileReader(myFile));  
    

    It should be changed to name the charset explicitly:

    BufferedReader in = new BufferedReader( new InputStreamReader( new FileInputStream(myFile), "UTF-8") ); 
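    To make the effect of a charset mismatch concrete, here is a minimal sketch. The class name `CharsetMismatchDemo` and its helper methods are made up for illustration, and ISO-8859-1 merely stands in for "some default charset that is not UTF-8" because every JVM is guaranteed to ship it.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class CharsetMismatchDemo {

    // Reads the first line of a file, decoding its bytes with the given charset.
    static String firstLine(File file, Charset charset) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset))) {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the text to a temporary file as UTF-8 bytes (hypothetical helper).
    static File writeUtf8TempFile(String text) {
        try {
            File tmp = File.createTempFile("charset-demo", ".txt");
            tmp.deleteOnExit();
            Files.write(tmp.toPath(), text.getBytes(StandardCharsets.UTF_8));
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so that the encoding
        // of this source file itself does not matter.
        String original = "\u4e2d\u6587";
        File file = writeUtf8TempFile(original);

        String right = firstLine(file, StandardCharsets.UTF_8);
        String wrong = firstLine(file, StandardCharsets.ISO_8859_1);

        System.out.println(original.equals(right)); // true
        System.out.println(original.equals(wrong)); // false: each byte became its own character
    }
}
```

    The bytes on disk are the same in both reads; only the charset handed to InputStreamReader differs.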
    

    If the file uses ANSI encoding (GBK/GB2312 on a Simplified Chinese Windows system), read it like this:

    InputStreamReader reader = new InputStreamReader(new FileInputStream(new File("file path")), "gb2312");
    

    Below is the method I use to detect a file's encoding:

    import java.io.BufferedInputStream;
    import java.io.File;
    import java.io.FileInputStream;

    /**
     * Guesses the encoding of an uploaded file: checks for a BOM first,
     * then scans for a UTF-8 multi-byte sequence; defaults to GBK.
     */
    public static String get_charset(File file) {
        String charset = "GBK";
        byte[] first3Bytes = new byte[3];
        // try-with-resources closes the stream on every path, including the early return
        try (BufferedInputStream bis = new BufferedInputStream(
                new FileInputStream(file))) {
            boolean checked = false;
            bis.mark(3); // read limit must cover the 3 BOM bytes read before reset()
            int read = bis.read(first3Bytes, 0, 3);
            if (read == -1)
                return charset; // empty file
            if (first3Bytes[0] == (byte) 0xFF && first3Bytes[1] == (byte) 0xFE) {
                charset = "UTF-16LE";
                checked = true;
            } else if (first3Bytes[0] == (byte) 0xFE
                    && first3Bytes[1] == (byte) 0xFF) {
                charset = "UTF-16BE";
                checked = true;
            } else if (first3Bytes[0] == (byte) 0xEF
                    && first3Bytes[1] == (byte) 0xBB
                    && first3Bytes[2] == (byte) 0xBF) {
                charset = "UTF-8";
                checked = true;
            }
            bis.reset();
            if (!checked) {
                // no BOM: scan from the start for the first non-ASCII byte sequence
                while ((read = bis.read()) != -1) {
                    if (read >= 0xF0)
                        break;
                    if (0x80 <= read && read <= 0xBF) // a lone byte in 0x80-0xBF also counts as GBK
                        break;
                    if (0xC0 <= read && read <= 0xDF) {
                        read = bis.read();
                        // a two-byte pair (0xC0-0xDF)(0x80-0xBF) may also be GBK, so keep scanning
                        if (0x80 <= read && read <= 0xBF)
                            continue;
                        else
                            break;
                    } else if (0xE0 <= read && read <= 0xEF) { // can still misfire, but rarely
                        read = bis.read();
                        if (0x80 <= read && read <= 0xBF) {
                            read = bis.read();
                            if (0x80 <= read && read <= 0xBF) {
                                charset = "UTF-8";
                                break;
                            } else
                                break;
                        } else
                            break;
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return charset;
    }
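    The byte scan above stops at the first three-byte sequence that looks like UTF-8. A stricter check, which is a different technique from the author's method, is to let the JDK's own UTF-8 decoder validate the entire content and report any malformed byte. The sketch below assumes the file is small enough to hold in memory as a byte array; the class name `Utf8Validator` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Validator {

    // Returns true when the whole byte array decodes as UTF-8 without errors.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    // REPORT makes the decoder throw instead of silently
                    // substituting the replacement character U+FFFD.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        // The same two characters as GBK bytes, written out literally.
        byte[] gbk = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};

        System.out.println(isValidUtf8(utf8)); // true
        System.out.println(isValidUtf8(gbk));  // false
    }
}
```

    This is still a heuristic: pure ASCII content is valid in both UTF-8 and GBK, and a short GBK text can happen to form valid UTF-8 sequences, so a GBK fallback like the one above remains a reasonable default.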
    

    When calling it, branch on whether the detected encoding is UTF-8 or ANSI:

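    String charset = get_charset(new File(filepath));
    BufferedReader br = null;
    // compare strings with equals(); == only compares object references
    if ("GBK".equals(charset)) {
        // GBK is a superset of GB2312, so it also decodes GB2312 files
        InputStreamReader reader = new InputStreamReader(
                new FileInputStream(new File(filepath)), "GBK");
        br = new BufferedReader(reader);
    }
    if ("UTF-8".equals(charset)) {
        br = new BufferedReader(new InputStreamReader(
                new FileInputStream(filepath), "UTF-8"));
    }
    // note: br stays null when get_charset returns UTF-16LE or UTF-16BE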
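    One detail the branches above leave open: when a file starts with a BOM, Java's UTF-8 and UTF-16LE/BE decoders pass it through as the character U+FEFF, so it ends up at the start of the first line. A possible way to handle this, sketched here with a hypothetical `BomAwareReader` class, is to pass the detected charset name straight to the reader and skip a leading U+FEFF. This also covers the UTF-16 results of get_charset.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;

class BomAwareReader {

    // Opens the file with the named charset and skips a leading U+FEFF if present.
    static BufferedReader open(File file, String charsetName) {
        BufferedReader br = null;
        try {
            br = new BufferedReader(new InputStreamReader(
                    new FileInputStream(file), Charset.forName(charsetName)));
            br.mark(1);
            if (br.read() != '\uFEFF') {
                br.reset(); // first char is real content (or EOF): put it back
            }
            return br;
        } catch (IOException e) {
            if (br != null) {
                try { br.close(); } catch (IOException ignored) { }
            }
            throw new UncheckedIOException(e);
        }
    }

    // Convenience wrapper for the demo: returns the first line of the file.
    static String firstLine(File file, String charsetName) {
        try (BufferedReader br = open(file, charsetName)) {
            return br.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes raw bytes to a temporary file (hypothetical helper for the demo).
    static File tempFileWith(byte[] bytes) {
        try {
            File tmp = File.createTempFile("bom-demo", ".txt");
            tmp.deleteOnExit();
            Files.write(tmp.toPath(), bytes);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // UTF-8 BOM (EF BB BF) followed by the ASCII text "hello"
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'e', 'l', 'l', 'o'};
        File file = tempFileWith(withBom);
        System.out.println(firstLine(file, "UTF-8")); // hello
    }
}
```

    With this approach the caller would write something like `BomAwareReader.open(new File(filepath), charset)` instead of one branch per encoding.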
    
  • Original article: https://www.cnblogs.com/ylzhang/p/7885265.html