  • Viewing and changing a file's encoding

    Linux

    https://www.shellhacks.com/linux-check-change-file-encoding/

    Viewing the encoding

    In any directory, simply run file *

    $ file *
    chucklu.autoend.js: HTML document, UTF-8 Unicode text, with very long lines, with CRLF line terminators
    custom.css: UTF-8 Unicode text, with CRLF line terminators
    SimpleMemory.css: UTF-8 Unicode text, with CRLF line terminators

    $ file *
    chucklu.autoend.js: HTML document, Little-endian UTF-16 Unicode text, with very long lines, with CRLF line terminators
    custom.css: UTF-8 Unicode text, with CRLF line terminators
    SimpleMemory.css: UTF-8 Unicode text, with CRLF line terminators

    $ file -bi chucklu.autoend.js
    text/html; charset=utf-8

    $ file -bi custom.css
    text/plain; charset=utf-8

    -b, --brief  Don't print filename (brief mode)

    -i, --mime   Print filetype and encoding
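
    When only the character set is of interest, file can be asked for just that field. The following is a minimal sketch, assuming a file(1) build that supports the --mime-encoding option; the helper names charset_of and list_charsets are hypothetical and not part of the original post.

```shell
# charset_of FILE: print only the character set that file(1) guesses.
# (hypothetical helper name; assumes file(1) supports --mime-encoding)
charset_of() {
    file -b --mime-encoding "$1"
}

# list_charsets FILE...: print one "name<TAB>charset" line per regular file.
list_charsets() {
    for f in "$@"; do
        [ -f "$f" ] || continue   # skip directories and missing paths
        printf '%s\t%s\n' "$f" "$(charset_of "$f")"
    done
}
```

    Calling list_charsets * in a directory gives a compact per-file listing. The result is still a guess made from the file's bytes, so treat it as a hint rather than a guarantee.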

    Changing the encoding

    iconv -f utf-16 -t ascii text.txt
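
    Note that the command above writes the converted text to standard output and leaves text.txt unchanged. A minimal sketch of a wrapper that saves the result to a new file is shown below; the name convert_encoding and its argument order are assumptions made for illustration. It writes to a temporary file first, so a failed conversion does not leave a half-written output file behind.

```shell
# convert_encoding FROM TO INPUT OUTPUT  (hypothetical helper)
# Converts INPUT from encoding FROM to encoding TO and stores it in OUTPUT.
convert_encoding() {
    tmp="$4.tmp.$$"                      # temporary file next to the target
    if iconv -f "$1" -t "$2" "$3" > "$tmp"; then
        mv "$tmp" "$4"                   # conversion succeeded: publish result
    else
        rm -f "$tmp"                     # conversion failed: clean up
        return 1
    fi
}
```

    For example, convert_encoding UTF-16LE UTF-8 text.txt text.utf8.txt. How iconv treats characters that do not exist in the target encoding depends on the implementation (GNU iconv normally stops with an error), so converting to ascii is only safe for text that is known to be plain ASCII.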

    Windows

    https://stackoverflow.com/questions/64860/best-way-to-convert-text-files-between-character-sets

    On Windows with PowerShell (Jay Bazuzi):

    • PS C:> gc -en utf8 in.txt | Out-File -en ascii out.txt

      (No ISO-8859-15 support though; it says that supported charsets are unicode, utf7, utf8, utf32, ascii, bigendianunicode, default, and oem.)

    Edit

    Do you mean ISO-8859-1 support? Using "String" handles this, e.g. for the reverse conversion:

    gc -en string in.txt | Out-File -en utf8 out.txt
    

    Note: The possible enumeration values are "Unknown, String, Unicode, Byte, BigEndianUnicode, UTF8, UTF7, Ascii".

    How to detect the encoding of a file?

    There is a pretty simple way using Firefox. Open your file using Firefox, then View > Character Encoding. Detailed here.

     Answer

    Files generally indicate their encoding with a file header. There are many examples here. However, even reading the header you can never be sure what encoding a file is really using.

    For example, a file with the first three bytes 0xEF,0xBB,0xBF is probably a UTF-8 encoded file. However, it might be an ISO-8859-1 file which happens to start with the characters "ï»¿". Or it might be a different file type entirely.

    Notepad++ does its best to guess what encoding a file is using, and most of the time it gets it right. Sometimes it does get it wrong though - that's why that 'Encoding' menu is there, so you can override its best guess.

    For the two encodings you mention:

    • The "UCS-2 Little Endian" files are UTF-16 files (based on what I understand from the info here) so probably start with 0xFF,0xFE as the first 2 bytes. From what I can tell, Notepad++ describes them as "UCS-2" since it doesn't support certain facets of UTF-16.
    • The "UTF-8 without BOM" files don't have any header bytes. That's what the "without BOM" bit means.
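
    The header bytes described above can be inspected directly. The sketch below is one possible way to do it with head and od; the function name detect_bom is hypothetical. As the answer points out, a matching prefix is only a strong hint: the bytes FF FE 00 00, for instance, could also be a UTF-16 LE BOM followed by a NUL character.

```shell
# detect_bom FILE: report which byte order mark, if any, FILE starts with.
# (hypothetical helper name)
detect_bom() {
    # First four bytes as lowercase hex without separators, e.g. "efbbbf68".
    hex=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case "$hex" in
        efbbbf*)  echo "UTF-8 BOM" ;;
        fffe0000) echo "UTF-32 LE BOM" ;;   # checked before the UTF-16 LE prefix
        0000feff) echo "UTF-32 BE BOM" ;;
        fffe*)    echo "UTF-16 LE BOM" ;;
        feff*)    echo "UTF-16 BE BOM" ;;
        *)        echo "no BOM" ;;
    esac
}
```

    A file saved as "UTF-8 without BOM" falls through to the last branch, which is also what happens for plain ASCII and for legacy single-byte or double-byte encodings.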

    Detecting a file's encoding with UDE

    https://www.nuget.org/packages/UDE.CSharp

    https://github.com/errepi/ude

    // Requires the UDE.CSharp NuGet package, plus: using System; using System.IO;
    public void GetEncoding2(string filePath)
    {
        using (FileStream fs = File.OpenRead(filePath))
        {
            // Feed the whole stream to the detector, then signal the end of input.
            Ude.CharsetDetector cdet = new Ude.CharsetDetector();
            cdet.Feed(fs);
            cdet.DataEnd();
            if (cdet.Charset != null)
            {
                Console.WriteLine("Charset: {0}, confidence: {1}",
                    cdet.Charset, cdet.Confidence);
            }
            else
            {
                Console.WriteLine("Detection failed.");
            }
        }
    }

    Charset: ASCII, confidence: 1                 file * reports: ASCII text, with CRLF line terminators
    Charset: UTF-8, confidence: 0.7525            file * reports: UTF-8 Unicode text, with CRLF line terminators
    Charset: gb18030, confidence: 0.99            file * reports: ISO-8859 text, with CRLF line terminators

    Reading the first 4 bytes of a file

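    // Needs: using System; using System.IO; using System.Linq;
    public string GetEncoding(string filePath)
    {
        var bom = new byte[4];
        int read;
        using (var file = new FileStream(filePath, FileMode.Open, FileAccess.Read))
        {
            // Read may return fewer than 4 bytes for very short files.
            read = file.Read(bom, 0, 4);
        }

        // Format only the bytes actually read as space-separated hex, e.g. "EF BB BF 68".
        var str = string.Join(" ", bom.Take(read).Select(x => x.ToString("X2")));
        Console.WriteLine($"{str}, {filePath}");
        return str;
    }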

    Saving a file as UTF-8 without BOM from C#

    // new UTF8Encoding(false) writes UTF-8 without a byte order mark (BOM).
    filename = "2019-04-23-001.txt";
    filePath = Path.Combine(folder, filename);
    using (StreamWriter sw = new StreamWriter(File.Open(filePath, FileMode.Create), new UTF8Encoding(false)))
    {
        sw.WriteLine("hello");
    }

    filename = "2019-04-23-002.txt";
    filePath = Path.Combine(folder, filename);
    using (StreamWriter sw = new StreamWriter(File.Open(filePath, FileMode.Create), new UTF8Encoding(false)))
    {
        sw.WriteLine("你好");
    }

    2019-04-23-001.txt: ASCII text, with CRLF line terminators
    2019-04-23-002.txt: UTF-8 Unicode text, with CRLF line terminators

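    When the text contains no non-ASCII characters, a file saved from C# as UTF-8 without BOM is byte-for-byte identical to an ASCII file, which is why file reports it as ASCII.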
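
    This can be checked on the raw bytes: UTF-8 encodes ASCII characters as the same single bytes below 0x80, and every non-ASCII character as bytes with the high bit set. The sketch below tests for such bytes with od and grep; the function name is_ascii_only is hypothetical.

```shell
# is_ascii_only FILE: succeed when no byte in FILE has its high bit set,
# i.e. when every byte is in the range 0x00-0x7f.  (hypothetical helper)
is_ascii_only() {
    # od prints each byte as two hex digits separated by spaces; a first
    # digit of 8-f means the byte is 0x80 or above. -v disables the "*"
    # shorthand that od uses for repeated lines.
    ! od -An -v -tx1 "$1" | grep -q '[89a-f][0-9a-f]'
}
```

    A file containing only "hello" passes this check, while a UTF-8 file containing "你好" does not, which matches the two results reported by file above.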

    filename = "2019-04-23-003.txt";
    filePath = Path.Combine(folder, filename);
    using (StreamWriter sw = new StreamWriter(File.Open(filePath, FileMode.Create), Encoding.ASCII))
    {
        sw.WriteLine("hello");
    }

    filename = "2019-04-23-004.txt";
    filePath = Path.Combine(folder, filename);
    using (StreamWriter sw = new StreamWriter(File.Open(filePath, FileMode.Create), Encoding.ASCII))
    {
        sw.WriteLine("你好");
    }

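    2019-04-23-003.txt: ASCII text, with CRLF line terminators
    2019-04-23-004.txt: ASCII text, with CRLF line terminators

    Encoding.ASCII cannot represent "你好", so each character is written as "?"; that is why the second file is also reported as ASCII.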

    Creating a new file with the built-in Windows Notepad and saving it as ANSI

    The first text file contains the Chinese text "你好"

    2019-04-23-011.txt: ISO-8859 text, with no line terminators

    The second text file contains the English text "hello"
    2019-04-23-012.txt: ASCII text, with no line terminators

    Further reading

    Character Encoding in .NET

  • Original article: https://www.cnblogs.com/chucklu/p/6874820.html