    Differences between ANSI, ISO-8859-1 and MacRoman character sets

    Of the three main 8-bit character sets, only ISO-8859-1 is produced by a standards organization. The three sets are identical for the 95 characters from 32 to 126, the ASCII character set. The ANSI character set, also known as Windows-1252, has become a Microsoft proprietary character set; it is a superset of ISO-8859-1 with the addition of 27 characters in locations that ISO designates for control codes. Apple’s proprietary MacRoman character set contains a similar variety of characters from 128 to 255, but with very few of them assigned the same numbers, and also assigns characters to the control-code positions.
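    The claims above can be checked with a minimal Python sketch, assuming Python's built-in `cp1252`, `latin-1` and `mac_roman` codecs are acceptable stand-ins for Windows-1252 ("ANSI"), ISO-8859-1 and MacRoman. The helper name `decode_byte` is illustrative only.

```python
import unicodedata

CODECS = ("cp1252", "latin-1", "mac_roman")  # Windows-1252 ("ANSI"), ISO-8859-1, MacRoman


def decode_byte(b: int, codec: str):
    """Return the character byte b maps to in codec, or None if it is unassigned."""
    try:
        return bytes([b]).decode(codec)
    except UnicodeDecodeError:
        return None


# 1. The 95 printable ASCII characters (32..126) decode identically in all three.
ascii_same = all(len({decode_byte(b, c) for c in CODECS}) == 1 for b in range(32, 127))

# 2. Windows-1252 puts printable characters in 0x80..0x9F, where ISO-8859-1
#    has C1 control codes; 5 of those 32 positions stay unassigned.
cp1252_extras = {}
for b in range(0x80, 0xA0):
    ch = decode_byte(b, "cp1252")
    if ch is not None:
        cp1252_extras[b] = ch

# 3. MacRoman also fills the control-code positions 0x80..0x9F with characters.
mac_fills_c1 = all(
    unicodedata.category(decode_byte(b, "mac_roman")) != "Cc" for b in range(0x80, 0xA0)
)

# 4. Upper-half positions (128..255) where MacRoman and Windows-1252 agree.
same_upper = [b for b in range(0x80, 0x100)
              if decode_byte(b, "mac_roman") == decode_byte(b, "cp1252")]

if __name__ == "__main__":
    print("ASCII range identical:", ascii_same)
    print("Windows-1252 characters in 0x80-0x9F:", len(cp1252_extras))
    print("Byte 0x80 in each set:", [decode_byte(0x80, c) for c in CODECS])
    print("MacRoman fills 0x80-0x9F:", mac_fills_c1)
    print("Upper-half positions shared by MacRoman and Windows-1252:", len(same_upper))
```

    The second count should come out as 27, matching the number of characters Windows-1252 adds to ISO-8859-1.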

    The characters that appear in the first column of the following tables are generated from Unicode numeric character references, and so they should appear correctly in any Web browser that supports Unicode and that has suitable fonts available, regardless of the operating system.

    1. ANSI characters not present in ISO-8859-1
    2. ANSI characters not present in MacRoman
    3. ISO-8859-1 characters not present in ANSI
    4. ISO-8859-1 characters not present in MacRoman
    5. MacRoman characters not present in ANSI
    6. MacRoman characters not present in ISO-8859-1
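    The six comparisons listed above can be recomputed the same way. This is a minimal sketch, assuming a character set's repertoire is every assigned byte from 0x20 to 0xFF whose Unicode category is not a control character (`Cc`); the helpers `repertoire` and `missing` are hypothetical names.

```python
import unicodedata

CODECS = {"ANSI": "cp1252", "ISO-8859-1": "latin-1", "MacRoman": "mac_roman"}


def repertoire(codec: str) -> set:
    """Printable characters a single-byte codec assigns to bytes 0x20..0xFF."""
    chars = set()
    for b in range(0x20, 0x100):
        try:
            ch = bytes([b]).decode(codec)
        except UnicodeDecodeError:
            continue  # unassigned in this code page (e.g. 0x81 in cp1252)
        if unicodedata.category(ch) != "Cc":  # skip C1 control codes and DEL
            chars.add(ch)
    return chars


REPS = {name: repertoire(codec) for name, codec in CODECS.items()}


def missing(a: str, b: str) -> list:
    """Characters in character set a that are not present in b, sorted by code point."""
    return sorted(REPS[a] - REPS[b])


if __name__ == "__main__":
    # Same order as the list above: ANSI/ISO, ANSI/Mac, ISO/ANSI, ISO/Mac, Mac/ANSI, Mac/ISO.
    for a in CODECS:
        for b in CODECS:
            if a != b:
                diff = missing(a, b)
                print(f"{a} characters not present in {b}: {len(diff)}")
                print("   ", "".join(diff))
```

    With this definition, ANSI has 27 characters that ISO-8859-1 lacks, and ISO-8859-1 has none that ANSI lacks, because Windows-1252 matches ISO-8859-1 from 0xA0 to 0xFF.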

    Here, "ANSI" means Windows-1252, whose code page entry is:

    1252 windows-1252 ANSI Latin 1; Western European (Windows)

    This character encoding is a superset of ISO 8859-1 in terms of printable characters, but differs from the IANA's ISO-8859-1 by using displayable characters rather than control characters in the 80 to 9F (hex) range. Notable additional characters include curly quotation marks and all the printable characters that are in ISO 8859-15 (at different places than ISO 8859-15). It is known to Windows by the code page number 1252, and by the IANA-approved name "windows-1252". 

    It is very common to mislabel Windows-1252 text with the charset label ISO-8859-1. A common result was that all the quotes and apostrophes (produced by "smart quotes" in word-processing software) were replaced with question marks or boxes on non-Windows operating systems, making text difficult to read. Most modern web browsers and e-mail clients treat the media type charset ISO-8859-1 as Windows-1252 to accommodate such mislabeling. This is now standard behavior in the HTML5 specification, which requires that documents advertised as ISO-8859-1 actually be parsed with the Windows-1252 encoding.[5]
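    The mislabeling problem and the HTML5 workaround can be sketched in Python. This assumes a small subset of the labels the WHATWG Encoding Standard maps to windows-1252, and the function name `decode_html5` is hypothetical. Python's `cp1252` codec rejects the five unassigned bytes, whereas the WHATWG decoder maps each one to the C1 code point of the same value, so the sketch handles those bytes explicitly.

```python
# Windows-1252 bytes containing "smart quotes", as a word processor might save them.
text = "\u201cSmart quotes\u201d aren\u2019t ASCII"
raw = text.encode("cp1252")  # the quotes become bytes 0x93, 0x94 and 0x92

# Read strictly as ISO-8859-1, those bytes become invisible C1 control characters.
strict_latin1 = raw.decode("latin-1")

# A subset of the labels the WHATWG Encoding Standard maps to windows-1252.
WINDOWS_1252_LABELS = {
    "iso-8859-1", "iso8859-1", "latin1", "l1",
    "us-ascii", "ascii", "cp1252", "windows-1252",
}

# Bytes Windows-1252 leaves unassigned; WHATWG decodes them to U+0081, U+008D, ...
UNASSIGNED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def decode_html5(data: bytes, label: str) -> str:
    """Decode data the way HTML5 treats Western single-byte labels (sketch)."""
    if label.strip().lower() in WINDOWS_1252_LABELS:
        return "".join(
            chr(b) if b in UNASSIGNED else bytes([b]).decode("cp1252") for b in data
        )
    return data.decode(label)  # other labels: defer to Python's codec of that name


if __name__ == "__main__":
    print(repr(strict_latin1))
    print(decode_html5(raw, "ISO-8859-1"))
```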

    Historically, the phrase "ANSI Code Page" was used in Windows to refer to non-DOS encodings; the intention was that most of these would be ANSI standards such as ISO-8859-1. Even though Windows-1252 was the first and by far most popular code page named so in Microsoft Windows parlance, the code page has never been an ANSI standard. Microsoft explains, "The term ANSI as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community."

    help chcp
    Displays or sets the active code page number.

    CHCP [nnn]

    nnn Specifies a code page number.

    Type CHCP without a parameter to display the active code page number.
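    The same information is available from the Win32 API: `GetConsoleCP` and `GetConsoleOutputCP` return the console code pages that `chcp` works with, and `GetACP` returns the system ANSI code page. A minimal Python sketch via `ctypes`, assuming a Windows host with a console attached (without one the console calls may return 0); on other platforms the hypothetical helper `windows_code_pages` returns `None`.

```python
import sys


def windows_code_pages():
    """Return the active code pages on Windows, or None on other platforms."""
    if sys.platform != "win32":
        return None
    import ctypes  # imported here because ctypes.windll only exists on Windows

    kernel32 = ctypes.windll.kernel32
    return {
        "console_input": kernel32.GetConsoleCP(),         # console input code page
        "console_output": kernel32.GetConsoleOutputCP(),  # console output code page
        "ansi": kernel32.GetACP(),                        # system "ANSI" code page
    }


if __name__ == "__main__":
    print(windows_code_pages())
```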

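    Windows-1252 ("ANSI") does include €, but the local code page on this machine is 936, so when the character is copied over it is not displayed correctly in Notepad++. So "ANSI" in Notepad++ refers to the current code page, not specifically to Windows-1252.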
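    A minimal Python sketch of why "ANSI" is ambiguous: bytes saved as "ANSI" under code page 1252 decode to different text when read under code page 936. The sample string is an arbitrary assumption, and `errors="replace"` keeps the mismatched decode from raising.

```python
# Text saved as "ANSI" on a machine whose ANSI code page is 1252.
text = "Price: 10 € – café"
raw = text.encode("cp1252")  # '€' -> 0x80, '–' -> 0x96, 'é' -> 0xE9

# Read back as "ANSI" on the same machine: the text round-trips correctly.
as_1252 = raw.decode("cp1252")

# Read back as "ANSI" where the ANSI code page is 936 (GBK): bytes 0x81..0xFE
# start double-byte characters, so the Western characters are misread.
as_936 = raw.decode("cp936", errors="replace")

if __name__ == "__main__":
    print(raw)
    print(as_1252)
    print(as_936)
```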

    Finding out the default character encoding in Windows

    You can check with PowerShell:

    [System.Text.Encoding]::Default
    

    which even enables you to check that across several machines at once.

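    On this machine, however, the command reports UTF-8 as the default encoding, whose code page is 65001. (PowerShell 7 runs on .NET, where [System.Text.Encoding]::Default always returns UTF-8 regardless of the system ANSI code page; Windows PowerShell 5.1 runs on .NET Framework, where it returns the ANSI code page.)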

    Preamble :
    BodyName : utf-8
    EncodingName : Unicode (UTF-8)
    HeaderName : utf-8
    WebName : utf-8
    WindowsCodePage : 1200
    IsBrowserDisplay : True
    IsBrowserSave : True
    IsMailNewsDisplay : True
    IsMailNewsSave : True
    IsSingleByte : False
    EncoderFallback : System.Text.EncoderReplacementFallback
    DecoderFallback : System.Text.DecoderReplacementFallback
    IsReadOnly : True
    CodePage : 65001

    65001 utf-8 Unicode (UTF-8)
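    For comparison, a minimal Python sketch that reports Python's own notion of the default encoding and maps the code page numbers above to Python codec names. It assumes Python 3.8 or later, where `cp65001` is an alias for UTF-8; the helper names are hypothetical.

```python
import codecs
import locale
import sys


def default_encodings() -> dict:
    """Rough Python counterparts of [System.Text.Encoding]::Default (illustrative)."""
    return {
        # Default for str.encode()/bytes.decode(); always "utf-8" in Python 3.
        "default": sys.getdefaultencoding(),
        # Typically the default for open() in text mode: the ANSI code page on
        # Windows (e.g. "cp1252" or "cp936") unless UTF-8 mode is enabled.
        "preferred": locale.getpreferredencoding(False),
        # 1 when UTF-8 mode is on (python -X utf8 or PYTHONUTF8=1), else 0.
        "utf8_mode": sys.flags.utf8_mode,
    }


def codec_for_code_page(code_page: int) -> str:
    """Return Python's canonical codec name for a Windows code page number."""
    return codecs.lookup(f"cp{code_page}").name


if __name__ == "__main__":
    print(default_encodings())
    for cp in (1252, 936, 65001):
        print(cp, "->", codec_for_code_page(cp))
```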
    Original article: https://www.cnblogs.com/chucklu/p/14654158.html