  • Code point of € and é

    https://www.compart.com/en/unicode/U+20AC

    Name: Euro Sign[1]
    Unicode Version: 2.1 (May 1998)[2]
    Block: Currency Symbols, U+20A0 - U+20CF[3]
    Plane: Basic Multilingual Plane, U+0000 - U+FFFF[3]
    Script: Code for undetermined script (Zyyy) [4]
    Category: Currency Symbol (Sc) [1]
    Bidirectional Class: European Terminator (ET) [1]
    Combining Class: Not Reordered (0) [1]
    Character is Mirrored: No [1]
    GCGID: SC200000[5]
    HTML Entity: &euro; &#8364; &#x20AC;

    UTF-8 Encoding: 0xE2 0x82 0xAC
    UTF-16 Encoding: 0x20AC
    UTF-32 Encoding: 0x000020AC
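
    The byte sequences listed above can be reproduced with the encodings built into .NET. This is a minimal sketch; the class name EuroEncodings and the Hex helper are made up for illustration. Big-endian variants are used so that the printed bytes read the same way as the values quoted above.

```csharp
using System;
using System.Text;

public static class EuroEncodings
{
    // Hypothetical helper: formats bytes as space-separated uppercase hex.
    public static string Hex(byte[] bytes) => BitConverter.ToString(bytes).Replace("-", " ");

    public static void Main()
    {
        string euro = "\u20AC"; // the euro sign

        Console.WriteLine(Hex(Encoding.UTF8.GetBytes(euro)));             // E2 82 AC
        Console.WriteLine(Hex(Encoding.BigEndianUnicode.GetBytes(euro))); // 20 AC
        Console.WriteLine(Hex(Encoding.Unicode.GetBytes(euro)));          // AC 20 (little-endian UTF-16)

        // UTF-32, big-endian, without a byte order mark.
        var utf32BigEndian = new UTF32Encoding(bigEndian: true, byteOrderMark: false);
        Console.WriteLine(Hex(utf32BigEndian.GetBytes(euro)));            // 00 00 20 AC
    }
}
```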

    https://www.utf8-chartable.de/unicode-utf8-table.pl

    U+20AC € e2 82 ac EURO SIGN

    UTF-8 encoding

    Since the restriction of the Unicode code-space to 21-bit values in 2003, UTF-8 is defined to encode code points in one to four bytes, depending on the number of significant bits in the numerical value of the code point. The following table shows the structure of the encoding. The x characters are replaced by the bits of the code point.

    Code point <-> UTF-8 conversion
    First code point  Last code point  Byte 1    Byte 2    Byte 3    Byte 4
    U+0000            U+007F           0xxxxxxx
    U+0080            U+07FF           110xxxxx  10xxxxxx
    U+0800            U+FFFF           1110xxxx  10xxxxxx  10xxxxxx
    U+10000           U+10FFFF         11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
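
    The ranges in the table translate directly into a lookup of how many bytes a code point needs. The following is only a sketch; Utf8Ranges and ByteCount are illustrative names, and rejecting surrogate code points (U+D800 to U+DFFF) is an extra check that the table itself does not show.

```csharp
using System;

public static class Utf8Ranges
{
    // Returns the number of UTF-8 bytes needed for a code point, following the table above.
    public static int ByteCount(int codePoint)
    {
        if (codePoint < 0 || codePoint > 0x10FFFF)
            throw new ArgumentOutOfRangeException(nameof(codePoint));
        if (codePoint >= 0xD800 && codePoint <= 0xDFFF)
            throw new ArgumentException("Surrogate code points cannot be encoded in UTF-8.", nameof(codePoint));

        if (codePoint <= 0x7F) return 1;   // 0xxxxxxx
        if (codePoint <= 0x7FF) return 2;  // 110xxxxx 10xxxxxx
        if (codePoint <= 0xFFFF) return 3; // 1110xxxx 10xxxxxx 10xxxxxx
        return 4;                          // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    }

    public static void Main()
    {
        Console.WriteLine(ByteCount(0x00E9)); // é needs 2 bytes
        Console.WriteLine(ByteCount(0x20AC)); // € needs 3 bytes
    }
}
```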

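    The code point of € is 0x20AC, which falls in the three-byte range (U+0800 to U+FFFF), so it has to be converted.

    0x20AC in binary is 0010000010101100.

    Splitting those 16 bits into groups of 4, 6, and 6 (0010 000010 101100) and filling them into the pattern 1110xxxx 10xxxxxx 10xxxxxx gives the three bytes 11100010 10000010 10101100, which in hexadecimal is 0xE2 0x82 0xAC.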
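
    The same bit manipulation can be written out by hand. This sketch only covers the three-byte range that € falls into; ThreeByteUtf8 and Encode are illustrative names, not part of any library.

```csharp
using System;

public static class ThreeByteUtf8
{
    // Encodes a code point in U+0800..U+FFFF with the 1110xxxx 10xxxxxx 10xxxxxx pattern.
    public static byte[] Encode(int codePoint)
    {
        bool isSurrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
        if (codePoint < 0x0800 || codePoint > 0xFFFF || isSurrogate)
            throw new ArgumentOutOfRangeException(nameof(codePoint),
                "Expected a non-surrogate code point in U+0800..U+FFFF.");

        return new[]
        {
            (byte)(0xE0 | (codePoint >> 12)),         // 1110xxxx: top 4 bits
            (byte)(0x80 | ((codePoint >> 6) & 0x3F)), // 10xxxxxx: middle 6 bits
            (byte)(0x80 | (codePoint & 0x3F)),        // 10xxxxxx: low 6 bits
        };
    }

    public static void Main()
    {
        byte[] bytes = Encode(0x20AC);

        // Print each byte in binary, then the whole sequence in hex.
        foreach (byte b in bytes)
            Console.Write(Convert.ToString(b, 2).PadLeft(8, '0') + " ");
        Console.WriteLine();                            // 11100010 10000010 10101100
        Console.WriteLine(BitConverter.ToString(bytes)); // E2-82-AC
    }
}
```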

    What happens when the UTF-8 bytes of € are decoded with other encodings:

    [Test]
    public void Test20210413001()
    {
        // UTF-8 encoding:  0xE2 0x82 0xAC
        // UTF-16 encoding: 0x20AC
        // UTF-32 encoding: 0x000020AC
        // GetHexString is a helper that is not shown in this post.
        // On .NET Core / .NET 5+, code pages 936 and 1252 are only available after
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) has been called.
        string str = "€";
        var array = Encoding.UTF8.GetBytes(str);
        Console.WriteLine(GetHexString(array));

        // This does not yield "€": 0x20 0xAC is the UTF-16 value, not the three-byte UTF-8 sequence.
        // 0x20 decodes as a space and the stray continuation byte 0xAC as U+FFFD.
        var bytes = new byte[] {0x20, 0xac};
        var str2 = Encoding.UTF8.GetString(bytes);
        Console.WriteLine(str2);

        // 936    gb2312        ANSI/OEM Simplified Chinese (PRC, Singapore); Chinese Simplified (GB2312)
        var str3 = Encoding.GetEncoding(936).GetString(array);
        Console.WriteLine(str3);

        // 1252   windows-1252  ANSI Latin 1; Western European (Windows)
        var str4 = Encoding.GetEncoding(1252).GetString(array);
        Console.WriteLine(str4);

        // 28591  iso-8859-1    ISO 8859-1 Latin 1; Western European (ISO)
        var str5 = Encoding.GetEncoding(28591).GetString(array);
        Console.WriteLine(str5);
    }
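
    The GetHexString helper called in the test is not included in the post. A possible stand-in is sketched below, assuming it simply formats each byte as two hex digits; the HexHelper class and the separator are guesses. The sketch also registers CodePagesEncodingProvider so that code page 1252 resolves on .NET Core / .NET 5+.

```csharp
using System;
using System.Linq;
using System.Text;

public static class HexHelper
{
    // Hypothetical stand-in for the GetHexString helper used in the test above.
    public static string GetHexString(byte[] bytes) =>
        string.Join(" ", bytes.Select(b => b.ToString("X2")));

    public static void Main()
    {
        // Makes legacy code pages such as 936 and 1252 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] utf8 = Encoding.UTF8.GetBytes("\u20AC"); // the euro sign
        Console.WriteLine(GetHexString(utf8));          // E2 82 AC

        // Each of the three bytes becomes its own windows-1252 character:
        // 0xE2 -> U+00E2, 0x82 -> U+201A, 0xAC -> U+00AC.
        string misread = Encoding.GetEncoding(1252).GetString(utf8);
        Console.WriteLine(misread.Length);              // 3
    }
}
```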

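    Encoding the euro sign with windows-1252 and iso-8859-1 gives 0x80 and 0x3F respectively. 0x80 is the real windows-1252 code for €. 0x3F is just "?": iso-8859-1 has no euro sign, so the encoder falls back to a question mark.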

    // UTF-8 encoding:  0xE2 0x82 0xAC
    // UTF-16 encoding: 0x20AC
    // UTF-32 encoding: 0x000020AC
    string str = "€";
    // 1252   windows-1252  ANSI Latin 1; Western European (Windows)
    var array = Encoding.GetEncoding(1252).GetBytes(str);
    Console.WriteLine(GetHexString(array));

    // 28591  iso-8859-1    ISO 8859-1 Latin 1; Western European (ISO)
    var array6 = Encoding.GetEncoding(28591).GetBytes(str);
    Console.WriteLine(GetHexString(array6));
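
    A 0x3F result is easy to mistake for real data. One way to detect an unencodable character instead of silently getting "?" is to request the encoding with an exception fallback. This is a sketch; EncodableCheck and CanEncode are illustrative names.

```csharp
using System;
using System.Text;

public static class EncodableCheck
{
    // Returns true when every character of text has a mapping in the given code page.
    public static bool CanEncode(string text, int codePage)
    {
        Encoding strict = Encoding.GetEncoding(
            codePage,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false; // at least one character has no mapping
        }
    }

    public static void Main()
    {
        // Needed for code page 1252 on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Console.WriteLine(CanEncode("\u20AC", 1252));  // True:  € is 0x80 in windows-1252
        Console.WriteLine(CanEncode("\u20AC", 28591)); // False: iso-8859-1 has no euro sign
        Console.WriteLine(CanEncode("\u00E9", 28591)); // True:  é is 0xE9 in iso-8859-1
    }
}
```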

    https://unicode.scarfboy.com/?s=U%2b4F60

    This site lets you search by the character itself to get its code point, for example:

    https://unicode.scarfboy.com/?s=%E7%8E%A9

    The search results contain a U+73A9 link; clicking it jumps to the page for that code point.
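
    The same lookup can be done locally. The sketch below prints the code point of a character in U+XXXX notation and the percent-encoded UTF-8 form that appears in the search URL; CodePointLookup and Describe are illustrative names.

```csharp
using System;

public static class CodePointLookup
{
    // Returns the code point of the first character in s, formatted as U+XXXX.
    public static string Describe(string s)
    {
        int codePoint = char.ConvertToUtf32(s, 0); // also handles surrogate pairs
        return $"U+{codePoint:X4}";
    }

    public static void Main()
    {
        Console.WriteLine(Describe("\u4F60")); // U+4F60
        Console.WriteLine(Describe("\u73A9")); // U+73A9

        // Percent-encoding of the UTF-8 bytes, as used in the query string.
        Console.WriteLine(Uri.EscapeDataString("\u73A9")); // %E7%8E%A9
    }
}
```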

  • Original article: https://www.cnblogs.com/chucklu/p/14654363.html