zoukankan      html  css  js  c++  java
  • 【mysql】添加对emoji的支持

    1、简介

    涉及无线相关的 MySQL 数据库建议都提前采用 utf8mb4 字符集,避免 emoji 表情符号带来的问题

    MySQL Server >  5.5.3

    2、配置+升级

    当前配置

    mysql> s
    --------------
    mysql  Ver 14.14 Distrib 5.1.73, for redhat-linux-gnu (i386) using readline 5.1
    
    Connection id:          3
    Current database:
    Current user:           root@localhost
    SSL:                    Not in use
    Current pager:          stdout
    Using outfile:          ''
    Using delimiter:        ;
    Server version:         5.6.25-debug-log Source distribution
    Protocol version:       10
    Connection:             Localhost via UNIX socket
    Server characterset:    utf8mb4
    Db     characterset:    utf8mb4
    Client characterset:    utf8mb4
    Conn.  characterset:    utf8mb4
    UNIX socket:            /data/mysql/mysql.sock
    Uptime:                 46 min 53 sec

    更改数据库/表

    ALTER DATABASE dbname CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
    ALTER TABLE tbname CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
    ALTER TABLE tbname CHANGE old_column_name new_column_name VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

    my.cnf配置

    [client]  
    default-character-set = utf8mb4  
     
    [mysql]  
    default-character-set = utf8mb4  
     
    [mysqld]  
    character-set-client-handshake = FALSE  
    character-set-server = utf8mb4  
    collation-server = utf8mb4_general_ci  
    init_connect='SET NAMES utf8mb4'
    

     /usr/share/mysql/charsets/Index.xml配置

    <charset name="utf8mb4">
      <family>Unicode</family>
      <description>UTF-8 Unicode</description>
      <alias>utf-8</alias>
      <collation name="utf8_general_ci"     id="33">
       <flag>primary</flag>
       <flag>compiled</flag>
      </collation>
      <collation name="utf8_bin"            id="83">
        <flag>binary</flag>
        <flag>compiled</flag>
      </collation>
    </charset>

    查看字符集

    mysql> show variables where variable_name like 'character_set%' or variable_name like 'coll%';
    +--------------------------+----------------------------------+
    | Variable_name            | Value                            |
    +--------------------------+----------------------------------+
    | character_set_client     | utf8                             |
    | character_set_connection | utf8                             |
    | character_set_database   | utf8                             |
    | character_set_filesystem | binary                           |
    | character_set_results    | utf8                             |
    | character_set_server     | utf8                             |
    | character_set_system     | utf8                             |
    | character_sets_dir       | /u01/mysql/share/mysql/charsets/ |
    | collation_connection     | utf8_general_ci                  |
    | collation_database       | utf8_general_ci                  |
    | collation_server         | utf8_general_ci                  |
    +--------------------------+----------------------------------+
    11 rows in set (0.00 sec)
    	
    mysql> show char set;
    +----------+-----------------------------+---------------------+--------+
    | Charset  | Description                 | Default collation   | Maxlen |
    +----------+-----------------------------+---------------------+--------+
    | big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 |
    | dec8     | DEC West European           | dec8_swedish_ci     |      1 |
    | cp850    | DOS West European           | cp850_general_ci    |      1 |
    | hp8      | HP West European            | hp8_english_ci      |      1 |
    | koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |      1 |
    | latin1   | cp1252 West European        | latin1_swedish_ci   |      1 |
    | latin2   | ISO 8859-2 Central European | latin2_general_ci   |      1 |
    | swe7     | 7bit Swedish                | swe7_swedish_ci     |      1 |
    | ascii    | US ASCII                    | ascii_general_ci    |      1 |
    | ujis     | EUC-JP Japanese             | ujis_japanese_ci    |      3 |
    | sjis     | Shift-JIS Japanese          | sjis_japanese_ci    |      2 |
    | hebrew   | ISO 8859-8 Hebrew           | hebrew_general_ci   |      1 |
    | tis620   | TIS620 Thai                 | tis620_thai_ci      |      1 |
    | euckr    | EUC-KR Korean               | euckr_korean_ci     |      2 |
    | koi8u    | KOI8-U Ukrainian            | koi8u_general_ci    |      1 |
    | gb2312   | GB2312 Simplified Chinese   | gb2312_chinese_ci   |      2 |
    | greek    | ISO 8859-7 Greek            | greek_general_ci    |      1 |
    | eucjpms  | UJIS for Windows Japanese   | eucjpms_japanese_ci |      3 |
    +----------+-----------------------------+---------------------+--------+
    40 rows in set (0.00 sec)
    
    mysql> show COLLATION;
    +--------------------------+----------+-----+---------+----------+---------+
    | Collation                | Charset  | Id  | Default | Compiled | Sortlen |
    +--------------------------+----------+-----+---------+----------+---------+
    | big5_chinese_ci          | big5     |   1 | Yes     | Yes      |       1 |
    | big5_bin                 | big5     |  84 |         | Yes      |       1 |
    | dec8_swedish_ci          | dec8     |   3 | Yes     | Yes      |       1 |
    | dec8_bin                 | dec8     |  69 |         | Yes      |       1 |
    | cp850_general_ci         | cp850    |   4 | Yes     | Yes      |       1 |
    | eucjpms_bin              | eucjpms  |  98 |         | Yes      |       1 |
    +--------------------------+----------+-----+---------+----------+---------+
    219 rows in set (0.01 sec)

    – character_set_server:默认的内部操作字符集

    – character_set_client:客户端来源数据使用的字符集

    – character_set_connection:连接层字符集

    – character_set_results:查询结果字符集

    – character_set_database:当前选中数据库的默认字符集

    – character_set_system:系统元数据(字段名等)字符集

    MySQL中的字符序名称遵从命名惯例

    以字符序对应的字符集名称开头;以_ci(表示大小写不敏感 case ignore )、_cs(表示大小写敏感 case sensitive )或_bin(表示按编码值比较)结尾。

    例如:在字符序“utf8_general_ci”下,字符“a”和“A”是等价的

    MySQL中的字符集转换过程

    1. MySQL Server收到请求时将请求数据从character_set_client转换为character_set_connection;

    2. 进行内部操作前将请求数据从character_set_connection转换为内部操作字符集,其确定方法如下:

    • 使用每个数据字段的CHARACTER SET设定值;

    • 若上述值不存在,则使用对应数据表的DEFAULT CHARACTER SET设定值(MySQL扩展,非SQL标准);

    • 若上述值不存在,则使用对应数据库的DEFAULT CHARACTER SET设定值;

    • 若上述值不存在,则使用character_set_server设定值。

    3. 将操作结果从内部操作字符集转换为character_set_results。

    三、emoji表情与utf8mb4

    emoji表情mysql的utf8是不支持,需要修改设置为utf8mb4

    在MYSQL 5.5 之前, UTF8 编码只支持1-3个字节,只支持BMP这部分的unicode编码区, 关于BMP,http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters 这里看,基本就是0000~FFFF这一区。

    从MYSQL5.5开始,可支持4个字节UTF编码utf8mb4,一个字符最多能有4字节,所以能支持更多的字符集,utf8mb4兼容utf8,且比utf8能表示更多的字符

    CREATE TABLE `ios_emoji` (  
      `id` int(11) NOT NULL AUTO_INCREMENT COMMENT '自增ID',  
      `unicode` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'Unicode编码',  
      `utf8` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'UTF8编码',  
      `utf16` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'UTF16编码',  
      `sbunicode` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'SBUnicode编码',  
      `filename` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '文件名',  
      `filebyte` longblob COMMENT '文件内容字节',  
      PRIMARY KEY (`id`)  
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='ios表情编码表'; 
    

      

    参考文章

    http://drupal.stackexchange.com/questions/166405/why-are-we-using-utf8mb4-general-ci-and-not-utf8mb4-unicode-ci

    http://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci

    http://blog.csdn.net/leshami/article/details/42024217

    http://www.laruence.com/2008/01/05/12.html

  • 相关阅读:
    CodeForces 1332D Walk on Matrix
    CodeForces 1324F Maximum White Subtree
    CodeForces-1324E-Sleeping-Schedule
    CodeForces-1324D-Pair-of-Topics
    理解字节序 大端字节序和小端字节序
    公钥 私钥 数字证书概念理解
    简要介绍 X Window System (又称为X11 or X)
    动态规划的简洁说明
    二分法的研究
    两个变量的内容交换
  • 原文地址:https://www.cnblogs.com/chenpingzhao/p/5082964.html
Copyright © 2011-2022 走看看