  • MySQL utf8mb4 character set problems

    Link to the official MySQL manual

    https://dev.mysql.com/doc/connectors/en/connector-j-reference-charsets.html

    Notes
    For Connector/J 8.0.12 and earlier: In order to use the utf8mb4 character set for the connection, the server MUST be configured with character_set_server=utf8mb4; if that is not the case, when UTF-8 is used for characterEncoding in the connection string, it will map to the MySQL character set name utf8, which is an alias for utf8mb3.
    
    For Connector/J 8.0.13 and later:
    
    When UTF-8 is used for characterEncoding in the connection string, it maps to the MySQL character set name utf8mb4.
    
    If the connection option connectionCollation is also set alongside characterEncoding and is incompatible with it, characterEncoding will be overridden with the encoding corresponding to connectionCollation.
    
    Because there is no Java-style character set name for utf8mb3 that you can use with the connection option characterEncoding, the only way to use utf8mb3 as your connection character set is to use a utf8mb3 collation (for example, utf8_general_ci) for the connection option connectionCollation, which forces a utf8mb3 character set to be used, as explained in the last bullet.
    
    Warning
    Do not issue the query SET NAMES with Connector/J, as the driver will not detect that the character set has been changed by the query, and will continue to use the character set configured when the connection was first set up.
    

    The documentation states this clearly.

    Notes

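    For mysql-connector-java versions up to and including 8.0.12, the server must be configured with character_set_server=utf8mb4. Otherwise, even with characterEncoding=UTF-8 set, the connection still uses MySQL's utf8 character set, that is, utf8mb3.

    For 8.0.13 and later, characterEncoding=UTF-8 maps to MySQL's utf8mb4 character set.

    If connectionCollation is set together with characterEncoding but is incompatible with it, characterEncoding is overridden by the encoding that corresponds to connectionCollation.

    Because there is no Java-style character set name for utf8mb3 that can be used with the characterEncoding connection option, the only way to select utf8mb3 is to set a utf8mb3 collation (for example, utf8_general_ci) in connectionCollation, which forces the utf8mb3 character set, as described above.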

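    Warning

    Do not issue SET NAMES through Connector/J: the driver does not detect that a query has changed the character set, and it keeps using the character set configured when the connection was first established.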
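The version rule in the notes can be restated as a small decision function. This is only a sketch of the documented behaviour, not code from Connector/J: the class `Utf8MappingSketch` and both method names are hypothetical, and the sketch assumes that connectionCollation is not set.

```java
import java.util.Arrays;

// Hypothetical helper that restates the quoted notes; it is not part of Connector/J.
class Utf8MappingSketch {

    // Compares dotted numeric versions such as "8.0.12" and "8.0.13".
    static int compareVersions(String a, String b) {
        int[] x = Arrays.stream(a.split("\\.")).mapToInt(Integer::parseInt).toArray();
        int[] y = Arrays.stream(b.split("\\.")).mapToInt(Integer::parseInt).toArray();
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // MySQL character set that characterEncoding=UTF-8 maps to,
    // assuming connectionCollation is not set.
    static String mysqlCharsetForUtf8(String driverVersion, String characterSetServer) {
        if (compareVersions(driverVersion, "8.0.13") >= 0) {
            return "utf8mb4"; // 8.0.13 and later: always utf8mb4
        }
        // 8.0.12 and earlier: utf8mb4 only if the server is configured with it
        return "utf8mb4".equalsIgnoreCase(characterSetServer) ? "utf8mb4" : "utf8";
    }

    public static void main(String[] args) {
        System.out.println(mysqlCharsetForUtf8("8.0.12", "utf8"));    // utf8 (alias of utf8mb3)
        System.out.println(mysqlCharsetForUtf8("8.0.12", "utf8mb4")); // utf8mb4
        System.out.println(mysqlCharsetForUtf8("8.0.13", "utf8"));    // utf8mb4
    }
}
```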

    Conclusion

    As for this setting, widely recommended online:

    <property name="connectionInitSqls" value="set names utf8mb4;"/>
    

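    This is pure nonsense: per the warning above, the driver does not notice a character set changed by SET NAMES.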

    jdbc:mysql://localhost:3306/dbname?useUnicode=true&characterEncoding=utf8
    

    This is also wrong: characterEncoding must be set to UTF-8.

    MySQL Character Set Name           Java-Style Character Encoding Name
    For 8.0.12 and earlier: utf8       UTF-8
    For 8.0.13 and later: utf8mb4      UTF-8

    The Java-style character set name is UTF-8, not utf8.
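One way to avoid mistyping the encoding name is to take it from the JDK rather than writing it by hand. The sketch below builds the connection URL that way. `JdbcUrlSketch` and `buildUrl` are hypothetical names used for illustration, and the host, port, and database name are the placeholders from this post.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a Connector/J URL using the Java-style name "UTF-8".
class JdbcUrlSketch {

    static String buildUrl(String host, int port, String database) {
        // StandardCharsets.UTF_8.name() returns the canonical Java name "UTF-8".
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=" + StandardCharsets.UTF_8.name();
    }

    public static void main(String[] args) {
        // Prints jdbc:mysql://localhost:3306/dbname?useUnicode=true&characterEncoding=UTF-8
        System.out.println(buildUrl("localhost", 3306, "dbname"));
    }
}
```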

    The correct fix

    Either change the server configuration (character_set_server=utf8mb4) or upgrade mysql-connector-java to 8.0.13 or later.
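On the client side, a connection that follows this advice might be set up roughly as below. This is a sketch under assumptions: it presumes mysql-connector-java 8.0.13 or later (or a server configured with character_set_server=utf8mb4), the class and method names are hypothetical, and the user name and password are placeholders. Note that it sends no SET NAMES statement. The `open` method is not called in `main` because it needs the driver on the classpath and a reachable server.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

// Hypothetical sketch: connection settings only, no SET NAMES anywhere.
class Utf8mb4ConnectionSketch {

    static Properties connectionProperties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        // Java-style encoding name; per the quoted notes, 8.0.13+ maps it to utf8mb4.
        props.setProperty("characterEncoding", "UTF-8");
        return props;
    }

    // Requires Connector/J on the classpath and a running MySQL server.
    static Connection open(String url, Properties props) throws SQLException {
        return DriverManager.getConnection(url, props);
    }

    public static void main(String[] args) {
        Properties props = connectionProperties("app_user", "secret"); // placeholder credentials
        System.out.println(props.getProperty("characterEncoding"));
    }
}
```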

    Test results

    jdbc:mysql://localhost:3306/dbname?useUnicode=true&characterEncoding=utf-8&connectionCollation=utf8mb4_general_ci
    

    Written this way, no error is raised, but the data is not stored correctly.

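    In addition, Connector/J 5.1.13 and later can auto-detect the server setting, or characterEncoding=utf-8 can be specified explicitly.
    In my own test with 5.1.38, however, specifying utf-8 without connectionCollation still raised an error.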

    https://dev.mysql.com/doc/relnotes/connector-j/5.1/en/news-5-1-13.html

    Connector/J now auto-detects servers configured with character_set_server=utf8mb4 or treats the Java encoding utf-8 passed using characterEncoding=... as utf8mb4 in the SET NAMES= calls it makes when establishing the connection. (Bug #54175)
    

    Even with connectionCollation set, MySQL still does not store the data correctly.

    mysql> SHOW VARIABLES WHERE Variable_name LIKE 'character_set_%' OR Variable_name LIKE 'collation%';
    +--------------------------+----------------------------+
    | Variable_name            | Value                      |
    +--------------------------+----------------------------+
    | character_set_client     | utf8mb4                    |
    | character_set_connection | utf8mb4                    |
    | character_set_database   | utf8                       |
    | character_set_filesystem | binary                     |
    | character_set_results    | utf8mb4                    |
    | character_set_server     | utf8                       |
    | character_set_system     | utf8                       |
    | character_sets_dir       | /usr/share/mysql/charsets/ |
    | collation_connection     | utf8mb4_general_ci         |
    | collation_database       | utf8_general_ci            |
    | collation_server         | utf8_bin                   |
    +--------------------------+----------------------------+
    11 rows in set (0.04 sec)
    
    mysql> select hex(content) from send_message where id = 348;
    +------------------------+
    | hex(content)           |
    +------------------------+
    | 3C703EF09F988A3C2F703E |
    +------------------------+
    1 row in set (0.04 sec)
    
    # F09F988A is the hex value of the emoji. This row was inserted through Navicat and was stored correctly.
    
    
    mysql> select hex(content) from send_message where id = 349;
    +------------------------+
    | hex(content)           |
    +------------------------+
    | 3C703E3F3F3F3F3C2F703E |
    +------------------------+
    1 row in set (0.04 sec)
    
    # 3F3F3F3F: this row was inserted through JDBC. Each 3F is an ASCII '?', so the emoji was evidently not stored correctly.
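The two hex values can be checked without a database. The sketch below encodes `<p>` + U+1F60A + `</p>` as UTF-8 and decodes the bytes stored for row 349. The class and method names are hypothetical, and it assumes the emoji in the test was U+1F60A, which is what the bytes F09F988A decode to. The emoji takes 4 bytes in UTF-8, which is why a 3-byte character set such as utf8mb3 cannot hold it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for inspecting the hex(content) values shown above.
class EmojiHexSketch {

    // Upper-case hex string, the same format as MySQL's HEX().
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    // Inverse of toHex: two hex digits per byte.
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F60A));
        String html = "<p>" + emoji + "</p>";

        // 4 bytes per emoji: too wide for utf8mb3, which stores at most 3 bytes per character.
        System.out.println(emoji.getBytes(StandardCharsets.UTF_8).length);

        // Matches row 348: 3C703EF09F988A3C2F703E
        System.out.println(toHex(html.getBytes(StandardCharsets.UTF_8)));

        // Row 349 decodes to <p>????</p>
        System.out.println(new String(fromHex("3C703E3F3F3F3F3C2F703E"), StandardCharsets.UTF_8));
    }
}
```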
    
    
  • Original article: https://www.cnblogs.com/slankka/p/10116258.html