  • MySQL Operations and Administration: MySQL Database Character Sets (11)

    MySQL database character set fundamentals

    1 Introduction to MySQL character sets

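    Simply put, a character set is a collection of symbols together with their encodings, and a collation is a set of rules for comparing them. MySQL character set support involves two concepts: the character set (CHARACTER SET) and the collation (COLLATION). The character set defines how MySQL stores string data, while the collation defines how strings are compared. In the CREATE DATABASE statement used earlier in this series, CHARACTER SET latin1 is the database character set and COLLATE latin1_swedish_ci is the collation. For details, see the character set chapter (chapter 10) of the MySQL manual.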
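    To make the distinction concrete, the following Python sketch separates the two ideas: the character set decides which bytes are stored, and the collation decides whether two strings compare as equal. It is an illustration only, not MySQL code; the helper names are hypothetical, and casefold() only approximates what a case-insensitive (_ci) collation does.

```python
def stored_bytes(text, charset):
    """Character set: how a string is turned into bytes for storage."""
    return text.encode(charset)


def equal_ci(a, b):
    """Rough stand-in for a case-insensitive (_ci) collation."""
    return a.casefold() == b.casefold()


def equal_bin(a, b):
    """Rough stand-in for a binary (_bin) collation: exact match only."""
    return a == b


# The same character is stored as different bytes in different character sets.
print(stored_bytes("é", "latin-1"))  # one byte
print(stored_bytes("é", "utf-8"))    # two bytes

# The same pair of strings is equal or unequal depending on the collation.
print(equal_ci("MySQL", "mysql"), equal_bin("MySQL", "mysql"))
```

    This is why a database definition names both parts, as in CHARACTER SET latin1 COLLATE latin1_swedish_ci.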

    2 Common MySQL character sets

    Four character sets are commonly used with MySQL: latin1, gbk, utf8, and utf8mb4.

    3 How to choose a suitable character set

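    a. If the application handles text in many languages and is published to different countries and regions, choose a Unicode character set; for MySQL this means UTF-8 (three bytes per Chinese character). UTF-8 is also the better choice when the application mainly handles English with a small amount of Chinese.

    b. If only Chinese needs to be supported, the data volume is large, and performance requirements are high, GBK is an option: each Chinese character takes two bytes (ASCII characters take one), so Chinese text is stored more compactly than in UTF-8. For workloads dominated by string computation and comparison, fixed-width character sets are generally faster.

    c. Mobile internet services may need the utf8mb4 character set, which can store four-byte UTF-8 characters such as emoji.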
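    The storage sizes mentioned above can be checked with a short Python sketch. It uses Python's own codecs as stand-ins for the MySQL character sets (latin-1 for latin1, gbk for gbk, utf-8 for utf8/utf8mb4), which is an assumption made for illustration; byte_len is a hypothetical helper.

```python
def byte_len(text, encoding):
    """Return the encoded length in bytes, or None when the encoding
    cannot represent the text at all."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


samples = {"ASCII letter": "a", "Chinese character": "汉", "emoji": "😀"}

for label, ch in samples.items():
    sizes = {enc: byte_len(ch, enc) for enc in ("latin-1", "gbk", "utf-8")}
    print(label, sizes)
```

    The emoji needs four bytes in UTF-8. MySQL's utf8 character set stores at most three bytes per character, which is the reason utf8mb4 exists.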

    4 Viewing the character sets supported by the current MySQL server

    [root@localhost ~]# mysql -uroot -p123456 -e "SHOW CHARACTER SET"

    The four most commonly used are:

    [root@localhost ~]# mysql -uroot -p123456 -e "SHOW CHARACTER SET;"|egrep "gbk|utf8|latin1"|awk ' {print $0}'
    latin1    cp1252 West European    latin1_swedish_ci    1
    gbk    GBK Simplified Chinese    gbk_chinese_ci    2
    utf8    UTF-8 Unicode    utf8_general_ci    3
    utf8mb4    UTF-8 Unicode    utf8mb4_general_ci    4

    Check the current MySQL character set settings:

    mysql> show variables like 'character_set%';
    +--------------------------+----------------------------------+
    | Variable_name| Value|
    +--------------------------+----------------------------------+
    | character_set_client | utf8 |
    | character_set_connection | utf8 |
    | character_set_database   | latin1   |
    | character_set_filesystem | binary   |
    | character_set_results| utf8 |
    | character_set_server | latin1   |
    | character_set_system | utf8 |
    | character_sets_dir   | /usr/local/mysql/share/charsets/ |
    +--------------------------+----------------------------------+

    Note: by default, character_set_client, character_set_connection, and character_set_results match the operating system's character set and change together with it. For example:

    [root@localhost ~]# cat /etc/sysconfig/i18n 
    LANG="zh_CN.UTF-8"
    [root@localhost ~]# echo $LANG
    zh_CN.UTF-8

    3 What character set does MySQL use by default?

    a. First, look at the character set settings MySQL uses by default:

    mysql> show variables like 'character_set%';
    +--------------------------+----------------------------------+
    | Variable_name| Value|
    +--------------------------+----------------------------------+
    | character_set_client | gb2312   |
    | character_set_connection | gb2312   |
    | character_set_database   | latin1   |
    | character_set_filesystem | binary   |
    | character_set_results| gb2312   |
    | character_set_server | latin1   |
    | character_set_system | utf8 |
    | character_sets_dir   | /usr/local/mysql/share/charsets/ |
    +--------------------------+----------------------------------+

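    The meaning of each character set variable is as follows:

    Variable_name            | Value  | Meaning
    character_set_client     | latin1 | client character set
    character_set_connection | latin1 | connection character set
    character_set_database   | latin1 | database character set; set in the configuration file or when creating the database or table
    character_set_results    | latin1 | result-set character set
    character_set_server     | latin1 | server character set; set in the configuration file or when creating the database or table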

    After changing the Linux system character set variable, check how the MySQL character sets change:

    [root@localhost ~]# echo $LANG
    zh_CN.UTF-8
    [root@localhost ~]# mysql -uroot -p123456 -e "show variables like 'character_set%';"
    +--------------------------+----------------------------------+
    | Variable_name| Value|
    +--------------------------+----------------------------------+
    | character_set_client | utf8 |
    | character_set_connection | utf8 |
    | character_set_database   | latin1   |
    | character_set_filesystem | binary   |
    | character_set_results| utf8 |
    | character_set_server | latin1   |
    | character_set_system | utf8 |
    | character_sets_dir   | /usr/local/mysql/share/charsets/ |
    +--------------------------+----------------------------------+

    We can see that character_set_client, character_set_connection, and character_set_results have followed the system character set and all changed to utf8.

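    4 What does executing set names latin1 actually do?

    Whether the Linux system character set is gb2312 or utf8, Chinese data inserted under the default settings ends up garbled.

    a. At this point, querying the data shows garbled text: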

    mysql> use cuizhong
    Database changed
    mysql> select * from student
    -> ;
    +----+---------------------+
    | id | name|
    +----+---------------------+
    |  1 | zhangsan|
    |  2 | lisi|
    |  3 | wanger  |
    |  4 | xiaozhang   |
    |  5 | xiaowang|
    |  6 | ??? |
    |  7 | å°çº¢  |
    |  8 | ä¸è®¤è¯†   |
    |  9 | æŽå››  |
    +----+---------------------+
    9 rows in set (0.10 sec)

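    b. After running set names with the matching character set, the garbling is resolved.

    (1) First check the character sets of the database and the table: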

    mysql> show create database cuizhong\G
    *************************** 1. row ***************************
       Database: cuizhong
    Create Database: CREATE DATABASE `cuizhong` /*!40100 DEFAULT CHARACTER SET latin1 */
    1 row in set (0.00 sec)
    mysql> show create table student\G
    *************************** 1. row ***************************
       Table: student
    Create Table: CREATE TABLE `student` (
      `id` int(4) NOT NULL AUTO_INCREMENT,
      `name` char(20) NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB AUTO_INCREMENT=10 DEFAULT CHARSET=latin1
    1 row in set (0.00 sec)

    (2) Both the database and the table use latin1, so running set names latin1 makes the client-side character sets match and the text is no longer garbled.

    mysql> set names latin1;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> select * from student;
    +----+-----------+
    | id | name  |
    +----+-----------+
    |  1 | zhangsan  |
    |  2 | lisi  |
    |  3 | wanger|
    |  4 | xiaozhang |
    |  5 | xiaowang  |
    |  6 | ???   |
    |  7 | 小红  |
    |  8 | 不认识|
    |  9 | 李四  |
    +----+-----------+

    (3) Running set names changed the following three variables: character_set_client, character_set_connection, and character_set_results.

    mysql> show variables like 'character_set%';
    +--------------------------+----------------------------------+
    | Variable_name| Value|
    +--------------------------+----------------------------------+
    | character_set_client | latin1   |
    | character_set_connection | latin1   |
    | character_set_database   | latin1   |
    | character_set_filesystem | binary   |
    | character_set_results| latin1   |
    | character_set_server | latin1   |
    | character_set_system | utf8 |
    | character_sets_dir   | /usr/local/mysql/share/charsets/ |
    +--------------------------+----------------------------------+
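    The garbled rows in step a and their recovery in step b can be reproduced outside MySQL. The following Python sketch is one way to read what happened, assuming the rows were inserted as UTF-8 bytes over a latin1 connection. Python's latin-1 codec (ISO 8859-1) stands in for MySQL's latin1, which is based on cp1252, so bytes in the 0x80 to 0x9F range may display differently from the table above.

```python
original = "小红"

# Insert path: the terminal sends UTF-8 bytes, but the connection is declared
# as latin1, so the bytes are stored unchanged in the latin1 column.
stored = original.encode("utf-8")

# Read path with character_set_results = utf8: every stored byte is treated
# as one latin1 character and converted, which yields mojibake.
garbled = stored.decode("latin-1")

# Read path after "set names latin1": the bytes come back unconverted and
# the UTF-8 terminal decodes them to the original text.
restored = stored.decode("utf-8")

# A lossy conversion cannot be undone: characters with no latin1 equivalent
# become "?" when they are converted on the way in.
lossy = "不认识".encode("latin-1", errors="replace")

print(repr(garbled), restored, lossy)
```

    A lossy conversion at insert time is a plausible explanation for row 6, which still shows ??? after set names latin1, although the source does not say how that row was inserted.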

    5 What does the mysql command-line option --default-character-set=latin1 do?

    (1) First check the MySQL character sets:

    [root@localhost ~]# mysql -uroot -p123456 -e "show variables like 'character_set%';"
    +--------------------------+----------------------------------+
    | Variable_name| Value|
    +--------------------------+----------------------------------+
    | character_set_client | utf8 |
    | character_set_connection | utf8 |
    | character_set_database   | latin1   |
    | character_set_filesystem | binary   |
    | character_set_results| utf8 |
    | character_set_server | latin1   |
    | character_set_system | utf8 |
    | character_sets_dir   | /usr/local/mysql/share/charsets/ |
    +--------------------------+----------------------------------+

    (2) Log in to mysql with the --default-character-set=latin1 option:

    [root@localhost ~]# mysql -uroot -p123456 --default-character-set=latin1
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 7
    Server version: 5.5.32 Source distribution
    
    Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
    
    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.
    
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    
    mysql> 

    (3) Now check the MySQL character sets again:

    mysql> show variables like 'character_set%';
    +--------------------------+----------------------------------+
    | Variable_name| Value|
    +--------------------------+----------------------------------+
    | character_set_client | latin1   |
    | character_set_connection | latin1   |
    | character_set_database   | latin1   |
    | character_set_filesystem | binary   |
    | character_set_results| latin1   |
    | character_set_server | latin1   |
    | character_set_system | utf8 |
    | character_sets_dir   | /usr/local/mysql/share/charsets/ |
    +--------------------------+----------------------------------+

    (4) Logging in with the option is also only a temporary change; logging in without it reverts to the previous settings:

    [root@localhost ~]# mysql -uroot -p123456 --default-character-set=latin1 -e "show variables like 'character_set%';"
    +--------------------------+----------------------------------+
    | Variable_name| Value|
    +--------------------------+----------------------------------+
    | character_set_client | latin1   |
    | character_set_connection | latin1   |
    | character_set_database   | latin1   |
    | character_set_filesystem | binary   |
    | character_set_results| latin1   |
    | character_set_server | latin1   |
    | character_set_system | utf8 |
    | character_sets_dir   | /usr/local/mysql/share/charsets/ |
    +--------------------------+----------------------------------+
    
    [root@localhost ~]# mysql -uroot -p123456 -e "show variables like 'character_set%';"
    +--------------------------+----------------------------------+
    | Variable_name| Value|
    +--------------------------+----------------------------------+
    | character_set_client | utf8 |
    | character_set_connection | utf8 |
    | character_set_database   | latin1   |
    | character_set_filesystem | binary   |
    | character_set_results| utf8 |
    | character_set_server | latin1   |
    | character_set_system | utf8 |
    | character_sets_dir   | /usr/local/mysql/share/charsets/ |
    +--------------------------+----------------------------------+
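    The behaviour seen in sections 4 and 5 can be summarised as a small model: both set names and --default-character-set change only the three client-side variables, and only for the current session. The Python sketch below is a toy model of that observation, not MySQL code; new_session and set_names are hypothetical names, and the default values are the ones shown in the outputs above.

```python
# Server-side values observed in the outputs above.
SERVER_SIDE = {"character_set_database": "latin1", "character_set_server": "latin1"}
CLIENT_SIDE = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
)


def new_session(default_character_set="utf8"):
    """Model a fresh login. The three client-side variables start from the
    client's default (the OS locale here, or --default-character-set)."""
    session = dict(SERVER_SIDE)
    session.update({name: default_character_set for name in CLIENT_SIDE})
    return session


def set_names(session, charset):
    """Model SET NAMES: only the three client-side variables change."""
    for name in CLIENT_SIDE:
        session[name] = charset


first = new_session()
set_names(first, "latin1")
second = new_session()  # a later login is unaffected by the earlier session

print(first["character_set_client"], second["character_set_client"])
```

    In this model, new_session("latin1") corresponds to logging in with --default-character-set=latin1.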

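    6 Ensuring that data inserted into MySQL is not garbled

    6.1 Unify the character sets of the MySQL client and server

    (1) The MySQL character set variables below (client side and server side) must all be set to the same character set to ensure that inserted Chinese data is displayed correctly. The Linux system character set should also match the database character set as far as possible.

    (2) The output of show variables like 'character_set%'; is as follows: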

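    Variable_name              | Value  | Meaning
    ① character_set_client     | latin1 | client character set
    ② character_set_connection | latin1 | connection character set
    ③ character_set_database   | latin1 | database character set
    ④ character_set_results    | latin1 | result-set character set
    ⑤ character_set_server     | latin1 | server character set; set in the configuration file or when creating the database or table

    Of these, ①, ② and ④ follow the Linux system character set by default. Running set names latin1 after logging in, or logging in to mysql with an explicit character set, both set the client, connection, and results character sets to latin1, which resolves the garbled inserts. The same effect can be made permanent by setting the client character set in the my.cnf configuration file, as follows:

    [client]
    default-character-set=latin1
    Note: no restart is required.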
    [root@localhost ~]# sed -n "18,22p" /etc/my.cnf 
    [client]
    #password    = your_password
    port        = 3306
    socket        = /usr/local/mysql/tmp/mysql.sock
    default-character-set = latin1
    [root@localhost ~]# mysql -uroot -p123456 -e "show variables like 'character_set%';"
    +--------------------------+----------------------------------+
    | Variable_name| Value|
    +--------------------------+----------------------------------+
    | character_set_client | latin1   |
    | character_set_connection | latin1   |
    | character_set_database   | latin1   |
    | character_set_filesystem | binary   |
    | character_set_results| latin1   |
    | character_set_server | latin1   |
    | character_set_system | utf8 |
    | character_sets_dir   | /usr/local/mysql/share/charsets/ |
    +--------------------------+----------------------------------+

    (3) With the client character set changed, querying the table no longer shows garbled text, even without set names:

    [root@localhost ~]# mysql -uroot -p123456 -e "select * from cuizhong.student;"
    +----+-----------+
    | id | name  |
    +----+-----------+
    |  1 | zhangsan  |
    |  2 | lisi  |
    |  3 | wanger|
    |  4 | xiaozhang |
    |  5 | xiaowang  |
    |  6 | ???   |
    |  7 | 小红|
    |  8 | 不认识 |
    |  9 | 李四|
    +----+-----------+

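    6.2 Changing the MySQL server character set

    (1) Edit my.cnf as follows:

    [mysqld]
    default-character-set = latin1    # for 5.1 and earlier; 5.5 uses character-set-server

    (2) Check the current character sets before the change: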

    [root@localhost ~]# mysql -uroot -p123456 -e "show variables like 'character_set%';"
    +--------------------------+----------------------------------+
    | Variable_name| Value|
    +--------------------------+----------------------------------+
    | character_set_client | utf8 |
    | character_set_connection | utf8 |
    | character_set_database   | latin1   |
    | character_set_filesystem | binary   |
    | character_set_results| utf8 |
    | character_set_server | latin1   |
    | character_set_system | utf8 |
    | character_sets_dir   | /usr/local/mysql/share/charsets/ |
    +--------------------------+----------------------------------+

    (3) Check the modified parameter:

    [root@localhost ~]# sed -n "26,27p" /etc/my.cnf 
    [mysqld]
    character-set-server = utf8

    (4) Restart the MySQL service (restarts are generally not permitted in production):

    [root@localhost ~]# /etc/init.d/mysqld restart
    Shutting down MySQL.. SUCCESS! 
    Starting MySQL.. SUCCESS!

    (5) Check the character sets after the change:

    [root@localhost ~]# mysql -uroot -p123456 -e "show variables like 'character_set%';"
    +--------------------------+----------------------------------+
    | Variable_name| Value|
    +--------------------------+----------------------------------+
    | character_set_client | utf8 |
    | character_set_connection | utf8 |
    | character_set_database   | utf8 |
    | character_set_filesystem | binary   |
    | character_set_results| utf8 |
    | character_set_server | utf8 |
    | character_set_system | utf8 |
    | character_sets_dir   | /usr/local/mysql/share/charsets/ |
    +--------------------------+----------------------------------+

    Note: the parameter set under [mysqld] above changes the following two variables.

    | Variable_name| Value|
    | character_set_database   | utf8 |
    | character_set_server | utf8 |

    At this point, changing the system character set no longer changes the MySQL character sets.

    [root@localhost ~]# cat /etc/sysconfig/i18n 
    LANG="zh_CN.GB2312"
    #LANG="zh_CN.UTF-8"
    [root@localhost ~]# source /etc/sysconfig/i18n 
    [root@localhost ~]# mysql -uroot -p123456 -e "show variables like 'character_set%';"
    +--------------------------+----------------------------------+
    | Variable_name| Value|
    +--------------------------+----------------------------------+
    | character_set_client | utf8 |
    | character_set_connection | utf8 |
    | character_set_database   | utf8 |
    | character_set_filesystem | binary   |
    | character_set_results| utf8 |
    | character_set_server | utf8 |
    | character_set_system | utf8 |
    | character_sets_dir   | /usr/local/mysql/share/charsets/ |
    +--------------------------+----------------------------------+

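    6.3 Summary: unifying the MySQL client and server character sets

    Principle for avoiding garbled text: utf8 is recommended for mixed Chinese and English environments, and the character set should be the same across the Linux system, client, server, database, table, and application.

    1. Set the Linux system character set to utf8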

    [root@localhost ~]# cat /etc/sysconfig/i18n 
    LANG="zh_CN.UTF-8"

    Note: the terminal client used to connect to the Linux host, for example Xshell or SecureCRT, must also be set to the same character set.

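    2. MySQL client

    Temporary:

    set names latin1;

    Permanent:

    Change the parameter in the [client] section of my.cnf; this has the same effect as set names latin1 and takes effect permanently.

    3. Server

    Change the my.cnf parameters:

    [mysqld]
    default-character-set = latin1    # for 5.1 and earlier
    character-set-server = latin1     # for 5.5

    4. Databases and tables: specify the character set when creating the database

    create database cuizhong_utf8 DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

    COLLATE is followed by the collation name. show character set lists the supported character sets and their default collations: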

    mysql> show character set;
    +----------+-----------------------------+---------------------+--------+
    | Charset  | Description | Default collation   | Maxlen |
    +----------+-----------------------------+---------------------+--------+
    | big5 | Big5 Traditional Chinese| big5_chinese_ci |  2 |
    | dec8 | DEC West European   | dec8_swedish_ci |  1 |
    | cp850| DOS West European   | cp850_general_ci|  1 |
    | hp8  | HP West European| hp8_english_ci  |  1 |
    | koi8r| KOI8-R Relcom Russian   | koi8r_general_ci|  1 |
    | latin1   | cp1252 West European| latin1_swedish_ci   |  1 |
    | latin2   | ISO 8859-2 Central European | latin2_general_ci   |  1 |
    | swe7 | 7bit Swedish| swe7_swedish_ci |  1 |
    | ascii| US ASCII| ascii_general_ci|  1 |
    | ujis | EUC-JP Japanese | ujis_japanese_ci|  3 |
    | sjis | Shift-JIS Japanese  | sjis_japanese_ci|  2 |
    | hebrew   | ISO 8859-8 Hebrew   | hebrew_general_ci   |  1 |
    | tis620   | TIS620 Thai | tis620_thai_ci  |  1 |
    | euckr| EUC-KR Korean   | euckr_korean_ci |  2 |
    | koi8u| KOI8-U Ukrainian| koi8u_general_ci|  1 |
    | gb2312   | GB2312 Simplified Chinese   | gb2312_chinese_ci   |  2 |
    | greek| ISO 8859-7 Greek| greek_general_ci|  1 |
    | cp1250   | Windows Central European| cp1250_general_ci   |  1 |
    | gbk  | GBK Simplified Chinese  | gbk_chinese_ci  |  2 |
    | latin5   | ISO 8859-9 Turkish  | latin5_turkish_ci   |  1 |
    | armscii8 | ARMSCII-8 Armenian  | armscii8_general_ci |  1 |
    | utf8 | UTF-8 Unicode   | utf8_general_ci |  3 |
    | ucs2 | UCS-2 Unicode   | ucs2_general_ci |  2 |
    | cp866| DOS Russian | cp866_general_ci|  1 |
    | keybcs2  | DOS Kamenicky Czech-Slovak  | keybcs2_general_ci  |  1 |
    | macce| Mac Central European| macce_general_ci|  1 |
    | macroman | Mac West European   | macroman_general_ci |  1 |
    | cp852| DOS Central European| cp852_general_ci|  1 |
    | latin7   | ISO 8859-13 Baltic  | latin7_general_ci   |  1 |
    | utf8mb4  | UTF-8 Unicode   | utf8mb4_general_ci  |  4 |
    | cp1251   | Windows Cyrillic| cp1251_general_ci   |  1 |
    | utf16| UTF-16 Unicode  | utf16_general_ci|  4 |
    | cp1256   | Windows Arabic  | cp1256_general_ci   |  1 |
    | cp1257   | Windows Baltic  | cp1257_general_ci   |  1 |
    | utf32| UTF-32 Unicode  | utf32_general_ci|  4 |
    | binary   | Binary pseudo charset   | binary  |  1 |
    | geostd8  | GEOSTD8 Georgian| geostd8_general_ci  |  1 |
    | cp932| SJIS for Windows Japanese   | cp932_japanese_ci   |  2 |
    | eucjpms  | UJIS for Windows Japanese   | eucjpms_japanese_ci |  3 |
    +----------+-----------------------------+---------------------+--------+
    39 rows in set (0.00 sec)

    5. Application character set

    Use a build that matches, for example the Simplified Chinese UTF-8 package:

    http://download.comsenz.com/Discuzx/3.2/Discuz_X3.2_SC_UTF8.zip

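    7 How to change the character set of a production MySQL database or table

    Steps for changing the character set of existing data

    For an existing database, the character set cannot be changed simply by running "alter database character set ..." or "alter table tablename character set ...". Neither command converts data that is already stored; they only affect tables or data created afterwards.

    To change the character set of existing records, the data must be exported, the character set adjusted, and the data re-imported.

    Change the database default encoding:

    alter database [your db name] charset [your character setting];

    The following simulates converting a latin1 database to GBK.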

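    (1) Export the table structure

    mysqldump -uroot -p123456 --default-character-set=latin1 -d dbname > alltable.sql

    --default-character-set=latin1 connects using the latin1 character set; -d exports only the table structure.

    (2) Edit alltable.sql and use sed to replace latin1 with gbk

    (3) Make sure the data is no longer being updated, then export all the data. Write it to a separate file (alldata.sql here) so that it does not overwrite the structure dump from step (1).

    mysqldump -uroot -p123456 --quick --no-create-info --extended-insert --default-character-set=latin1 dbname > alldata.sql

    Parameter descriptions:

    --quick: used for dumping large tables; forces mysqldump to retrieve rows from the server one at a time instead of retrieving all rows and buffering them in memory before writing them out.

    --no-create-info: do not write CREATE TABLE statements.

    --extended-insert: use multiple-row INSERT syntax with several VALUES lists; the dump file is smaller, I/O is lower, and the import is much faster.

    --default-character-set=latin1: export the data in its original character set, so that all Chinese text in the exported file is readable and is not saved as garbled text.

    (4) Open alldata.sql and change set names latin1 to set names gbk (or change the character set on the server and client instead)

    (5) Create the database

    create database dbname default charset gbk;

    (6) Create the tables by executing alltable.sql

    mysql -uroot -p123456 dbname < alltable.sql

    (7) Import the data

    mysql -uroot -p123456 dbname < alldata.sql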
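    The sed replacement in step (2) and the set names edit in step (4) can be sketched in Python as follows. A blanket replacement of latin1 would also rewrite collation names such as latin1_swedish_ci into names that do not exist, so this sketch only touches the character set clauses. retarget_charset is a hypothetical helper, and the sample text is modelled on the show create table output earlier in the article.

```python
import re


def retarget_charset(sql_text, old, new):
    """Replace the character set name in CHARSET=, CHARACTER SET and
    SET NAMES clauses, leaving collation names untouched."""
    pattern = re.compile(
        r"(CHARSET\s*=\s*|CHARACTER\s+SET\s+|SET\s+NAMES\s+)"
        + re.escape(old)
        + r"\b",  # the boundary stops a match inside latin1_swedish_ci
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: m.group(1) + new, sql_text)


dump = """/*!40101 SET NAMES latin1 */;
CREATE TABLE `student` (
  `id` int(4) NOT NULL AUTO_INCREMENT,
  `name` char(20) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
"""

print(retarget_charset(dump, "latin1", "gbk"))
```

    Any COLLATE clause left in the dump still names a latin1 collation and needs to be reviewed by hand before the import.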

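    Summary: converting latin1 to utf8

    (1) Export the CREATE DATABASE and CREATE TABLE statements and use sed to change the character set to utf8 in bulk.

    (2) Export all the data.

    (3) Change the MySQL server and client encoding to utf8.

    (4) Drop the original databases, tables, and data.

    (5) Import the new CREATE DATABASE and CREATE TABLE statements.

    (6) Import all the MySQL data.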
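    The export and import steps can be laid out as a plan before anything is run. The Python sketch below only builds the command strings and does not execute them. The file names alltable.sql and alldata.sql and the function name migration_plan are assumptions for illustration; --no-data is the long form of -d, and -p without a value makes the client prompt for the password.

```python
import shlex


def migration_plan(db, user, old_charset):
    """Return the shell commands for dumping and re-importing a database.
    Nothing is executed; the caller reviews and runs them manually."""
    dump = ["mysqldump", f"-u{user}", "-p", f"--default-character-set={old_charset}"]
    load = ["mysql", f"-u{user}", "-p", db]
    return [
        # Structure only, to be edited to the new character set.
        shlex.join(dump + ["--no-data", db]) + " > alltable.sql",
        # Data only, exported in the original character set.
        shlex.join(dump + ["--quick", "--no-create-info", "--extended-insert", db])
        + " > alldata.sql",
        # Re-create the tables, then load the data.
        shlex.join(load) + " < alltable.sql",
        shlex.join(load) + " < alldata.sql",
    ]


for command in migration_plan("dbname", "root", "latin1"):
    print(command)
```

    Editing the two dump files and dropping and re-creating the database happen between the dump and import commands, as described in the steps above.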

  • Original article: https://www.cnblogs.com/zywu-king/p/8563156.html