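MySQL Character Set Essentials
1 Introduction to MySQL character sets
Put simply, a character set is a collection of written symbols together with their encodings and comparison rules. MySQL character set support involves two concepts: the character set (CHARACTER SET) and the collation (COLLATION). The character set defines how MySQL stores string data, while the collation defines how strings are compared. In the CREATE DATABASE statement shown earlier, CHARACTER SET latin1 is the database character set and COLLATE latin1_swedish_ci is the collation. For details, see Chapter 10 of the MySQL manual, which covers character sets.
2 Common MySQL character sets
The four character sets most commonly used with MySQL are latin1, gbk, utf8, and utf8mb4.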
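3 How to choose a suitable character set
a. If the application handles many kinds of text and is released in different countries and regions, choose a Unicode character set, which for MySQL means UTF-8 (three bytes per Chinese character). UTF-8 is also the better choice when the application mostly handles English with a small amount of Chinese, because ASCII characters take only one byte.
b. If only Chinese needs to be supported, the data volume is large, and performance requirements are high, GBK is an option: each Chinese character takes two bytes (ASCII characters take one) against three in UTF-8, so heavy computation, comparison, and sorting work on less data.
c. Mobile internet services may need utf8mb4, for example to store emoji, which take four bytes in UTF-8.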
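The byte counts behind these choices can be checked outside MySQL. The sketch below uses Python's built-in codecs as a stand-in for MySQL's utf8/utf8mb4 and gbk character sets; the helper name byte_lengths is made up for this example.

```python
def byte_lengths(ch):
    """Return the encoded size of one character in UTF-8 and GBK.

    A GBK size of None means the character cannot be encoded in GBK.
    """
    utf8_len = len(ch.encode("utf-8"))
    try:
        gbk_len = len(ch.encode("gbk"))
    except UnicodeEncodeError:
        gbk_len = None
    return {"utf8": utf8_len, "gbk": gbk_len}


# An ASCII letter, a Chinese character, and an emoji.
for ch in ["a", "汉", "😀"]:
    print(repr(ch), byte_lengths(ch))
```

The emoji needs four bytes in UTF-8, which is more than MySQL's three-byte utf8 can hold, and that is the usual reason for choosing utf8mb4.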
4 Listing the character sets supported by the current MySQL server
[root@localhost ~]# mysql -uroot -p123456 -e "SHOW CHARACTER SET"
The four most commonly used are:
[root@localhost ~]# mysql -uroot -p123456 -e "SHOW CHARACTER SET;"|egrep "gbk|utf8|latin1"|awk ' {print $0}'
latin1 cp1252 West European latin1_swedish_ci 1
gbk GBK Simplified Chinese gbk_chinese_ci 2
utf8 UTF-8 Unicode utf8_general_ci 3
utf8mb4 UTF-8 Unicode utf8mb4_general_ci 4
Check MySQL's current character set settings
mysql> show variables like 'character_set%';
+--------------------------+----------------------------------+
| Variable_name| Value|
+--------------------------+----------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results| utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
Note: by default, character_set_client, character_set_connection, and character_set_results match the operating system's character set and change together with it. The system setting here is:
[root@localhost ~]# cat /etc/sysconfig/i18n
LANG="zh_CN.UTF-8"
[root@localhost ~]# echo $LANG
zh_CN.UTF-8
3 What character sets does MySQL use by default?
a. First look at the character sets MySQL uses by default
mysql> show variables like 'character_set%';
+--------------------------+----------------------------------+
| Variable_name| Value|
+--------------------------+----------------------------------+
| character_set_client | gb2312 |
| character_set_connection | gb2312 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results| gb2312 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
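The character set variables have the following meanings
Variable_name            | Value  | Meaning
character_set_client     | latin1 | client character set
character_set_connection | latin1 | connection character set
character_set_database   | latin1 | database character set, set in the configuration file or when creating the database or table
character_set_results    | latin1 | character set of returned results
character_set_server     | latin1 | server character set, set in the configuration file or when creating the database or table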
After changing the Linux system character set variable, check how the MySQL character sets change
[root@localhost ~]# echo $LANG
zh_CN.UTF-8
[root@localhost ~]# mysql -uroot -p123456 -e "show variables like 'character_set%';"
+--------------------------+----------------------------------+
| Variable_name| Value|
+--------------------------+----------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results| utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
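We can see that character_set_client, character_set_connection, and character_set_results now match the system setting and have all changed to utf8, while character_set_database and character_set_server are still latin1.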
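As a rough picture of what the client does here, the sketch below derives a character set name from a LANG value. The mapping table and the function name guess_client_charset are assumptions made for illustration; they are not MySQL's internal list or API.

```python
# Illustrative mapping from locale codesets to MySQL character set names.
# This table is an assumption for the example, not MySQL's internal list.
CODESET_TO_MYSQL = {
    "utf-8": "utf8",
    "utf8": "utf8",
    "gb2312": "gb2312",
    "gbk": "gbk",
    "iso-8859-1": "latin1",
}


def guess_client_charset(lang, fallback="latin1"):
    """Return a MySQL charset name for a LANG value such as 'zh_CN.UTF-8'."""
    # LANG has the shape language[_territory][.codeset][@modifier].
    _, dot, rest = lang.partition(".")
    if not dot:
        return fallback  # no codeset given, for example LANG=C
    codeset = rest.split("@", 1)[0].lower()
    return CODESET_TO_MYSQL.get(codeset, fallback)


print(guess_client_charset("zh_CN.UTF-8"))
print(guess_client_charset("zh_CN.GB2312"))
```

Only the three connection-side variables follow this value; the server and database character sets come from the server configuration.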
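4 What does executing set names latin1 actually do?
Whether the Linux system character set is gb2312 or utf8, data inserted under the default settings comes out garbled.
a. At this point the queried data is garbled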
mysql> use cuizhong
Database changed
mysql> select * from student
-> ;
+----+---------------------+
| id | name|
+----+---------------------+
| 1 | zhangsan|
| 2 | lisi|
| 3 | wanger |
| 4 | xiaozhang |
| 5 | xiaowang|
| 6 | ??? |
| 7 | å°çº¢ |
| 8 | ä¸è®¤è¯† |
| 9 | æŽå›› |
+----+---------------------+
9 rows in set (0.10 sec)
b. Running set names with the matching character set resolves the garbling
(1) First check the character sets of the database and the table
mysql> show create database cuizhong\G
*************************** 1. row ***************************
Database: cuizhong
Create Database: CREATE DATABASE `cuizhong` /*!40100 DEFAULT CHARACTER SET latin1 */
1 row in set (0.00 sec)
mysql> show create table student\G
*************************** 1. row ***************************
Table: student
Create Table: CREATE TABLE `student` (
`id` int(4) NOT NULL AUTO_INCREMENT,
`name` char(20) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=10 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
(2) Both the database and the table use latin1, so running set names latin1 makes the character sets match and the garbling disappears.
mysql> set names latin1;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from student;
+----+-----------+
| id | name |
+----+-----------+
| 1 | zhangsan |
| 2 | lisi |
| 3 | wanger|
| 4 | xiaozhang |
| 5 | xiaowang |
| 6 | ??? |
| 7 | 小红 |
| 8 | 不认识|
| 9 | 李四 |
+----+-----------+
(3) Running set names changes the following three variables: character_set_client, character_set_connection, and character_set_results.
mysql> show variables like 'character_set%';
+--------------------------+----------------------------------+
| Variable_name| Value|
+--------------------------+----------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results| latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
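One way to picture why the same rows look garbled over a utf8 connection and correct after set names latin1 is to replay the byte conversions by hand. This is a simplified model under stated assumptions: the terminal is UTF-8, the table is latin1, and Python's latin-1 codec stands in for MySQL's latin1 (which is really cp1252-based).

```python
original = "小红"

# The UTF-8 terminal sends these bytes; with a latin1 connection the server
# stores them unchanged, treating every byte as one latin1 character.
raw_bytes = original.encode("utf-8")
stored_as_latin1 = raw_bytes.decode("latin-1")

# Reading over a utf8 connection: the server converts each latin1 character
# to UTF-8, so the terminal shows one accented letter per original byte.
shown_garbled = stored_as_latin1.encode("utf-8").decode("utf-8")

# Reading after "set names latin1": the bytes come back unconverted and the
# UTF-8 terminal reassembles the original characters.
shown_ok = stored_as_latin1.encode("latin-1").decode("utf-8")

# Writing Chinese over a utf8 connection into a latin1 column is lossy:
# characters with no latin1 equivalent are replaced by '?'.
lossy = original.encode("latin-1", errors="replace")

print(repr(shown_garbled))
print(shown_ok)
print(lossy)
```

The last step would be consistent with row 6 of the table, where the question marks remain even after set names latin1, although the source does not say how that row was inserted.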
5 What does the mysql command-line option --default-character-set=latin1 do?
(1) First check MySQL's character sets
[root@localhost ~]# mysql -uroot -p123456 -e "show variables like 'character_set%';"
+--------------------------+----------------------------------+
| Variable_name| Value|
+--------------------------+----------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results| utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
(2) Log in to mysql with the --default-character-set=latin1 option
[root@localhost ~]# mysql -uroot -p123456 --default-character-set=latin1
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 5.5.32 Source distribution
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
(3) Now check MySQL's character sets again
mysql> show variables like 'character_set%';
+--------------------------+----------------------------------+
| Variable_name| Value|
+--------------------------+----------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results| latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
(4) Logging in with the option is also only a temporary change; logging in without it reverts to the previous settings
[root@localhost ~]# mysql -uroot -p123456 --default-character-set=latin1 -e "show variables like 'character_set%';"
+--------------------------+----------------------------------+
| Variable_name| Value|
+--------------------------+----------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results| latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
[root@localhost ~]# mysql -uroot -p123456 -e "show variables like 'character_set%';"
+--------------------------+----------------------------------+
| Variable_name| Value|
+--------------------------+----------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results| utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
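6 Ensuring that data inserted into MySQL is not garbled
6.1 Unifying the MySQL client and server character sets
(1) The MySQL character set variables below (client side and server side) must all be set to the same character set for inserted Chinese data to display correctly. The Linux system character set should also match the database character set wherever possible.
(2) The output of show variables like 'character_set%'; is as follows: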
  Variable_name            | Value  | Meaning
① character_set_client     | latin1 | client character set
② character_set_connection | latin1 | connection character set
③ character_set_database   | latin1 | database character set
④ character_set_results    | latin1 | character set of returned results
⑤ character_set_server     | latin1 | server character set, set in the configuration file or when creating the database or table
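Of these, ①, ② and ④ follow the Linux system character set by default. Running set names latin1 after logging in, or logging in with mysql and an explicit character set, both set the client, connection, and results variables to latin1, which is what resolves the garbled inserts. The same effect can be made permanent by setting the client character set in the my.cnf configuration file:
[client]
default-character-set=latin1
Note: no restart is needed, because client programs read the [client] section each time they start.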
[root@localhost ~]# sed -n "18,22p" /etc/my.cnf
[client]
#password = your_password
port = 3306
socket = /usr/local/mysql/tmp/mysql.sock
default-character-set = latin1
[root@localhost ~]# mysql -uroot -p123456 -e "show variables like 'character_set%';"
+--------------------------+----------------------------------+
| Variable_name| Value|
+--------------------------+----------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results| latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
(3) After changing the client character set, table data is no longer garbled even without running set names
[root@localhost ~]# mysql -uroot -p123456 -e "select * from cuizhong.student;"
+----+-----------+
| id | name |
+----+-----------+
| 1 | zhangsan |
| 2 | lisi |
| 3 | wanger|
| 4 | xiaozhang |
| 5 | xiaowang |
| 6 | ??? |
| 7 | 小红|
| 8 | 不认识 |
| 9 | 李四|
+----+-----------+
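The "everything must match" rule in this subsection can be expressed as a small check over the output of show variables. This is a minimal sketch that takes the variables as a plain dictionary; the function name find_mismatches is hypothetical, and fetching the values from a live server is left out.

```python
CHECKED_VARS = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
    "character_set_database",
    "character_set_server",
)


def find_mismatches(variables):
    """Return the variables whose value differs from character_set_database."""
    wanted = variables["character_set_database"]
    return sorted(name for name in CHECKED_VARS if variables[name] != wanted)


# Values copied from the transcripts: before and after the [client] change.
before = {
    "character_set_client": "utf8",
    "character_set_connection": "utf8",
    "character_set_results": "utf8",
    "character_set_database": "latin1",
    "character_set_server": "latin1",
}
after = dict(before, character_set_client="latin1",
             character_set_connection="latin1",
             character_set_results="latin1")

print(find_mismatches(before))
print(find_mismatches(after))
```

An empty list corresponds to the state in which the query above returns readable Chinese without set names.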
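6.2 Changing the MySQL server character set
(1) Modify the my.cnf parameters as follows
[mysqld]
default-character-set = latin1    # for 5.1 and earlier
character-set-server = latin1     # for 5.5
(2) Check the current character sets before the change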
[root@localhost ~]# mysql -uroot -p123456 -e "show variables like 'character_set%';"
+--------------------------+----------------------------------+
| Variable_name| Value|
+--------------------------+----------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results| utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
(3) Check the modified parameter
[root@localhost ~]# sed -n "26,27p" /etc/my.cnf
[mysqld]
character-set-server = utf8
(4) Restart the MySQL service (restarting is not allowed in a production environment)
[root@localhost ~]# /etc/init.d/mysqld restart
Shutting down MySQL.. SUCCESS!
Starting MySQL.. SUCCESS!
(5) Check the character sets after the change
[root@localhost ~]# mysql -uroot -p123456 -e "show variables like 'character_set%';"
+--------------------------+----------------------------------+
| Variable_name| Value|
+--------------------------+----------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results| utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
Note: the parameter set under [mysqld] above changes the following two character set variables.
| Variable_name| Value|
| character_set_database | utf8 |
| character_set_server | utf8 |
At this point, changing the system character set no longer changes the MySQL character sets.
[root@localhost ~]# cat /etc/sysconfig/i18n
LANG="zh_CN.GB2312"
#LANG="zh_CN.UTF-8"
[root@localhost ~]# source /etc/sysconfig/i18n
[root@localhost ~]# mysql -uroot -p123456 -e "show variables like 'character_set%';"
+--------------------------+----------------------------------+
| Variable_name| Value|
+--------------------------+----------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results| utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
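6.3 Summary: unifying the MySQL client and server character sets
Principle for avoiding garbled text: utf8 is recommended for mixed Chinese and English environments, and the Linux system, client, server, database, table, and application character sets should all be the same.
1. Set the Linux system character set to utf8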
[root@localhost ~]# cat /etc/sysconfig/i18n
LANG="zh_CN.UTF-8"
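Note: the terminal client used to connect to Linux, for example Xshell or SecureCRT, must also be set to the same character set.
2. MySQL client
Temporary:
set names latin1;
Permanent:
Change the parameter in the [client] section of my.cnf; this has the same effect as set names latin1 and persists across logins.
3. Server
Change the my.cnf parameters
[mysqld]
default-character-set = latin1    # for 5.1 and earlier
character-set-server = latin1     # for 5.5
4. Databases, tables, and applications: specify the character set when creating the database
create database cuizhong_utf8 default character set utf8 collate <collation>;
Replace <collation> with a collation of the chosen character set, for example utf8_general_ci.
We can run show character set to see the supported character sets and their default collations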
mysql> show character set;
+----------+-----------------------------+---------------------+--------+
| Charset | Description | Default collation | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5 | Big5 Traditional Chinese| big5_chinese_ci | 2 |
| dec8 | DEC West European | dec8_swedish_ci | 1 |
| cp850| DOS West European | cp850_general_ci| 1 |
| hp8 | HP West European| hp8_english_ci | 1 |
| koi8r| KOI8-R Relcom Russian | koi8r_general_ci| 1 |
| latin1 | cp1252 West European| latin1_swedish_ci | 1 |
| latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |
| swe7 | 7bit Swedish| swe7_swedish_ci | 1 |
| ascii| US ASCII| ascii_general_ci| 1 |
| ujis | EUC-JP Japanese | ujis_japanese_ci| 3 |
| sjis | Shift-JIS Japanese | sjis_japanese_ci| 2 |
| hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 |
| tis620 | TIS620 Thai | tis620_thai_ci | 1 |
| euckr| EUC-KR Korean | euckr_korean_ci | 2 |
| koi8u| KOI8-U Ukrainian| koi8u_general_ci| 1 |
| gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 |
| greek| ISO 8859-7 Greek| greek_general_ci| 1 |
| cp1250 | Windows Central European| cp1250_general_ci | 1 |
| gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 |
| latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |
| armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 |
| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
| ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 |
| cp866| DOS Russian | cp866_general_ci| 1 |
| keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 |
| macce| Mac Central European| macce_general_ci| 1 |
| macroman | Mac West European | macroman_general_ci | 1 |
| cp852| DOS Central European| cp852_general_ci| 1 |
| latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 |
| utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci | 4 |
| cp1251 | Windows Cyrillic| cp1251_general_ci | 1 |
| utf16| UTF-16 Unicode | utf16_general_ci| 4 |
| cp1256 | Windows Arabic | cp1256_general_ci | 1 |
| cp1257 | Windows Baltic | cp1257_general_ci | 1 |
| utf32| UTF-32 Unicode | utf32_general_ci| 4 |
| binary | Binary pseudo charset | binary | 1 |
| geostd8 | GEOSTD8 Georgian| geostd8_general_ci | 1 |
| cp932| SJIS for Windows Japanese | cp932_japanese_ci | 2 |
| eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | 3 |
+----------+-----------------------------+---------------------+--------+
39 rows in set (0.00 sec)
5. Application character set
Use the Simplified Chinese UTF8 build of the application, for example:
http://download.comsenz.com/Discuzx/3.2/Discuz_X3.2_SC_UTF8.zip
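7 How to change the character set of production databases and tables
Steps for changing the character set of existing data
For an existing database, the character set cannot be changed simply with "alter database character set ..." or "alter table tablename character set ...". Neither command converts data that is already stored; they only affect tables or data created afterwards.
To change the character set of existing records, the data must be exported, have its character set changed, and then be imported again.
Changing the database default encoding
alter database [your db name] charset [your character setting];
The following simulates converting a latin1 database to GBK.
(1) Export the table structure
mysqldump -uroot -p123456 --default-character-set=latin1 -d dbname > alltable.sql
--default-character-set=latin1 sets the character set used for the connection; -d exports only the table structure.
(2) Edit alltable.sql and use sed to replace latin1 with gbk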
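(3) Make sure the data is no longer being updated, then export all the data. The data goes to a separate file, alldata.sql, so that it does not overwrite the structure dump.
mysqldump -uroot -p123456 --quick --no-create-info --extended-insert --default-character-set=latin1 dbname > alldata.sql
Parameter notes:
--quick: used for dumping large tables; forces mysqldump to retrieve rows from the server one at a time instead of retrieving all rows and caching them in memory before output.
--no-create-info: do not write CREATE TABLE statements.
--extended-insert: use multiple-row INSERT syntax with several VALUES lists; the file is smaller, I/O is lower, and the import is much faster.
--default-character-set=latin1: export the data in its original character set, so that the Chinese text in the dump file is readable rather than saved as garbage.
(4) Open alldata.sql and change set names latin1 to set names gbk (or change the server and client character sets of the system instead)
(5) Create the database
create database dbname default charset gbk;
(6) Create the tables by executing alltable.sql
mysql -uroot -p123456 dbname < alltable.sql
(7) Import the data
mysql -uroot -p123456 dbname < alldata.sql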
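The edits in steps (2) and (4) amount to rewriting character set declarations in the dump files. The sketch below shows one cautious way to do that with a regular expression instead of a blanket replacement, so that the string latin1 inside row data is left alone. The function name retarget_charset and the sample text are made up for this example; a real dump should be inspected before and after.

```python
import re


def retarget_charset(sql_text, old="latin1", new="gbk"):
    """Rewrite CHARSET=, CHARACTER SET and SET NAMES declarations only."""
    pattern = re.compile(
        r"\b(CHARSET=|CHARACTER SET |SET NAMES )" + re.escape(old) + r"\b",
        re.IGNORECASE,
    )
    # Keep the matched keyword as written and swap only the charset name.
    return pattern.sub(lambda m: m.group(1) + new, sql_text)


sample = (
    "/*!40101 SET NAMES latin1 */;\n"
    "CREATE TABLE `student` (\n"
    "  `id` int(4) NOT NULL AUTO_INCREMENT,\n"
    "  `name` char(20) NOT NULL,\n"
    "  PRIMARY KEY (`id`)\n"
    ") ENGINE=InnoDB DEFAULT CHARSET=latin1;\n"
)

print(retarget_charset(sample))
```

Explicit collations such as COLLATE latin1_swedish_ci are deliberately not touched by this pattern and would need to be changed separately to a collation of the new character set.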
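Summary: converting latin1 to utf8
(1) Export the CREATE DATABASE and CREATE TABLE statements and use sed to change them to utf8 in bulk.
(2) Export all the data.
(3) Change the MySQL server and client encodings to utf8.
(4) Drop the original databases, tables, and data.
(5) Import the new CREATE DATABASE and CREATE TABLE statements.
(6) Import all the MySQL data.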