- Summary
- MySQL test
- Oracle test
- Why MySQL utf8mb4 VARCHAR maxes out at 16383
- Storage under the Latin1 character set
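Summary:
1. In MySQL, the n in char(n) is a length in characters. The column holds at most n characters, whether they are Chinese characters, English letters, or digits. For example, char(6) can store '你好数你最棒', '123456', and 'abcdef', but not '1234567'.
2. In Oracle, the n in char(n) is a length in bytes (under the default BYTE length semantics). With ZHS16GBK, a Chinese character takes 2 bytes and an English letter or digit takes 1 byte. So char(6) cannot hold '你好数你' (8 bytes), while 6 English letters or digits fit.
varchar can be understood the same way.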
1. MySQL test
In char(n), n is a number of characters.
Environment:
MySQL 8.0.16
Charset utf8mb4
mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.16 |
+-----------+
1 row in set (0.00 sec)
mysql> show create table t1;
+-------+----------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+----------------------------------------------------------------------------------------------------------------------+
| t1 | CREATE TABLE `t1` (
`name` char(6) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> insert into t1 values('你好数你最棒');
Query OK, 1 row affected (0.00 sec)
mysql> insert into t1 values('你好数你最棒a');
ERROR 1406 (22001): Data too long for column 'name' at row 1
mysql> insert into t1 values('123456');
Query OK, 1 row affected (0.02 sec)
mysql> insert into t1 values('1234567');
ERROR 1406 (22001): Data too long for column 'name' at row 1
mysql> insert into t1 values('abcdef');
Query OK, 1 row affected (0.00 sec)
mysql> insert into t1 values('abcdefg');
ERROR 1406 (22001): Data too long for column 'name' at row 1
mysql> select * from t1;
+--------------------+
| name |
+--------------------+
| 你好数你最棒 |
| 123456 |
| abcdef |
+--------------------+
3 rows in set (0.00 sec)
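The MySQL results above can be replayed with a short Python sketch that models the rule as "count characters, not bytes". It only models the length check (the helper `fits_mysql_char` is hypothetical) and assumes strict SQL mode, the MySQL 8.0 default, under which an over-long value raises ERROR 1406 instead of being truncated with a warning.

```python
# Hypothetical model of MySQL's CHAR(n) length check under strict SQL mode:
# n counts characters, so how many bytes each character needs does not matter.
def fits_mysql_char(value: str, n: int) -> bool:
    return len(value) <= n

# The six values inserted into t1 (name char(6)) in the transcript above
values = ['你好数你最棒', '你好数你最棒a', '123456', '1234567', 'abcdef', 'abcdefg']
for v in values:
    result = "Query OK" if fits_mysql_char(v, 6) else "ERROR 1406 (Data too long)"
    utf8_bytes = len(v.encode('utf-8'))  # utf8mb4 storage size, shown only for contrast
    print(f"{v!r}: {len(v)} chars, {utf8_bytes} bytes -> {result}")
```

Python's `len()` counts code points, which matches MySQL's character count for these strings. The byte column shows that '你好数你最棒' needs 18 bytes in utf8mb4 yet still fits char(6).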
2. Oracle test
In char(n), n is a number of bytes.
Environment:
Oracle 11.2.0.4
Character set ZHS16GBK
A Chinese character takes 2 bytes; an English letter or digit takes 1 byte.
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
PL/SQL Release 11.2.0.4.0 - Production
CORE 11.2.0.4.0 Production
TNS for Linux: Version 11.2.0.4.0 - Production
NLSRTL Version 11.2.0.4.0 - Production
SQL> select userenv('language') from dual;
USERENV('LANGUAGE')
----------------------------------------------------
SIMPLIFIED CHINESE_CHINA.ZHS16GBK
SQL> create table t_oracle (name char(6));
Table created
SQL> insert into t_oracle values('中华人');
1 row inserted
SQL> insert into t_oracle values('中华人民');
insert into t_oracle values('中华人民')
ORA-12899: 列 "CYM"."T_ORACLE"."NAME" 的值太大 (实际值: 8, 最大值: 6)
SQL> insert into t_oracle values('abcdef');
1 row inserted
SQL> insert into t_oracle values('abcdefg');
insert into t_oracle values('abcdefg')
ORA-12899: 列 "CYM"."T_ORACLE"."NAME" 的值太大 (实际值: 7, 最大值: 6)
SQL> insert into t_oracle values('123456');
1 row inserted
SQL> insert into t_oracle values('1234567');
insert into t_oracle values('1234567')
ORA-12899: 列 "CYM"."T_ORACLE"."NAME" 的值太大 (实际值: 7, 最大值: 6)
SQL> commit;
Commit complete
SQL> select * from t_oracle;
NAME
------
中华人
abcdef
123456
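The Oracle results can be replayed the same way, this time counting bytes. The sketch assumes Python's `gbk` codec encodes these strings the same way Oracle's ZHS16GBK does (2 bytes per Chinese character, 1 byte per ASCII letter or digit); `oracle_byte_len` and `check_oracle_char` are hypothetical helpers.

```python
# Hypothetical model of Oracle CHAR(n) with BYTE length semantics:
# n counts bytes in the database character set (ZHS16GBK in this test).
def oracle_byte_len(value: str, charset: str = "gbk") -> int:
    # Python's "gbk" codec stands in for ZHS16GBK here (an assumption)
    return len(value.encode(charset))

def check_oracle_char(value: str, n: int) -> str:
    actual = oracle_byte_len(value)
    if actual <= n:
        return "1 row inserted"
    # Same "actual" and "maximum" figures that ORA-12899 reports
    return f"ORA-12899 (actual: {actual}, maximum: {n})"

for v in ['中华人', '中华人民', 'abcdef', 'abcdefg', '123456', '1234567']:
    print(f"{v!r}: {check_oracle_char(v, 6)}")
```

'中华人' is 3 characters but 6 bytes, exactly filling char(6); one more Chinese character pushes it to 8 bytes, the "actual: 8" in the first error above.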
The same distinction between characters and bytes also explains why VARCHAR under utf8mb4 tops out at 16383.
3. Why MySQL utf8mb4 VARCHAR maxes out at 16383
mysql> create table t1_1(name varchar(65535));
ERROR 1074 (42000): Column length too big for column 'name' (max = 16383); use BLOB or TEXT instead
mysql> create table t1_1(name varchar(16383));
Query OK, 0 rows affected (0.04 sec)
mysql> select 65535/4;
+------------+
| 65535/4 |
+------------+
| 16383.7500 |
+------------+
1 row in set (0.03 sec)
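This is because MySQL enforces a hard limit of 65535 bytes on the size of a row (see the MySQL 8.0 Reference Manual).
Under utf8mb4, a character can take up to 4 bytes. For varchar(16384) to be able to hold 16384 characters, it would need 16384*4 = 65536 bytes of storage, which exceeds 65535 bytes, so the two requirements conflict.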
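Written out, with 4 bytes as the worst case per utf8mb4 character:

```latex
\[
16384 \times 4 = 65536 > 65535,
\qquad
\left\lfloor \frac{65535}{4} \right\rfloor = 16383,
\qquad
16383 \times 4 = 65532 \le 65535
\]
```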
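So under utf8mb4, a single MySQL string column can be at most varchar(16383), and only when the table has no other columns: the documented 65535-byte limit applies to the whole row, not to each column. The following therefore fails: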
mysql> create table t1_2(name1 varchar(8000),name2 varchar(8384));
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
mysql> create table t1_2(name1 varchar(8000),name2 varchar(8383));
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
mysql> create table t1_2(name1 varchar(8000),name2 varchar(8382));
Query OK, 0 rows affected (0.02 sec)
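The second statement fails because a VARCHAR is stored as a length prefix plus the actual data. The prefix takes 1 byte when the column's maximum length in bytes is at most 255, and 2 bytes otherwise; with up to 4 bytes per utf8mb4 character, both columns here need a 2-byte prefix.
So 8000*4 + 2 + 8383*4 + 2 = 65536 > 65535, even before counting the NULL flag discussed in section 4, and the table cannot be created.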
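A minimal Python sketch of this accounting. It assumes only the components this article discusses: each VARCHAR's worst-case data bytes, its 1- or 2-byte length prefix (chosen by byte length, not character count), and one NULL-flag bit per nullable column, rounded up to whole bytes (the flag examined in section 4). InnoDB's real row format has further details; `varchar_bytes` and `row_size` are hypothetical helpers.

```python
import math

ROW_LIMIT = 65535  # MySQL's maximum row size in bytes (BLOB/TEXT excluded)

def varchar_bytes(n_chars: int, bytes_per_char: int) -> int:
    """Worst-case bytes for VARCHAR(n): data plus its length prefix."""
    max_data = n_chars * bytes_per_char
    prefix = 1 if max_data <= 255 else 2  # decided by bytes, not characters
    return max_data + prefix

def row_size(columns, bytes_per_char: int) -> int:
    """columns: list of (n_chars, nullable) tuples, all VARCHAR."""
    data = sum(varchar_bytes(n, bytes_per_char) for n, _ in columns)
    nullable = sum(1 for _, is_nullable in columns if is_nullable)
    null_flags = math.ceil(nullable / 8)  # one bit per nullable column
    return data + null_flags

# The three t1_2 attempts: utf8mb4, both columns nullable by default
for second in (8384, 8383, 8382):
    size = row_size([(8000, True), (second, True)], bytes_per_char=4)
    print(second, size, "OK" if size <= ROW_LIMIT else "Row size too large")
```

With the NULL flag included, the failing varchar(8383) row needs 65537 bytes and the accepted varchar(8382) row 65533. The byte-based prefix rule also means a utf8mb4 varchar(64) (up to 256 bytes) already needs a 2-byte prefix: `varchar_bytes(64, 4)` is 258.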
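Although utf8mb4 limits how long a VARCHAR can be, you should still use utf8mb4. It is MySQL's true UTF-8 character set (the older utf8, an alias of utf8mb3, stores at most 3 bytes per character), so it supports the widest range of characters, including emoji and rare Chinese characters, whereas other character sets risk producing garbled text. Be careful here!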
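4. Storage under the Latin1 character set
Latin1 is a single-byte character set: every character takes exactly 1 byte, and it cannot store Chinese characters.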
mysql> create table t2(name char(10))charset latin1;
Query OK, 0 rows affected (0.04 sec)
mysql> insert into t2 values('鼠');
ERROR 1366 (HY000): Incorrect string value: '\xE9\xBC\xA0' for column 'name' at row 1
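The bytes quoted in that error are the UTF-8 encoding of 鼠, presumably how the utf8mb4 client sent it, and latin1 has no code for the character. A small Python check, using Python's ISO-8859-1 codec as a stand-in for MySQL's latin1 (which is really closer to Windows cp1252; neither contains Chinese characters); `utf8_escape` and `fits_latin1` are hypothetical helpers.

```python
def utf8_escape(s: str) -> str:
    """Render the UTF-8 bytes of s in the \\xNN style of MySQL's error message."""
    return ''.join(f'\\x{b:02X}' for b in s.encode('utf-8'))

def fits_latin1(s: str) -> bool:
    # ISO-8859-1 approximates MySQL's latin1 here (an assumption, see above)
    try:
        s.encode('latin-1')
        return True
    except UnicodeEncodeError:
        return False

print(utf8_escape('鼠'))   # \xE9\xBC\xA0, the bytes quoted in ERROR 1366
print(fits_latin1('鼠'))   # False: no single-byte code exists for it
print(fits_latin1('abc'))  # True
```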
mysql> create table t2_1(name varchar(65533)) charset latin1;
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
mysql> create table t2_1(name varchar(65532)) charset latin1;
Query OK, 0 rows affected (0.05 sec)
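Under latin1, 65532 + 2 = 65534 bytes. Where did the remaining byte go? It holds the flag that records whether the nullable column is NULL. Declare the column NOT NULL and that byte becomes available for data: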
mysql> create table t2_1(name varchar(65533) not null) charset latin1;
Query OK, 0 rows affected (0.04 sec)
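The three single-column limits found in this article (65532 nullable and 65533 NOT NULL under latin1, 16383 under utf8mb4) all follow from one byte budget. A sketch, assuming a table whose only column is a VARCHAR, a 2-byte length prefix (true whenever the maximum data length exceeds 255 bytes, as in all three cases), and a 1-byte NULL flag when the column is nullable; `max_single_varchar` is a hypothetical helper.

```python
ROW_LIMIT = 65535  # bytes, MySQL's maximum row size (BLOB/TEXT excluded)

def max_single_varchar(bytes_per_char: int, nullable: bool) -> int:
    """Largest n for a one-column VARCHAR(n) table, by row size alone."""
    budget = ROW_LIMIT
    if nullable:
        budget -= 1  # NULL flag byte
    budget -= 2      # 2-byte length prefix (assumes max data > 255 bytes)
    return budget // bytes_per_char

print(max_single_varchar(1, nullable=True))   # latin1, nullable: 65532
print(max_single_varchar(1, nullable=False))  # latin1, NOT NULL: 65533
print(max_single_varchar(4, nullable=True))   # utf8mb4, nullable: 16383
```

For utf8mb4 the NOT NULL case also gives 65533 // 4 = 16383, which agrees with the per-column limit (max = 16383) reported by ERROR 1074 in section 3.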