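Oracle nvarchar2: special characters stored as garbage
I spent more than a day on this problem before finally getting it sorted out.
It started when the business needed to store the special character 'ø' in a varchar2 column and it came out garbled, because the database character set is ZHS16GBK.
A quick test confirmed the problem for special characters such as 'ø'. Since the national character set is AL16UTF16, I planned to store these characters in nvarchar2 columns (nvarchar2 uses the national character set).
In the test environment, however, the value was still garbled even when stored in nvarchar2.
To reproduce: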
[oracle@zkm ~]$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
[oracle@zkm ~]$ echo $NLS_LANG
AMERICAN_AMERICA.AL32UTF8

11:22:28 SYS@zkm(451)> select userenv('language') from dual;

USERENV('LANGUAGE')
--------------------------------------------------------------------------------
AMERICAN_AMERICA.ZHS16GBK

Elapsed: 00:00:00.01
11:22:06 SYS@zkm(451)> create table zkm ( name1 varchar2(20),name2 nvarchar2(20));

Table created.

Elapsed: 00:00:01.39
11:30:12 SYS@zkm(451)> select * from NLS_DATABASE_PARAMETERS;

PARAMETER                                          VALUE
-------------------------------------------------- --------------------------------------------------
NLS_LANGUAGE                                       AMERICAN
NLS_TERRITORY                                      AMERICA
NLS_CURRENCY                                       $
NLS_ISO_CURRENCY                                   AMERICA
NLS_NUMERIC_CHARACTERS                             .,
NLS_CHARACTERSET                                   ZHS16GBK
NLS_CALENDAR                                       GREGORIAN
NLS_DATE_FORMAT                                    DD-MON-RR
NLS_DATE_LANGUAGE                                  AMERICAN
NLS_SORT                                           BINARY
NLS_TIME_FORMAT                                    HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT                               DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT                                 HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT                            DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY                                  $
NLS_COMP                                           BINARY
NLS_LENGTH_SEMANTICS                               BYTE
NLS_NCHAR_CONV_EXCP                                FALSE
NLS_NCHAR_CHARACTERSET                             AL16UTF16
NLS_RDBMS_VERSION                                  11.2.0.4.0

20 rows selected.

Elapsed: 00:00:00.00
11:31:18 SYS@zkm(451)> insert into zkm values ('ø','ø');

1 row created.

Elapsed: 00:00:00.00
11:31:21 SYS@zkm(451)> commit;

Commit complete.

Elapsed: 00:00:00.00
11:31:26 SYS@zkm(451)> select * from zkm;

NAME1      NAME2
---------- ----------
？         ？          (these are Chinese full-width question marks)

Elapsed: 00:00:00.00
Comparing the Chinese (full-width) question mark with the English (ASCII) question mark:
14:19:01 SYS@zkm(451)> select dump('?',1016) from dual union all select dump('？',1016) from dual union all select dump('ø',1016) from dual;

DUMP('?',1016)
---------------------------------------------------------------------------------------------------------------------------
Typ=96 Len=1 CharacterSet=ZHS16GBK: 3f
Typ=96 Len=2 CharacterSet=ZHS16GBK: a3,bf
Typ=96 Len=2 CharacterSet=ZHS16GBK: a3,bf

Elapsed: 00:00:00.01
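In other words, under ZHS16GBK the Chinese "？" and "ø" end up as the same bytes, which means "ø" has been garbled.
This is easy to understand. The Linux OS sends a UTF8-encoded "ø" into the database through the sqlplus client. From the NLS_LANG environment variable the database learns that the incoming "ø" is UTF8-encoded, so it looks up "ø" in the UTF8 and GBK code tables and tries to convert the UTF8 "ø" into the GBK "ø".
GBK does not support the special character "ø", so there is no corresponding code in the GBK table. The code for the Chinese "？" in the GBK table is substituted instead, which the dump above shows to be a3bf. That is the cause of the garbling.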
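The conversion described above can be sketched in Python. This is only an illustration: Python's "gbk" codec stands in for ZHS16GBK (an assumption, the two mappings are not guaranteed identical), and the helper name to_gbk_like_oracle is hypothetical. The replacement character is hard-coded to the full-width question mark because that is what the dump output shows.

```python
# Sketch: UTF-8 input converted to GBK, with unmappable characters replaced.
# Assumption: Python's "gbk" codec approximates Oracle's ZHS16GBK mapping.

FULLWIDTH_QUESTION_MARK = "\uff1f"  # the replacement seen in the dump output


def to_gbk_like_oracle(text: str) -> bytes:
    """Encode text to GBK, substituting the full-width '?' when GBK has no code."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("gbk")
        except UnicodeEncodeError:
            # No GBK code for this character: store the replacement instead.
            out += FULLWIDTH_QUESTION_MARK.encode("gbk")
    return bytes(out)


client_bytes = "ø".encode("utf-8")  # what a UTF-8 terminal sends
stored = to_gbk_like_oracle(client_bytes.decode("utf-8"))

print(client_bytes.hex(","))  # c3,b8
print(stored.hex(","))        # a3,bf
```

Characters that GBK does cover, such as Chinese text, pass through the helper unchanged; only unmappable ones collapse to a3,bf.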
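We can dump the name1 and name2 columns of table zkm. Because name2 is an nvarchar2 column, it does not use the database character set ZHS16GBK but the national character set AL16UTF16, which gives the following result: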
14:38:24 SYS@zkm(87)> select dump(name1,1016) name1,dump(name2,1016) name2 from zkm;

NAME1                                              NAME2
-------------------------------------------------- --------------------------------------------------
Typ=1 Len=2 CharacterSet=ZHS16GBK: a3,bf           Typ=1 Len=2 CharacterSet=AL16UTF16: ff,1f

Elapsed: 00:00:00.00
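So the "ø" inserted into the name1 and name2 columns really did become the Chinese "？" in both: name1 holds a3,bf (its ZHS16GBK code) and name2 holds ff,1f (its AL16UTF16 code).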
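As a cross-check, the dumped bytes can be decoded outside the database. The sketch below assumes Python's "gbk" and "utf-16-be" codecs as stand-ins for ZHS16GBK and AL16UTF16; decode_dump is a hypothetical helper name, and its input is the comma-separated hex list printed by dump(...,1016).

```python
# Sketch: decode the byte lists printed by Oracle's dump(col, 1016).
# Assumption: "gbk" ~ ZHS16GBK and "utf-16-be" ~ AL16UTF16.


def decode_dump(hex_bytes: str, codec: str) -> str:
    """Turn a dump-style list such as 'a3,bf' into the character it encodes."""
    raw = bytes(int(b, 16) for b in hex_bytes.split(","))
    return raw.decode(codec)


name1 = decode_dump("a3,bf", "gbk")        # bytes stored in the varchar2 column
name2 = decode_dump("ff,1f", "utf-16-be")  # bytes stored in the nvarchar2 column

# Both columns hold U+FF1F, the full-width question mark, not 'ø'.
print(name1 == name2 == "\uff1f")  # True
```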
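The question, though, is this: nvarchar2 uses AL16UTF16, so why can't it store "ø"?
After searching online I found that the literal has to be prefixed with N to tell Oracle that it is a Unicode (national character set) string.
Usage references: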
http://www.orafaq.com/wiki/NVARCHAR2
http://www.orafaq.com/wiki/NCHAR
So I tried it:
15:35:30 SYS@zkm(1398)> select dump(N'?',1016) from dual union all select dump(N'？',1016) from dual union all select dump(N'ø',1016) from dual;

DUMP(N'?',1016)
--------------------------------------------------------------------------------
Typ=96 Len=2 CharacterSet=AL16UTF16: 0,3f
Typ=96 Len=2 CharacterSet=AL16UTF16: ff,1f
Typ=96 Len=2 CharacterSet=AL16UTF16: ff,1f

Elapsed: 00:00:00.00
15:35:39 SYS@zkm(1398)> delete zkm;

1 row deleted.

Elapsed: 00:00:00.01
15:35:44 SYS@zkm(1398)> insert into zkm values ('ø',N'ø');

1 row created.

Elapsed: 00:00:00.00
15:36:05 SYS@zkm(1398)> commit;

Commit complete.

Elapsed: 00:00:00.00
15:36:44 SYS@zkm(1398)> select * from zkm;

NAME1      NAME2
---------- ----------
？         ？

Elapsed: 00:00:00.00
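Even with the N prefix, the Chinese "？" and "ø" still come out identical under AL16UTF16 (both ff,1f).
Could it be that AL16UTF16 does not support the character "ø"? That cannot be right.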
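A quick check outside the database supports that doubt. The sketch below uses Python's "utf-16-be" codec as a stand-in for AL16UTF16 and "gbk" for ZHS16GBK (both assumptions), and through_gbk is a hypothetical helper. The second half is only one reading that happens to match the dump: it assumes the literal passes through the database character set before it reaches the national character set, which the session above does not by itself prove.

```python
# Sketch: UTF-16 can represent 'ø'; a detour through GBK cannot preserve it.
# Assumption: "utf-16-be" ~ AL16UTF16, "gbk" ~ ZHS16GBK.

REPLACEMENT = "\uff1f"  # full-width question mark, as seen in the dumps


def fits_gbk(ch: str) -> bool:
    """Return True if GBK has a code for this character."""
    try:
        ch.encode("gbk")
        return True
    except UnicodeEncodeError:
        return False


def through_gbk(text: str) -> str:
    """Hypothetical detour: anything GBK cannot hold becomes the replacement."""
    return "".join(ch if fits_gbk(ch) else REPLACEMENT for ch in text)


direct = "ø".encode("utf-16-be")                # straight to UTF-16
two_hop = through_gbk("ø").encode("utf-16-be")  # via GBK first, then UTF-16

print(direct.hex(","))   # 00,f8
print(two_hop.hex(","))  # ff,1f
```

Oracle's dump prints the leading byte without zero padding, so 00,f8 here corresponds to 0,f8 in dump output.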
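So I created a new database with the AL32UTF8 character set, created the same table as above and inserted the data. Neither the varchar2 nor the nvarchar2 column was garbled. Part of the session follows: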
SQL> select dump(N'?',1016) from dual union all select dump(N'？',1016) from dual union all select dump(N'ø',1016) from dual;

DUMP(N'?',1016)
------------------------------------------
Typ=96 Len=2 CharacterSet=AL16UTF16: 0,3f
Typ=96 Len=2 CharacterSet=AL16UTF16: ff,1f
Typ=96 Len=2 CharacterSet=AL16UTF16: 0,f8

SQL> select userenv('language') from dual;

USERENV('LANGUAGE')
----------------------------------------------------
AMERICAN_AMERICA.AL32UTF8

SQL> select dump(N'?',1016) from dual union all select dump(N'？',1016) from dual union all select dump(N'ø',1016) from dual;

DUMP(N'?',1016)
------------------------------------------
Typ=96 Len=2 CharacterSet=AL16UTF16: 0,3f
Typ=96 Len=2 CharacterSet=AL16UTF16: ff,1f
Typ=96 Len=2 CharacterSet=AL16UTF16: 0,f8

SQL> select unistr('