zoukankan      html  css  js  c++  java
  • 使用UTF8字符集存储中文生僻字

    使用UTF8字符集存储中文生僻字

    一、相关学习BLOG

    https://www.cnblogs.com/jyzhao/p/8654412.html
    http://blog.itpub.net/781883/viewspace-1411259/
    https://www.qqxiuzi.cn/bianma/zifuji.php
    https://blog.csdn.net/iteye_7853/article/details/82516888

    二、需求详情:
    客户提出,关于氮卓斯汀变更为氮䓬斯汀,系统出现乱码问题
    产生问题的原因为:oracle数据库字符集为:ZHS16GBK,对于部分生僻字是无法正常保存的。

    三、客户提出的解决方案:
    1. 修改数据库字符集为:UTF-8。此方法需对oracle字符集进行修改,但修改后,可能会将原有数据全部变成乱码。
    2. 程序改造:将所有会涉及到生僻字的字段(例如产品名称、通用名等),存入数据库时,转码为16进制存,然后读取时再进行解码后展示到页面。此方法涉及修改代码庞大,且数据库内容可读性很差,手动刷数据、导出数据难度也很大。

    四、解决思路:
    1)直接修改数据库字符集,除非是子集修改为超集,否则不建议修改,从上述链接blog可以发现强行将db字符集从gbk修改为utf8后,plsql登录提示存在字符不匹配现象;
    2)应用程序修改,代码量大,且可读写性太差;
    3)建议将生僻字业务表,迁移至utf8 db库中存储(与开发人员沟通,实际存储生僻字的表只有20余个,可以单独对这些表进行迁移,业务修改查询表的代码(通过db_link),或者直接连接新的db,再或者通过创建db_link+同义词指向迁移后的远程表进行查询不修改应用代码(应用不修改,无感知);

    五、实验测试
    1.测试环境导出业务表
    2.导入到UTF8环境下,进行读写测试


    5.1源环境导出

    修改字符集报错
    SQL> alter database character set al32utf8;
    alter database character set al32utf8
    *1 行出现错误:
    ORA-12712: 新字符集必须为旧字符集的超集
    SQL> select * from nls_database_parameters where parameter like '%CHARACTERSET%'
    PARAMETER VALUE
    ------------------------------ ------------------------------
    NLS_CHARACTERSET ZHS16GBK
    NLS_NCHAR_CHARACTERSET AL16UTF16
    
    SQL> conn scott/tiger
    SQL> create table test(id int,c_name varchar2(200));
    表已创建。
    SQL> insert into test values(1,'板蓝根');
    SQL> insert into test values(2,'氮䓬斯汀');
    SQL> commit;
    SQL> insert into test values(3,'氮卓斯汀');
    SQL> commit;
    
    SQL> select * from test
    ID C_NAME
    ---------- --------------------
    1 板蓝根
    2 氮?斯汀
    3 氮卓斯汀
    
    C:UsersThinkpad>exp scott/tiger FILE=C:UsersThinkpadDesktop	emphr_test.dmp TABLES=test
    Export: Release 11.2.0.4.0 - Production on 星期三 6月 26 13:20:58 2019
    Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
    连接到: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
    With the Partitioning, OLAP, Data Mining and Real Application Testing options
    已导出 ZHS16GBK 字符集和 AL16UTF16 NCHAR 字符集
    即将导出指定的表通过常规路径...
    . . 正在导出表 TEST导出了 3 行
    成功终止导出, 没有出现警告。

     

    5.2目标环境导入

    SQL> select * from nls_database_parameters where parameter like '%CHARACTERSET%';
    PARAMETER VALUE
    ------------------------------ ------------------------------
    NLS_CHARACTERSET AL32UTF8
    NLS_NCHAR_CHARACTERSET AL16UTF16
    
    $env|grep LANG
    NLS_LANG=american_america.ZHS16GBK
    LANG=en_US.UTF-8
    
    enmo:/home/oracleimp scott/tiger file=/home/oracle/hr_test.dmp full=y
    Import: Release 11.2.0.4.0 - Production on Wed Jun 26 01:27:22 2019
    Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
    Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
    With the Partitioning, OLAP, Data Mining and Real Application Testing options
    Export file created by EXPORT:V11.02.00 via conventional path
    import done in ZHS16GBK character set and AL16UTF16 NCHAR character set
    import server uses AL32UTF8 character set (possible charset conversion)
    . importing SCOTT's objects into SCOTT
    . importing SCOTT's objects into SCOTT
    . . importing table "TEST" 3 rows imported
    Import terminated successfully without warnings.
    
    SQL> select * from test;
    ID C_NAME
    ---------- ------------------------------
    1 
    2 
    3 ˹͡
    以上Oracle进行字符转换后,中文字符直接配置为Null
    
    修改语言格式,让Oracle无需进行字符转换
    export NLS_LANG=american_america.AL32UTF8
    
    enmo:/home/oracleimp scott/tiger file=/home/oracle/hr_test.dmp full=y
    Export file created by EXPORT:V11.02.00 via conventional path
    import done in AL32UTF8 character set and AL16UTF16 NCHAR character set
    export client uses ZHS16GBK character set (possible charset conversion)
    . importing SCOTT's objects into SCOTT
    . importing SCOTT's objects into SCOTT
    . . importing table "TEST" 3 rows imported
    Import terminated successfully without warnings.
    enmo:/home/oraclesqlplus / as sysdba
    SQL*Plus: Release 11.2.0.4.0 Production on Wed Jun 26 02:40:32 2019
    Copyright (c) 1982, 2013, Oracle. All rights reserved.
    Connected to:
    Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
    With the Partitioning, OLAP, Data Mining and Real Application Testing options

    数据验证

    SQL> conn scott/tiger
    Connected.
    SQL> select * from test;
    ID C_NAME
    ------------------------------------
    1 板蓝根
    2 氮?斯汀
    3 氮卓斯汀
    本次数据是有了,
    
    UTF8字符集
    SQL> select dump('氮卓斯汀') from dual;
    DUMP('氮卓斯汀')
    --------------------------------------------------------------
    Typ=96 Len=12: 230,176,174,229,141,147,230,150,175,230,177,128
    
    GBK字符集
    SQL> select dump('氮卓斯汀') from dual;
    DUMP('氮卓斯汀')
    ---------------------------------------------
    Typ=96 Len=8: 181,170,215,191,203,185,205,161
    
    SQL> desc scott.test
    名称 是否为空? 类型
    ----------------------------------------- -------- ----------------------------
    ID NUMBER(38)
    C_NAME VARCHAR2(200)
    
    对于两套环境test表字段进行收缩,可以发现UTF8字符集表,实际存储是使用三个字节存储一个汉字
    UTF8
    SQL> alter table scott.test modify c_name varchar2(8);
    alter table scott.test modify c_name varchar2(8)
    *
    ERROR at line 1:
    ORA-01441: cannot decrease column length because some value is too big 
    SQL> alter table scott.test modify c_name varchar2(12);
    Table altered.
    
    GBK
    GBK存储中文两个字节存储一个汉字
    SQL> alter table scott.test modify c_name varchar2(8);
    表已更改。
  • 相关阅读:
    CodeForces 710CMagic Odd Square(经典-奇数个奇数&偶数个偶数)
    CodeForces 710A King Moves(水题-越界问题)
    CodeForces 701C They Are Everywhere (滑动窗口)
    CodeForces 701B Cells Not Under Attack
    [补档]happiness
    [补档]王者之剑
    [补档]士兵占领
    [补档]搭配飞行员
    [补档]暑假集训D6总结
    [补档][Lydsy2017年4月月赛]抵制克苏恩
  • 原文地址:https://www.cnblogs.com/lvcha001/p/11089849.html
Copyright © 2011-2022 走看看