zoukankan      html  css  js  c++  java
  • sqoop学习3(数据导入乱码问题)

    sqoop将mysql数据库中数据导入hdfs或hive中后中文乱码问题解决办法

    [root@spark1 ~]# vi /etc/my.cnf 修改配置文件
    在文件内的[mysqld]和client下增加如下1行
    [mysqld]
    default-character-set=utf8
    [client]
    default-character-set=utf8
    
    然后在创建数据库和表时都指定字符集为utf8
    mysql> create database wujiadong1 character set utf8;
    mysql> create table stud_info(
        -> stud_code varchar(50) not null,
        -> stud_name varchar(50) not null,
        -> stud_gend varchar(50) not null default 'M',
        -> birthday date null,
        -> log_date date null,
        -> orig_addr varchar(50) null,
        -> lev_date date null,
        -> college_code varchar(50) null,
        -> college_name varchar(50) null,
        -> state varchar(50) null,
        -> primary key(stud_code)
        -> )character set utf8;
    
    mysql> load data local infile '/root/hive_test/stud_info.csv' into table stud_info
        -> fields terminated by ','
        -> lines terminated by '
    '
        -> ignore 1 lines;
    
    mysql> select * from stud_info; #看中文字符能否正常显示
    
    再向hdfs中导入数据
    [root@spark1 ~]# sqoop import --connect jdbc:mysql://192.168.220.144:3306/wujiadong1 --username root --table stud_info --target-dir 'hdfs://spark1:9000/user/sqoop_test1' -m 1
    [root@spark1 ~]# hadoop fs -lsr /user/sqoop_test1
    [root@spark1 ~]# hadoop fs -cat /user/sqoop_test1/part-m-00000
    
    

    image

    mysql数据导入hdfs中中文乱码问题总结

    • 修改mysql里面的my.conf文件
    • 创建数据库,指定字符集是utf8
    • 再新的数据库里面创建表,在create table语句里面指定字符集是 utf8
    • 插入中文汉字记录
    • select看到中文是正常的
    • 依次完成这些操作以后,再用sqoop导入

    导入hdfs解决中文乱码问题后,再去导入hive中就没出现乱码问题了,所以应该是一样的解决方法

    mysql中的编码查看和修改方法

    查看编码方式
    mysql> show variables like 'collation_%';
    +----------------------+-------------------+
    | Variable_name        | Value             |
    +----------------------+-------------------+
    | collation_connection | latin1_swedish_ci |
    | collation_database   | latin1_swedish_ci |
    | collation_server     | latin1_swedish_ci |
    +----------------------+-------------------+
    
    mysql> show variables like 'character_set_%'; 查看mysql数据库默认编码
    +--------------------------+----------------------------+
    | Variable_name            | Value                      |
    +--------------------------+----------------------------+
    | character_set_client     | latin1                     |
    | character_set_connection | latin1                     |
    | character_set_database   | latin1                     |
    | character_set_filesystem | binary                     |
    | character_set_results    | latin1                     |
    | character_set_server     | latin1                     |
    | character_set_system     | utf8                       |
    | character_sets_dir       | /usr/share/mysql/charsets/ |
    +--------------------------+----------------------------+
    
    修改编码方式在/etc/my.cnf这个文件中修改
    [root@spark1 ~]# vi /etc/my.cnf
    root@spark1 ~]# service mysqld restart 重启mysql
    查看是否变成utf8
    mysql> s
    --------------
    mysql  Ver 14.14 Distrib 5.1.73, for redhat-linux-gnu (x86_64) using readline 5.1
    
    Connection id:		6
    Current database:	
    Current user:		root@localhost
    SSL:			Not in use
    Current pager:		stdout
    Using outfile:		''
    Using delimiter:	;
    Server version:		5.1.73 Source distribution
    Protocol version:	10
    Connection:		Localhost via UNIX socket
    Server characterset:	utf8
    Db     characterset:	utf8
    Client characterset:	utf8
    Conn.  characterset:	utf8
    UNIX socket:		/var/lib/mysql/mysql.sock
    Uptime:			22 min 3 sec
    
    Threads: 1  Questions: 59  Slow queries: 0  Opens: 20  Flush tables: 1  Open tables: 9  Queries per second avg: 0.44
    --------------
    
    mysql> show variables like "char%";
    +--------------------------+----------------------------+
    | Variable_name            | Value                      |
    +--------------------------+----------------------------+
    | character_set_client     | utf8                       |
    | character_set_connection | utf8                       |
    | character_set_database   | utf8                       |
    | character_set_filesystem | binary                     |
    | character_set_results    | utf8                       |
    | character_set_server     | utf8                       |
    | character_set_system     | utf8                       |
    | character_sets_dir       | /usr/share/mysql/charsets/ |
    +--------------------------+----------------------------+
    8 rows in set (0.00 sec)
    
    mysql> show variables like "colla%";
    +----------------------+-----------------+
    | Variable_name        | Value           |
    +----------------------+-----------------+
    | collation_connection | utf8_general_ci |
    | collation_database   | utf8_general_ci |
    | collation_server     | utf8_general_ci |
    +----------------------+-----------------+
    3 rows in set (0.00 sec)
    
    
  • 相关阅读:
    不重复随机数生成
    centos 输入密码正确进不去系统
    程序退出异常_DebugHeapDelete和std::numpunct
    iptables导致数据包过多时连接失败
    linux服务器并发与tcmalloc
    Windows server 2008 被ntlmssp安装攻击 解决
    转载 html div三列布局占满全屏(左右两列定宽或者百分比、中间自动适应,div在父div中居底)
    WIN2003使用IP安全策略只允许指定IP远程桌面连接
    如何让电脑公司Win7系统自动关闭停止响应的程序
    win7 64的系统安装。net4.0总是提示安装未成功
  • 原文地址:https://www.cnblogs.com/wujiadong2014/p/6165720.html
Copyright © 2011-2022 走看看