  • A tool for moving data between relational databases and Hadoop

    Sqoop

    The Sqoop version used in this article is 1.4.6.

    1. Official Website

      http://sqoop.apache.org

    2. Purpose

      A: Move data from Hadoop into a relational database (e.g. Hive -> MySQL)

      B: Move data from a relational database into Hadoop (e.g. MySQL -> Hive)

    3. Download

      http://archive.apache.org/dist/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz

    4. Installation

    --Upload the tarball to the home directory on node1 (where I installed Hive earlier), then extract it
    cd
    tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
    
    --Create a symlink
    ln -sf /root/sqoop-1.4.6.bin__hadoop-2.0.4-alpha /home/sqoop-1.4.6

    5. Environment Variables

    --Configure environment variables
    vi /etc/profile
    
    export SQOOP_HOME=/home/sqoop-1.4.6
    export HADOOP_PREFIX=$HADOOP_HOME
    export PATH=$PATH:$SQOOP_HOME/bin
    
    :wq
    
    source /etc/profile
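    
    To confirm the variables took effect, a quick sanity check (not part of the original steps) is:
    
    echo $SQOOP_HOME
    which sqoop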

    6. Edit the Configuration File

    --Edit the configuration file
    cd /home/sqoop-1.4.6/conf/
    
    cp sqoop-env-template.sh sqoop-env.sh
    
    vi sqoop-env.sh
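    
    The template ships with every setting commented out. As a minimal sketch (assuming Hadoop lives at /home/hadoop-2.5 and Hive at /home/hive, the paths used elsewhere in this article), the relevant lines in sqoop-env.sh would look like:
    
    #Set path to where bin/hadoop is available
    export HADOOP_COMMON_HOME=/home/hadoop-2.5
    
    #Set path to where hadoop-*-core.jar is available
    export HADOOP_MAPRED_HOME=/home/hadoop-2.5
    
    #Set the path to where bin/hive is available
    export HIVE_HOME=/home/hive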

    7. Add the MySQL Driver

    --Copy the MySQL driver jar into Sqoop's lib directory
    cd
    cp mysql-connector-java-5.1.23-bin.jar /home/sqoop-1.4.6/lib/
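    
    You can confirm the driver is in place before moving on (a simple check, not in the original steps):
    
    ls /home/sqoop-1.4.6/lib/ | grep mysql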

    8. Test

    sqoop version
    
    [root@node1 ~]# sqoop version
    Warning: /home/sqoop-1.4.6/../hbase does not exist! HBase imports will fail.
    Please set $HBASE_HOME to the root of your HBase installation.
    Warning: /home/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
    Please set $HCAT_HOME to the root of your HCatalog installation.
    Warning: /home/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    Warning: /home/sqoop-1.4.6/../zookeeper does not exist! Accumulo imports will fail.
    Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
    19/01/04 23:15:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
    Sqoop 1.4.6
    git commit id c0c5a81723759fa575844a0a1eae8f510fa32c25
    Compiled by root on Mon Apr 27 14:38:36 CST 2015
    
    
    
    sqoop list-databases -connect jdbc:mysql://node1:3306/ -username root -password '!QAZ2wsx3edc'
    
    [root@node1 ~]# sqoop list-databases -connect jdbc:mysql://node1:3306/ -username root -password '!QAZ2wsx3edc'
    Warning: /home/sqoop-1.4.6/../hbase does not exist! HBase imports will fail.
    Please set $HBASE_HOME to the root of your HBase installation.
    Warning: /home/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
    Please set $HCAT_HOME to the root of your HCatalog installation.
    Warning: /home/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    Warning: /home/sqoop-1.4.6/../zookeeper does not exist! Accumulo imports will fail.
    Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
    19/01/04 23:17:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
    19/01/04 23:17:49 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    19/01/04 23:17:49 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
    information_schema
    hive
    mysql
    performance_schema
    result_db
    spark
    sys
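    
    As the log itself warns, supplying the password with --password on the command line is insecure. A safer equivalent of the same command uses -P, which makes Sqoop prompt for the password interactively:
    
    sqoop list-databases --connect jdbc:mysql://node1:3306/ --username root -P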

    You will notice that the output contains a lot of warnings.

    We can get rid of these warnings as follows:

    --Remove the warnings
    cd /home/sqoop-1.4.6/bin/
    
    vi configure-sqoop
    
    --Comment out all of the lines below by adding '#' at the start of each line
    ## Moved to be a runtime check in sqoop.
    #if [ ! -d "${HBASE_HOME}" ]; then
    #  echo "Warning: $HBASE_HOME does not exist! HBase imports will fail."
    #  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
    #fi
    
    ## Moved to be a runtime check in sqoop.
    #if [ ! -d "${HCAT_HOME}" ]; then
    #  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
    #  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
    #fi
    
    #if [ ! -d "${ACCUMULO_HOME}" ]; then
    #  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
    #  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
    #fi
    #if [ ! -d "${ZOOKEEPER_HOME}" ]; then
    #  echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
    #  echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
    #fi
    
    :wq
    
    --Test again
    [root@node1 bin]# sqoop list-databases -connect jdbc:mysql://node1:3306/ -username root -password '!QAZ2wsx3edc'
    19/01/04 23:34:21 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
    19/01/04 23:34:21 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    19/01/04 23:34:21 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
    information_schema
    hive
    mysql
    performance_schema
    result_db
    spark
    sys

    9. Usage

    9.1. Importing from MySQL to HDFS

    Preparation:

    --First, create a table in the sqoop_db database
    CREATE TABLE t_user (id INT, name VARCHAR(20), age INT);
    
    --Insert test data
    insert into t_user values(1, 'Tom', 20);
    insert into t_user values(2, 'John', 18);
    insert into t_user values(3, 'Div', 25);
    insert into t_user values(4, 'Susan', 31);
    insert into t_user values(5, 'Tiran', 40);
    insert into t_user values(6, 'Shasita', 13);

    Query result:

    mysql> select * from t_user;
    +------+---------+------+
    | id   | name    | age  |
    +------+---------+------+
    |    1 | Tom     |   20 |
    |    2 | John    |   18 |
    |    3 | Div     |   25 |
    |    4 | Susan   |   31 |
    |    5 | Tiran   |   40 |
    |    6 | Shasita |   13 |
    +------+---------+------+
    6 rows in set (0.00 sec)

    --Import the data from MySQL into HDFS
    sqoop import --connect jdbc:mysql://node1:3306/sqoop_db --username root --password '!QAZ2wsx3edc' --table t_user --columns id,name,age -m 1 --target-dir /sqoop_t_user
    
    
    [root@node1 bin]# sqoop import --connect jdbc:mysql://node1:3306/sqoop_db --username root --password '!QAZ2wsx3edc' --table t_user --columns id,name,age -m 1 --target-dir /sqoop_t_user
    19/01/04 23:54:30 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
    19/01/04 23:54:30 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    19/01/04 23:54:30 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
    19/01/04 23:54:30 INFO tool.CodeGenTool: Beginning code generation
    19/01/04 23:54:31 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t_user` AS t LIMIT 1
    19/01/04 23:54:31 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t_user` AS t LIMIT 1
    19/01/04 23:54:31 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop-2.5
    Note: /tmp/sqoop-root/compile/84e97965496cc61c73c17151375a419b/t_user.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    19/01/04 23:54:33 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/84e97965496cc61c73c17151375a419b/t_user.jar
    19/01/04 23:54:33 WARN manager.MySQLManager: It looks like you are importing from mysql.
    19/01/04 23:54:33 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
    19/01/04 23:54:33 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
    19/01/04 23:54:33 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
    19/01/04 23:54:33 INFO mapreduce.ImportJobBase: Beginning import of t_user
    19/01/04 23:54:33 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
    19/01/04 23:54:34 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
    19/01/04 23:54:34 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.79.138:8032
    19/01/04 23:54:48 INFO db.DBInputFormat: Using read commited transaction isolation
    19/01/04 23:54:48 INFO mapreduce.JobSubmitter: number of splits:1
    19/01/04 23:54:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1546674829746_0001
    19/01/04 23:54:49 INFO impl.YarnClientImpl: Submitted application application_1546674829746_0001
    19/01/04 23:54:49 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1546674829746_0001/
    19/01/04 23:54:49 INFO mapreduce.Job: Running job: job_1546674829746_0001
    19/01/04 23:54:59 INFO mapreduce.Job: Job job_1546674829746_0001 running in uber mode : false
    19/01/04 23:54:59 INFO mapreduce.Job:  map 0% reduce 0%
    19/01/04 23:55:06 INFO mapreduce.Job:  map 100% reduce 0%
    19/01/04 23:55:06 INFO mapreduce.Job: Job job_1546674829746_0001 completed successfully
    19/01/04 23:55:06 INFO mapreduce.Job: Counters: 30
        File System Counters
            FILE: Number of bytes read=0
            FILE: Number of bytes written=116299
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=87
            HDFS: Number of bytes written=63
            HDFS: Number of read operations=4
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=2
        Job Counters 
            Launched map tasks=1
            Other local map tasks=1
            Total time spent by all maps in occupied slots (ms)=4153
            Total time spent by all reduces in occupied slots (ms)=0
            Total time spent by all map tasks (ms)=4153
            Total vcore-seconds taken by all map tasks=4153
            Total megabyte-seconds taken by all map tasks=4252672
        Map-Reduce Framework
            Map input records=6
            Map output records=6
            Input split bytes=87
            Spilled Records=0
            Failed Shuffles=0
            Merged Map outputs=0
            GC time elapsed (ms)=69
            CPU time spent (ms)=1170
            Physical memory (bytes) snapshot=175808512
            Virtual memory (bytes) snapshot=893071360
            Total committed heap usage (bytes)=84934656
        File Input Format Counters 
            Bytes Read=0
        File Output Format Counters 
            Bytes Written=63
    19/01/04 23:55:06 INFO mapreduce.ImportJobBase: Transferred 63 bytes in 32.3608 seconds (1.9468 bytes/sec)
    19/01/04 23:55:06 INFO mapreduce.ImportJobBase: Retrieved 6 records.
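    
    You can also inspect what the job wrote directly with the HDFS client. Since -m 1 runs a single mapper, the output directory holds one part file (a quick check, assuming the hdfs command is on the PATH):
    
    hdfs dfs -ls /sqoop_t_user
    hdfs dfs -cat /sqoop_t_user/part-m-00000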


    The official documentation offers another way to achieve the same import: reading the options from a file.

    cd 
    mkdir mysqoopdir
    cd mysqoopdir
    
    vi mysql_to_hdfs
    
    
    import 
    --connect 
    jdbc:mysql://node1:3306/sqoop_db 
    --username 
    root 
    --password 
    '!QAZ2wsx3edc' 
    --table 
    t_user 
    --columns 
    id,name,age 
    -m 
    1 
    --target-dir 
    /sqoop_t_user
    --delete-target-dir
    
    :wq
    
    sqoop --options-file mysql_to_hdfs
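    
    Note the layout of the options file: each option and each value sits on its own line. Blank lines are ignored, and lines beginning with # are treated as comments, so the file can be annotated. The --delete-target-dir option at the end removes /sqoop_t_user before importing, which lets you re-run the job without hitting a "directory already exists" error.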

    We can verify the import result through Hive.

    [root@node1 bin]# ./hive
    19/01/05 00:03:29 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
    
    Logging initialized using configuration in jar:file:/root/apache-hive-0.13.1-bin/lib/hive-common-0.13.1.jar!/hive-log4j.properties
    hive> dfs -cat /sqoop_t_user/*;
    1,Tom,20
    2,John,18
    3,Div,25
    4,Susan,31
    5,Tiran,40
    6,Shasita,13

    The result we see matches the data in MySQL.

    Use case: suppose the data we need to process or analyze lives in MySQL, the volume is fairly large, and we want to analyze it offline. We can use Sqoop to import the MySQL data into HDFS and run the analysis there.

    Importing a query result:

    --Import the result of a query
    cd mysqoopdir
    
    vi mysql_query_to_hdfs
    
    
    import
    --connect
    jdbc:mysql://node1:3306/sqoop_db
    --username
    root
    --password
    '!QAZ2wsx3edc'
    -e 
    select id, name from t_user where id >= 1 and $CONDITIONS
    -m
    1
    --target-dir
    /sqoop_t_user
    --delete-target-dir
    
    :wq
    
    sqoop --options-file mysql_query_to_hdfs
    
    --Verify
    hive> dfs -cat /sqoop_t_user/*;
    1,Tom
    2,John
    3,Div
    4,Susan
    5,Tiran
    6,Shasita
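    
    The $CONDITIONS token in the query is required for free-form query imports: Sqoop replaces it with a split predicate for each mapper so the rows can be partitioned. With -m 1 it is substituted with a trivially true condition. To run the same query with several mappers you must also name a split column via --split-by; a hypothetical parallel variant of the options file would be:
    
    import
    --connect
    jdbc:mysql://node1:3306/sqoop_db
    --username
    root
    --password
    '!QAZ2wsx3edc'
    -e
    select id, name from t_user where id >= 1 and $CONDITIONS
    --split-by
    id
    -m
    2
    --target-dir
    /sqoop_t_user
    --delete-target-dir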

    9.2. Importing from MySQL to Hive

    --Import from MySQL into Hive
    cd mysqoopdir
    
    vi mysql_to_hive
    
    
    import
    --connect
    jdbc:mysql://node1:3306/sqoop_db
    --username
    root
    --password
    '!QAZ2wsx3edc'
    --table
    t_user
    -m
    1
    --target-dir
    /sqoop_mysql_to_hive/
    --hive-home
    /home/hive/
    --hive-import
    --hive-table
    t_sqoop_mysql_t_user_to_hive
    --create-hive-table
    
    
    :wq
    
    sqoop --options-file mysql_to_hive
    
    --Verify
    hive> select * from t_sqoop_mysql_t_user_to_hive;
    OK
    1    Tom    20
    2    John    18
    3    Div    25
    4    Susan    31
    5    Tiran    40
    6    Shasita    13
    Time taken: 0.577 seconds, Fetched: 6 row(s)
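    
    Under the hood, --hive-import is a two-step operation: Sqoop first imports the rows into the HDFS directory given by --target-dir, then creates the Hive table (because of --create-hive-table) and runs a LOAD DATA INPATH statement that moves the files into the Hive warehouse. Assuming the default warehouse location /user/hive/warehouse, you can see the moved files with:
    
    hive> dfs -ls /user/hive/warehouse/t_sqoop_mysql_t_user_to_hive;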

    9.3. Importing from MySQL to HBase

    --Import from MySQL into HBase
    cd mysqoopdir
    
    vi mysql_to_hbase
    
    
    import
    --connect
    jdbc:mysql://node1:3306/sqoop_db
    --username
    root
    --password
    '!QAZ2wsx3edc'
    --table
    t_user
    -m
    1
    --hbase-table
    t_sqoop_mysql_t_user_to_hbase
    --hbase-row-key
    id
    --hbase-create-table
    --column-family
    cf
    
    
    :wq
    
    sqoop --options-file mysql_to_hbase
    
    --Verify
    hbase(main):004:0> scan 't_sqoop_mysql_t_user_to_hbase'
    ROW                     COLUMN+CELL                                                        
     1                      column=cf:age, timestamp=1546680684317, value=20                   
     1                      column=cf:name, timestamp=1546680684317, value=Tom                 
     2                      column=cf:age, timestamp=1546680684317, value=18                   
     2                      column=cf:name, timestamp=1546680684317, value=John                
     3                      column=cf:age, timestamp=1546680684317, value=25                   
     3                      column=cf:name, timestamp=1546680684317, value=Div                 
     4                      column=cf:age, timestamp=1546680684317, value=31                   
     4                      column=cf:name, timestamp=1546680684317, value=Susan               
     5                      column=cf:age, timestamp=1546680684317, value=40                   
     5                      column=cf:name, timestamp=1546680684317, value=Tiran               
     6                      column=cf:age, timestamp=1546680684317, value=13                   
     6                      column=cf:name, timestamp=1546680684317, value=Shasita             
    6 row(s) in 0.1730 seconds
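    
    Each MySQL row became one HBase row keyed by the --hbase-row-key column (id), with the remaining columns stored as cells in the cf column family. To fetch a single row rather than scanning the whole table, a lookup like this should work in the HBase shell:
    
    get 't_sqoop_mysql_t_user_to_hbase', '1'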

    9.4. Exporting from HDFS to MySQL

    --Export from HDFS to MySQL
    cd mysqoopdir
    
    vi hdfs_to_mysql
    
    
    export
    --connect
    jdbc:mysql://node1:3306/sqoop_db
    --username
    root
    --password
    '!QAZ2wsx3edc'
    --table
    t_hdfs_user_to_mysql
    -m
    1
    --columns
    id,name,age
    --export-dir
    /sqoop_t_user/
    
    
    :wq
    
    
    --On node1, log in to MySQL
    mysql -u root -p
    !QAZ2wsx3edc
    show databases;
    use sqoop_db;
    show tables;
    --Create the t_hdfs_user_to_mysql table
    CREATE TABLE t_hdfs_user_to_mysql (id INT, name VARCHAR(20), age INT);
    
    
    cd mysqoopdir
    sqoop --options-file hdfs_to_mysql
    
    --Verify
    --Before running the export
    mysql> select * from t_hdfs_user_to_mysql;
    Empty set (0.00 sec)
    
    --After running the export
    mysql> select * from t_hdfs_user_to_mysql;
    +------+---------+------+
    | id   | name    | age  |
    +------+---------+------+
    |    1 | Tom     |   20 |
    |    2 | John    |   18 |
    |    3 | Div     |   25 |
    |    4 | Susan   |   31 |
    |    5 | Tiran   |   40 |
    |    6 | Shasita |   13 |
    +------+---------+------+
    6 rows in set (0.00 sec)
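    
    One caveat: export issues plain INSERT statements by default, so re-running this job would append the same six rows again. If the target table had a primary key on id, Sqoop's --update-key option (optionally with --update-mode allowinsert for upsert behavior) would make re-runs idempotent; a hypothetical variant of the options file:
    
    export
    --connect
    jdbc:mysql://node1:3306/sqoop_db
    --username
    root
    --password
    '!QAZ2wsx3edc'
    --table
    t_hdfs_user_to_mysql
    -m
    1
    --columns
    id,name,age
    --export-dir
    /sqoop_t_user/
    --update-key
    id
    --update-mode
    allowinsert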

