  • Sqoop Installation and Usage

    Introduction:

      Sqoop is a tool for importing and exporting data between Hadoop and relational databases. You can use it to load data from a database (such as MySQL or Oracle) into HDFS, or to export data from HDFS back into a relational database. Sqoop translates its commands into Hadoop MapReduce jobs (usually map-only): the generated job runs several map tasks concurrently, transferring data in parallel to improve speed and efficiency, which would be very hard to reproduce with hand-written multi-threaded shell scripts. Sqoop2 (sqoop 1.99.7) requires configuring a proxy in the Hadoop configuration files and is a heavyweight, embedded installation; in this article we use Sqoop1 (Sqoop 1.4.6).
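      For orientation, the two directions look roughly like this. This is a generic sketch with placeholder host, database, and table names, not the exact commands used later in this article:

      # Import: MySQL table -> HDFS, transferred by 4 parallel map tasks
      sqoop import --connect jdbc:mysql://<host>:3306/<db> --username <user> -P --table <table> -m 4 --target-dir /path/on/hdfs
      # Export: files under an HDFS directory -> MySQL table
      sqoop export --connect jdbc:mysql://<host>:3306/<db> --username <user> -P --table <table> --export-dir /path/on/hdfs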

    Prerequisites: (if you do not know how to install these, see my earlier posts in the hadoop category)

    CloudDeskTop has installed: hadoop-2.7.3  jdk1.7.0_79  mysql-5.5.32  sqoop-1.4.6  hive-1.2.2
    master01 and master02 have installed: hadoop-2.7.3  jdk1.7.0_79
    slave01, slave02 and slave03 have installed: hadoop-2.7.3  jdk1.7.0_79  zookeeper-3.4.10

    I. Installation:

     1. Upload the installation package sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz to the /install/ directory

    2. Extract it:

    [hadoop@CloudDeskTop install]$ tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /software/

    3. Configure the environment variables:

    [hadoop@CloudDeskTop software]$ su -lc "vi /etc/profile"

    JAVA_HOME=/software/jdk1.7.0_79
    HADOOP_HOME=/software/hadoop-2.7.3
    SQOOP_HOME=/software/sqoop-1.4.6
    PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/lib:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SQOOP_HOME/bin
    export PATH JAVA_HOME HADOOP_HOME SQOOP_HOME

    4. After editing the environment, run the following command so the profile takes effect immediately:

    [hadoop@CloudDeskTop software]$ source /etc/profile

    5. Go to the /software/sqoop-1.4.6/lib/ directory and upload mysql-connector-java-5.1.43-bin.jar

    The JDBC driver jar here must be this version (5.1.43). Sqoop connects to the MySQL database through it, and choosing a different driver version easily leads to errors.
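    The post assumes the jar is uploaded with a file-transfer tool; copying it by hand works just as well. The source path of the downloaded jar below is an assumption:

    # Copy the MySQL JDBC driver into Sqoop's lib directory and confirm it is there
    [hadoop@CloudDeskTop install]$ cp /install/mysql-connector-java-5.1.43-bin.jar /software/sqoop-1.4.6/lib/
    [hadoop@CloudDeskTop install]$ ls /software/sqoop-1.4.6/lib/ | grep mysql-connector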

    6. Configure Sqoop

    [hadoop@CloudDeskTop software]$ cd /software/sqoop-1.4.6/bin/

    [hadoop@CloudDeskTop bin]$ vi configure-sqoop

    Comment out the following block (in this copy of the script it sits around lines 127 to 148): add ":<<COMMENT" as the opening marker and "COMMENT" as the closing marker;

    :<<COMMENT
    ## Moved to be a runtime check in sqoop.
    if [ ! -d "${HBASE_HOME}" ]; then
      echo "Warning: $HBASE_HOME does not exist! HBase imports will fail."
      echo 'Please set $HBASE_HOME to the root of your HBase installation.'
    fi

    ## Moved to be a runtime check in sqoop.
    if [ ! -d "${HCAT_HOME}" ]; then
      echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
      echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
    fi

    if [ ! -d "${ACCUMULO_HOME}" ]; then
      echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
      echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
    fi
    if [ ! -d "${ZOOKEEPER_HOME}" ]; then
      echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
      echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
    fi
    COMMENT
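    Why the wrapper works: ":" is the shell's no-op builtin, and "<<COMMENT" feeds everything up to the line "COMMENT" to it as a here-document, so none of the enclosed lines are ever executed. A minimal sketch of the idiom:

    #!/bin/bash
    echo "this line runs"
    :<<COMMENT
    echo "this line is never executed"
    COMMENT
    echo "this line runs too"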

    II. Startup (unless stated otherwise, everything is done as the hadoop user)

    [0. Start MySQL as the root user on CloudDeskTop]

    [root@CloudDeskTop ~]# cd /software/mysql-5.5.32/sbin/ && ./mysqld start && lsof -i:3306 && cd -

    [1. Start the ZooKeeper cluster on the slave nodes (a leader and followers are elected among them)]

      cd /software/zookeeper-3.4.10/bin/ && ./zkServer.sh start && cd - && jps
      cd /software/zookeeper-3.4.10/bin/ && ./zkServer.sh status && cd -

    [2. Start the HDFS cluster from master01] cd /software/ && start-dfs.sh && jps

    [3. Start the YARN cluster from master01] cd /software/ && start-yarn.sh && jps

    [Starting the YARN cluster does not bring up YARN on the standby master node, so run the following on master02:]

    cd /software/ && yarn-daemon.sh start resourcemanager && jps

     [4. Check the processes]
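     The process-list screenshot is not preserved; running jps on each node gives the same information. The exact daemon names depend on your HA configuration, so the comments below are only a rough guide:

     [hadoop@master01 ~]$ jps
     # Typically expected with this layout:
     #   master01 / master02 : NameNode, DFSZKFailoverController, ResourceManager
     #   slave01 - slave03   : DataNode, NodeManager, JournalNode, QuorumPeerMain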

     

    [6. Check the Sqoop version to confirm that Sqoop was installed successfully]

     [hadoop@CloudDeskTop software]$ sqoop version

     

    III. Testing

      Note: import and export are defined relative to the HDFS cluster. Data flowing out of HDFS is an export; data flowing into HDFS is an import. The data in a Hive table is physically stored in the HDFS cluster, so importing into or exporting from a Hive table is really just operating on files in HDFS.

    First, create the data locally:
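    The original data file is not shown in the post; based on the Hive query result further below, a sketch that recreates testsqoop.out locally (tab-separated columns: id, name, age):

    [hadoop@CloudDeskTop test]$ printf '%s\t%s\t%s\n' \
        1 ligang 2  2 chenghua 3  3 liqin 1  4 zhanghua 4  5 wanghua 1 \
        6 liulinjin 5  7 wangxiaochuan 6  8 guchuan 2  9 xiaoyong 4  10 huping 6 \
        > testsqoop.out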

    After creating the table in the Hive database, upload the file to the path where the table stores its data in the cluster:

    [hadoop@CloudDeskTop test]$ hdfs dfs -put testsqoop.out /user/hive/warehouse/mmzs.db/testsqoop

     Goal 1: Export data from the HDFS cluster into the MySQL database

    1. Create the table in the Hive database mmzs and load the data

    [hadoop@CloudDeskTop software]$ cd /software/hive-1.2.2/bin/
    [hadoop@CloudDeskTop bin]$ ./hive
    hive> show databases;
    OK
    default
    mmzs
    mmzsmysql
    Time taken: 0.373 seconds, Fetched: 3 row(s)
    hive> create table if not exists mmzs.testsqoop(id int,name string,age int) row format delimited fields terminated by '\t';
    OK
    Time taken: 0.126 seconds
    hive> select * from mmzs.testsqoop;
    OK
    1    ligang    2
    2    chenghua    3
    3    liqin    1
    4    zhanghua    4
    5    wanghua    1
    6    liulinjin    5
    7    wangxiaochuan    6
    8    guchuan    2
    9    xiaoyong    4
    10    huping    6
    Time taken: 0.824 seconds, Fetched: 10 row(s)

     2. Create a table with the same fields in the MySQL database

    [root@CloudDeskTop bin]# cd ~
    [root@CloudDeskTop ~]# cd /software/mysql-5.5.32/bin/
    [root@CloudDeskTop bin]# ./mysql -uroot -p123456 -P3306 -h192.168.154.134 -e "create database mmzs character set utf8"
    [root@CloudDeskTop bin]# ./mysql -uroot -p123456 -h192.168.154.134 -P3306 -Dmmzs
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 12
    Server version: 5.5.32 Source distribution
    
    Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
    
    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.
    
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    
    mysql> show tables;
    Empty set (0.00 sec)
    
    mysql> create table if not exists testsqoop(uid int(11),uname varchar(30),age int)engine=innodb charset=utf8
        -> ;
    Query OK, 0 rows affected (0.06 sec)
    
    mysql> desc testsqoop;
    +-------+-------------+------+-----+---------+-------+
    | Field | Type        | Null | Key | Default | Extra |
    +-------+-------------+------+-----+---------+-------+
    | uid   | int(11)     | YES  |     | NULL    |       |
    | uname | varchar(30) | YES  |     | NULL    |       |
    | age   | int(11)     | YES  |     | NULL    |       |
    +-------+-------------+------+-----+---------+-------+
    3 rows in set (0.00 sec)
    
    mysql> select * from testsqoop;
    Empty set (0.01 sec)
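    Note that the MySQL column names (uid, uname) differ from the Hive column names (id, name); by default sqoop-export maps the HDFS fields to the table columns positionally, which is why the export in the next step still works. If you prefer to name the columns explicitly, --columns can be used; a hedged variant of that command, not taken from the original post:

    sqoop-export --connect 'jdbc:mysql://192.168.154.134:3306/mmzs' --username 'root' --password '123456' --table 'testsqoop' --columns 'uid,uname,age' --export-dir '/user/hive/warehouse/mmzs.db/testsqoop' --fields-terminated-by '\t' -m 2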

    3. Use Sqoop to export the data in the Hive table to the MySQL database (exporting the whole HDFS file)

    [hadoop@CloudDeskTop software]$ sqoop-export --help

    17/12/30 21:54:38 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
    usage: sqoop export [GENERIC-ARGS] [TOOL-ARGS]
    
    Common arguments:
       --connect <jdbc-uri>                         Specify JDBC connect
                                                    string
       --connection-manager <class-name>            Specify connection manager
                                                    class name
       --connection-param-file <properties-file>    Specify connection
                                                    parameters file
       --driver <class-name>                        Manually specify JDBC
                                                    driver class to use
       --hadoop-home <hdir>                         Override
                                                    $HADOOP_MAPRED_HOME_ARG
       --hadoop-mapred-home <dir>                   Override
                                                    $HADOOP_MAPRED_HOME_ARG
       --help                                       Print usage instructions
    -P                                              Read password from console
       --password <password>                        Set authentication
                                                    password
       --password-alias <password-alias>            Credential provider
                                                    password alias
       --password-file <password-file>              Set authentication
                                                    password file path
       --relaxed-isolation                          Use read-uncommitted
                                                    isolation for imports
       --skip-dist-cache                            Skip copying jars to
                                                    distributed cache
       --username <username>                        Set authentication
                                                    username
       --verbose                                    Print more information
                                                    while working
    
    Export control arguments:
       --batch                                                    Indicates
                                                                  underlying
                                                                  statements
                                                                  to be
                                                                  executed in
                                                                  batch mode
       --call <arg>                                               Populate the
                                                                  table using
                                                                  this stored
                                                                  procedure
                                                                  (one call
                                                                  per row)
       --clear-staging-table                                      Indicates
                                                                  that any
                                                                  data in
                                                                  staging
                                                                  table can be
                                                                  deleted
       --columns <col,col,col...>                                 Columns to
                                                                  export to
                                                                  table
       --direct                                                   Use direct
                                                                  export fast
                                                                  path
       --export-dir <dir>                                         HDFS source
                                                                  path for the
                                                                  export
    -m,--num-mappers <n>                                          Use 'n' map
                                                                  tasks to
                                                                  export in
                                                                  parallel
       --mapreduce-job-name <name>                                Set name for
                                                                  generated
                                                                  mapreduce
                                                                  job
       --staging-table <table-name>                               Intermediate
                                                                  staging
                                                                  table
       --table <table-name>                                       Table to
                                                                  populate
       --update-key <key>                                         Update
                                                                  records by
                                                                  specified
                                                                  key column
       --update-mode <mode>                                       Specifies
                                                                  how updates
                                                                  are
                                                                  performed
                                                                  when new
                                                                  rows are
                                                                  found with
                                                                  non-matching
                                                                  keys in
                                                                  database
       --validate                                                 Validate the
                                                                  copy using
                                                                  the
                                                                  configured
                                                                  validator
       --validation-failurehandler <validation-failurehandler>    Fully
                                                                  qualified
                                                                  class name
                                                                  for
                                                                  ValidationFa
                                                                  ilureHandler
       --validation-threshold <validation-threshold>              Fully
                                                                  qualified
                                                                  class name
                                                                  for
                                                                  ValidationTh
                                                                  reshold
       --validator <validator>                                    Fully
                                                                  qualified
                                                                  class name
                                                                  for the
                                                                  Validator
    
    Input parsing arguments:
       --input-enclosed-by <char>               Sets a required field encloser
       --input-escaped-by <char>                Sets the input escape
                                                character
       --input-fields-terminated-by <char>      Sets the input field separator
       --input-lines-terminated-by <char>       Sets the input end-of-line
                                                char
       --input-optionally-enclosed-by <char>    Sets a field enclosing
                                                character
    
    Output line formatting arguments:
       --enclosed-by <char>               Sets a required field enclosing
                                          character
       --escaped-by <char>                Sets the escape character
       --fields-terminated-by <char>      Sets the field separator character
       --lines-terminated-by <char>       Sets the end-of-line character
       --mysql-delimiters                 Uses MySQL's default delimiter set:
                                          fields: ,  lines: \n  escaped-by: \
                                          optionally-enclosed-by: '
       --optionally-enclosed-by <char>    Sets a field enclosing character
    
    Code generation arguments:
       --bindir <dir>                        Output directory for compiled
                                             objects
       --class-name <name>                   Sets the generated class name.
                                             This overrides --package-name.
                                             When combined with --jar-file,
                                             sets the input class.
       --input-null-non-string <null-str>    Input null non-string
                                             representation
       --input-null-string <null-str>        Input null string representation
       --jar-file <file>                     Disable code generation; use
                                             specified jar
       --map-column-java <arg>               Override mapping for specific
                                             columns to java types
       --null-non-string <null-str>          Null non-string representation
       --null-string <null-str>              Null string representation
       --outdir <dir>                        Output directory for generated
                                             code
       --package-name <name>                 Put auto-generated classes in
                                             this package
    
    HCatalog arguments:
       --hcatalog-database <arg>                        HCatalog database name
       --hcatalog-home <hdir>                           Override $HCAT_HOME
       --hcatalog-partition-keys <partition-key>        Sets the partition
                                                        keys to use when
                                                        importing to hive
       --hcatalog-partition-values <partition-value>    Sets the partition
                                                        values to use when
                                                        importing to hive
       --hcatalog-table <arg>                           HCatalog table name
       --hive-home <dir>                                Override $HIVE_HOME
       --hive-partition-key <partition-key>             Sets the partition key
                                                        to use when importing
                                                        to hive
       --hive-partition-value <partition-value>         Sets the partition
                                                        value to use when
                                                        importing to hive
       --map-column-hive <arg>                          Override mapping for
                                                        specific column to
                                                        hive types.
    
    Generic Hadoop command-line arguments:
    (must preceed any tool-specific arguments)
    Generic options supported are
    -conf <configuration file>     specify an application configuration file
    -D <property=value>            use value for given property
    -fs <local|namenode:port>      specify a namenode
    -jt <local|resourcemanager:port>    specify a ResourceManager
    -files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
    -libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
    -archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.
    
    The general command line syntax is
    bin/hadoop command [genericOptions] [commandOptions]
    
    
    At minimum, you must specify --connect, --export-dir, and --table

    # -m specifies the number of map tasks

    [hadoop@CloudDeskTop software]$ sqoop-export --export-dir '/user/hive/warehouse/mmzs.db/testsqoop' --fields-terminated-by '\t' --lines-terminated-by '\n' --connect 'jdbc:mysql://192.168.154.134:3306/mmzs' --username 'root' --password '123456' --table 'testsqoop' -m 2
    17/12/30 22:02:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
    17/12/30 22:02:04 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    17/12/30 22:02:04 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
    17/12/30 22:02:04 INFO tool.CodeGenTool: Beginning code generation
    17/12/30 22:02:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `testsqoop` AS t LIMIT 1
    17/12/30 22:02:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `testsqoop` AS t LIMIT 1
    17/12/30 22:02:05 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /software/hadoop-2.7.3
    Note: /tmp/sqoop-hadoop/compile/e2b7e669ef4d8d43016e44ce1cddb620/testsqoop.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    17/12/30 22:02:11 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/e2b7e669ef4d8d43016e44ce1cddb620/testsqoop.jar
    17/12/30 22:02:11 INFO mapreduce.ExportJobBase: Beginning export of testsqoop
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/software/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/software/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    17/12/30 22:02:11 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
    17/12/30 22:02:13 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
    17/12/30 22:02:13 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
    17/12/30 22:02:13 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
    17/12/30 22:02:22 INFO input.FileInputFormat: Total input paths to process : 1
    17/12/30 22:02:22 INFO input.FileInputFormat: Total input paths to process : 1
    17/12/30 22:02:23 INFO mapreduce.JobSubmitter: number of splits:2
    17/12/30 22:02:23 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
    17/12/30 22:02:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1514638990227_0001
    17/12/30 22:02:25 INFO impl.YarnClientImpl: Submitted application application_1514638990227_0001
    17/12/30 22:02:25 INFO mapreduce.Job: The url to track the job: http://master01:8088/proxy/application_1514638990227_0001/
    17/12/30 22:02:25 INFO mapreduce.Job: Running job: job_1514638990227_0001
    17/12/30 22:03:13 INFO mapreduce.Job: Job job_1514638990227_0001 running in uber mode : false
    17/12/30 22:03:13 INFO mapreduce.Job:  map 0% reduce 0%
    17/12/30 22:03:58 INFO mapreduce.Job:  map 100% reduce 0%
    17/12/30 22:03:59 INFO mapreduce.Job: Job job_1514638990227_0001 completed successfully
    17/12/30 22:03:59 INFO mapreduce.Job: Counters: 30
        File System Counters
            FILE: Number of bytes read=0
            FILE: Number of bytes written=277282
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=484
            HDFS: Number of bytes written=0
            HDFS: Number of read operations=8
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=0
        Job Counters 
            Launched map tasks=2
            Data-local map tasks=2
            Total time spent by all maps in occupied slots (ms)=79918
            Total time spent by all reduces in occupied slots (ms)=0
            Total time spent by all map tasks (ms)=79918
            Total vcore-milliseconds taken by all map tasks=79918
            Total megabyte-milliseconds taken by all map tasks=81836032
        Map-Reduce Framework
            Map input records=10
            Map output records=10
            Input split bytes=286
            Spilled Records=0
            Failed Shuffles=0
            Merged Map outputs=0
            GC time elapsed (ms)=386
            CPU time spent (ms)=4950
            Physical memory (bytes) snapshot=216600576
            Virtual memory (bytes) snapshot=1697566720
            Total committed heap usage (bytes)=32874496
        File Input Format Counters 
            Bytes Read=0
        File Output Format Counters 
            Bytes Written=0
    17/12/30 22:03:59 INFO mapreduce.ExportJobBase: Transferred 484 bytes in 105.965 seconds (4.5675 bytes/sec)
    17/12/30 22:03:59 INFO mapreduce.ExportJobBase: Exported 10 records.

    Summary: the run log shows that the job has only map tasks and no reduce task.

    4. Query the result in the MySQL database again

    mysql> select * from testsqoop;
    +------+---------------+------+
    | uid  | uname         | age  |
    +------+---------------+------+
    |    1 | ligang        |    2 |
    |    2 | chenghua      |    3 |
    |    3 | liqin         |    1 |
    |    4 | zhanghua      |    4 |
    |    5 | wanghua       |    1 |
    |    6 | liulinjin     |    5 |
    |    7 | wangxiaochuan |    6 |
    |    8 | guchuan       |    2 |
    |    9 | xiaoyong      |    4 |
    |   10 | huping        |    6 |
    +------+---------------+------+
    10 rows in set (0.00 sec)

    The result confirms that the data was exported to the MySQL database successfully.

     Goal 2: Import data from MySQL into the HDFS cluster

     1. Delete the data in the testsqoop table of the mmzs database in Hive

     

     Confirm that the data is really gone:
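     The screenshots for this step are not preserved. Since testsqoop is a managed Hive table, removing its files on HDFS is enough to clear the data; one way to do it and then confirm the table is empty (a sketch, not the author's exact commands):

     [hadoop@CloudDeskTop software]$ hdfs dfs -rm -r /user/hive/warehouse/mmzs.db/testsqoop/*
     [hadoop@CloudDeskTop software]$ hdfs dfs -ls /user/hive/warehouse/mmzs.db/testsqoop
     [hadoop@CloudDeskTop software]$ hive -e 'select * from mmzs.testsqoop;'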

    2. Import the data in MySQL into the HDFS cluster

    A. Import part of the data, selected by a query, into the cluster

    [hadoop@CloudDeskTop software]$ sqoop-import --append --connect 'jdbc:mysql://192.168.154.134:3306/mmzs' --username 'root' --password '123456' --query 'select * from mmzs.testsqoop where uid>3 and $CONDITIONS' -m 1 --target-dir '/user/hive/warehouse/mmzs.db/testsqoop' --fields-terminated-by '\t' --lines-terminated-by '\n'
    17/12/30 22:40:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
    17/12/30 22:40:54 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    17/12/30 22:40:55 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
    17/12/30 22:40:55 INFO tool.CodeGenTool: Beginning code generation
    17/12/30 22:40:55 INFO manager.SqlManager: Executing SQL statement: select * from mmzs.testsqoop where uid>3 and  (1 = 0) 
    17/12/30 22:40:55 INFO manager.SqlManager: Executing SQL statement: select * from mmzs.testsqoop where uid>3 and  (1 = 0) 
    17/12/30 22:40:55 INFO manager.SqlManager: Executing SQL statement: select * from mmzs.testsqoop where uid>3 and  (1 = 0) 
    17/12/30 22:40:55 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /software/hadoop-2.7.3
    Note: /tmp/sqoop-hadoop/compile/cd00e059648175875074eed7f4189e0b/QueryResult.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    17/12/30 22:40:58 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/cd00e059648175875074eed7f4189e0b/QueryResult.jar
    17/12/30 22:40:58 INFO mapreduce.ImportJobBase: Beginning query import.
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/software/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/software/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    17/12/30 22:40:59 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
    17/12/30 22:41:01 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
    17/12/30 22:41:08 INFO db.DBInputFormat: Using read commited transaction isolation
    17/12/30 22:41:09 INFO mapreduce.JobSubmitter: number of splits:1
    17/12/30 22:41:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1514638990227_0003
    17/12/30 22:41:10 INFO impl.YarnClientImpl: Submitted application application_1514638990227_0003
    17/12/30 22:41:10 INFO mapreduce.Job: The url to track the job: http://master01:8088/proxy/application_1514638990227_0003/
    17/12/30 22:41:10 INFO mapreduce.Job: Running job: job_1514638990227_0003
    17/12/30 22:41:54 INFO mapreduce.Job: Job job_1514638990227_0003 running in uber mode : false
    17/12/30 22:41:54 INFO mapreduce.Job:  map 0% reduce 0%
    17/12/30 22:42:29 INFO mapreduce.Job:  map 100% reduce 0%
    17/12/30 22:42:31 INFO mapreduce.Job: Job job_1514638990227_0003 completed successfully
    17/12/30 22:42:32 INFO mapreduce.Job: Counters: 30
        File System Counters
            FILE: Number of bytes read=0
            FILE: Number of bytes written=138692
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=87
            HDFS: Number of bytes written=94
            HDFS: Number of read operations=4
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=2
        Job Counters 
            Launched map tasks=1
            Other local map tasks=1
            Total time spent by all maps in occupied slots (ms)=32275
            Total time spent by all reduces in occupied slots (ms)=0
            Total time spent by all map tasks (ms)=32275
            Total vcore-milliseconds taken by all map tasks=32275
            Total megabyte-milliseconds taken by all map tasks=33049600
        Map-Reduce Framework
            Map input records=7
            Map output records=7
            Input split bytes=87
            Spilled Records=0
            Failed Shuffles=0
            Merged Map outputs=0
            GC time elapsed (ms)=170
            CPU time spent (ms)=2020
            Physical memory (bytes) snapshot=109428736
            Virtual memory (bytes) snapshot=851021824
            Total committed heap usage (bytes)=19091456
        File Input Format Counters 
            Bytes Read=0
        File Output Format Counters 
            Bytes Written=94
    17/12/30 22:42:32 INFO mapreduce.ImportJobBase: Transferred 94 bytes in 91.0632 seconds (1.0322 bytes/sec)
    17/12/30 22:42:32 INFO mapreduce.ImportJobBase: Retrieved 7 records.
    17/12/30 22:42:32 INFO util.AppendUtils: Appending to directory testsqoop
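    A note on the --query form used above: Sqoop requires the literal token $CONDITIONS in the WHERE clause so that it can substitute its own split predicates (the log lines above show it replaced with (1 = 0) while Sqoop samples the result schema). With a single mapper the token is all you need; if you raise -m above 1 for a free-form query, you must also tell Sqoop which column to split on. A hedged sketch using the same connection details:

    sqoop-import --append --connect 'jdbc:mysql://192.168.154.134:3306/mmzs' --username 'root' --password '123456' --query 'select * from mmzs.testsqoop where uid>3 and $CONDITIONS' --split-by uid -m 2 --target-dir '/user/hive/warehouse/mmzs.db/testsqoop' --fields-terminated-by '\t'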

    Check on the cluster whether the data was really imported:

     

    Check in the Hive database whether the data was really imported:
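    The verification screenshots are not preserved; a minimal way to check both places (the part file name may differ, e.g. part-m-00000 as deleted in the next step):

    # Files Sqoop wrote under the table's directory, and their contents
    [hadoop@CloudDeskTop software]$ hdfs dfs -ls /user/hive/warehouse/mmzs.db/testsqoop
    [hadoop@CloudDeskTop software]$ hdfs dfs -cat /user/hive/warehouse/mmzs.db/testsqoop/part-m-*
    # The same rows should be visible through Hive, since the table's location is that directory
    [hadoop@CloudDeskTop software]$ hive -e 'select * from mmzs.testsqoop;'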

    The result confirms that the data was imported into the HDFS cluster successfully.

    Delete the data in the cluster to prepare for the next import:

    [hadoop@master01 software]$ hdfs dfs -rm -r /user/hive/warehouse/mmzs.db/testsqoop/part-m-00000

    B. Specify a table and import the entire table into the cluster at once

    [hadoop@CloudDeskTop software]$ sqoop-import --append --connect 'jdbc:mysql://192.168.154.134:3306/mmzs' --username 'root' --password '123456' --table testsqoop -m 1 --target-dir '/user/hive/warehouse/mmzs.db/testsqoop/' --fields-terminated-by '\t' --lines-terminated-by '\n'
    17/12/30 22:28:31 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
    17/12/30 22:28:31 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    17/12/30 22:28:32 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
    17/12/30 22:28:32 INFO tool.CodeGenTool: Beginning code generation
    17/12/30 22:28:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `testsqoop` AS t LIMIT 1
    17/12/30 22:28:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `testsqoop` AS t LIMIT 1
    17/12/30 22:28:33 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /software/hadoop-2.7.3
    Note: /tmp/sqoop-hadoop/compile/d427f3a0d1a3328c5dc9ae1bd6cbd988/testsqoop.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    17/12/30 22:28:36 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/d427f3a0d1a3328c5dc9ae1bd6cbd988/testsqoop.jar
    17/12/30 22:28:36 WARN manager.MySQLManager: It looks like you are importing from mysql.
    17/12/30 22:28:36 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
    17/12/30 22:28:36 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
    17/12/30 22:28:36 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
    17/12/30 22:28:36 INFO mapreduce.ImportJobBase: Beginning import of testsqoop
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/software/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/software/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    17/12/30 22:28:36 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
    17/12/30 22:28:38 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
    17/12/30 22:28:45 INFO db.DBInputFormat: Using read commited transaction isolation
    17/12/30 22:28:45 INFO mapreduce.JobSubmitter: number of splits:1
    17/12/30 22:28:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1514638990227_0002
    17/12/30 22:28:46 INFO impl.YarnClientImpl: Submitted application application_1514638990227_0002
    17/12/30 22:28:47 INFO mapreduce.Job: The url to track the job: http://master01:8088/proxy/application_1514638990227_0002/
    17/12/30 22:28:47 INFO mapreduce.Job: Running job: job_1514638990227_0002
    17/12/30 22:29:29 INFO mapreduce.Job: Job job_1514638990227_0002 running in uber mode : false
    17/12/30 22:29:29 INFO mapreduce.Job:  map 0% reduce 0%
    17/12/30 22:30:06 INFO mapreduce.Job:  map 100% reduce 0%
    17/12/30 22:30:07 INFO mapreduce.Job: Job job_1514638990227_0002 completed successfully
    17/12/30 22:30:08 INFO mapreduce.Job: Counters: 30
        File System Counters
            FILE: Number of bytes read=0
            FILE: Number of bytes written=138842
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=87
            HDFS: Number of bytes written=128
            HDFS: Number of read operations=4
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=2
        Job Counters 
            Launched map tasks=1
            Other local map tasks=1
            Total time spent by all maps in occupied slots (ms)=33630
            Total time spent by all reduces in occupied slots (ms)=0
            Total time spent by all map tasks (ms)=33630
            Total vcore-milliseconds taken by all map tasks=33630
            Total megabyte-milliseconds taken by all map tasks=34437120
        Map-Reduce Framework
            Map input records=10
            Map output records=10
            Input split bytes=87
            Spilled Records=0
            Failed Shuffles=0
            Merged Map outputs=0
            GC time elapsed (ms)=177
            CPU time spent (ms)=2490
            Physical memory (bytes) snapshot=109060096
            Virtual memory (bytes) snapshot=850882560
            Total committed heap usage (bytes)=18972672
        File Input Format Counters 
            Bytes Read=0
        File Output Format Counters 
            Bytes Written=128
    17/12/30 22:30:08 INFO mapreduce.ImportJobBase: Transferred 128 bytes in 89.4828 seconds (1.4304 bytes/sec)
    17/12/30 22:30:08 INFO mapreduce.ImportJobBase: Retrieved 10 records.
    17/12/30 22:30:08 INFO util.AppendUtils: Appending to directory testsqoop

    Check on the cluster whether the data was really imported:

    Check in the Hive database whether the data was really imported:

     

     The result confirms that the data was imported into the HDFS cluster successfully.
