  • [Original] Installing Sqoop and Verifying It

     

    1. Download packages and configure

        Download hadoop-0.20.2-CDH3B4.tar.gz and sqoop-1.2.0-CDH3B4.tar.gz and extract them. Copy hadoop-core-0.20.2-CDH3B4.jar from the hadoop-0.20.2-CDH3B4 directory into sqoop-1.2.0-CDH3B4/lib. Then edit sqoop-1.2.0-CDH3B4/bin/configure-sqoop and comment out the check for ZOOKEEPER_HOME.
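The configure-sqoop edit can be scripted. The snippet below demonstrates the idea on a stand-in copy of the check, since the exact wording of the ZOOKEEPER_HOME block in the CDH3B4 configure-sqoop may differ; in practice, run the same sed range expression against $SQOOP_HOME/bin/configure-sqoop after locating the block with grep -n ZOOKEEPER_HOME.

```shell
# Stand-in for the ZOOKEEPER_HOME check inside bin/configure-sqoop
# (assumed shape; the real file's wording may differ slightly)
cat > configure-sqoop.demo <<'EOF'
if [ -z "${ZOOKEEPER_HOME}" ]; then
  echo "Error: ZOOKEEPER_HOME is not set."
  echo "Please set ZOOKEEPER_HOME to the root of your ZooKeeper installation."
  exit 1
fi
EOF

# Comment out every line from the ZOOKEEPER_HOME test to the closing 'fi'
sed -i '/ZOOKEEPER_HOME/,/^fi/s/^/#/' configure-sqoop.demo
```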

     

    2. Configure environment variables

    export SQOOP_HOME=/home/admin/sqoop-1.2.0-CDH3B4

    export PATH=$PATH:$SQOOP_HOME/bin

     

    3. Test the installation

     

    [admin@server1 ~]$ sqoop help

    usage: sqoop COMMAND [ARGS]

     

    Available commands:

      codegen            Generate code to interact with database records

      create-hive-table  Import a table definition into Hive

      eval               Evaluate a SQL statement and display the results

      export             Export an HDFS directory to a database table

      help               List available commands

      import             Import a table from a database to HDFS

      import-all-tables  Import tables from a database to HDFS

      job                Work with saved jobs

      list-databases     List available databases on a server

      list-tables        List available tables in a database

      merge              Merge results of incremental imports

      metastore          Run a standalone Sqoop metastore

      version            Display version information

     

    See 'sqoop help COMMAND' for information on a specific command.

     

    4. Import from MySQL into HDFS

     

    Using the hive database instance from the earlier Hive experiment, display the data in table SEQUENCE_TABLE:

    mysql> select * from SEQUENCE_TABLE;

    +-----------------------------------------------------------+----------+

    | SEQUENCE_NAME                                             | NEXT_VAL |

    +-----------------------------------------------------------+----------+

    | org.apache.hadoop.hive.metastore.model.MColumnDescriptor  |       16 |

    | org.apache.hadoop.hive.metastore.model.MDatabase          |        6 |

    | org.apache.hadoop.hive.metastore.model.MSerDeInfo         |       16 |

    | org.apache.hadoop.hive.metastore.model.MStorageDescriptor |       16 |

    | org.apache.hadoop.hive.metastore.model.MTable             |       16 |

    +-----------------------------------------------------------+----------+

    5 rows in set (0.00 sec)

     

    Copy mysql-connector-java-5.1.18-bin.jar into the sqoop-1.2.0-CDH3B4/lib directory, then use sqoop to import the data of table SEQUENCE_TABLE into HDFS:

     

    [admin@server1 bin]$ sqoop import --connect jdbc:mysql://server1/hive --username hive --password hive  --table SEQUENCE_TABLE -m 3;

    12/12/16 01:27:16 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

    12/12/16 01:27:16 INFO tool.CodeGenTool: Beginning code generation

    12/12/16 01:27:16 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `SEQUENCE_TABLE` AS t LIMIT 1

    12/12/16 01:27:16 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `SEQUENCE_TABLE` AS t LIMIT 1

    12/12/16 01:27:16 INFO orm.CompilationManager: HADOOP_HOME is /home/admin/hadoop-0.20.2/bin/..

    12/12/16 01:27:16 INFO orm.CompilationManager: Found hadoop core jar at: /home/admin/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar

    12/12/16 01:27:18 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-admin/compile/415f2a5412b5c2aadd76474859647419/SEQUENCE_TABLE.jar

    12/12/16 01:27:18 WARN manager.MySQLManager: It looks like you are importing from mysql.

    12/12/16 01:27:18 WARN manager.MySQLManager: This transfer can be faster! Use the --direct

    12/12/16 01:27:18 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.

    12/12/16 01:27:18 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)

    12/12/16 01:27:18 INFO mapreduce.ImportJobBase: Beginning import of SEQUENCE_TABLE

    12/12/16 01:27:19 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `SEQUENCE_TABLE` AS t LIMIT 1

    12/12/16 01:27:20 WARN db.TextSplitter: Generating splits for a textual index column.

    12/12/16 01:27:20 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.

    12/12/16 01:27:20 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.

    12/12/16 01:27:20 INFO mapred.JobClient: Running job: job_201212152320_0004

    12/12/16 01:27:21 INFO mapred.JobClient:  map 0% reduce 0%

    12/12/16 01:27:41 INFO mapred.JobClient:  map 25% reduce 0%

    12/12/16 01:27:51 INFO mapred.JobClient:  map 50% reduce 0%

    12/12/16 01:27:57 INFO mapred.JobClient:  map 100% reduce 0%

    12/12/16 01:27:59 INFO mapred.JobClient: Job complete: job_201212152320_0004

    12/12/16 01:27:59 INFO mapred.JobClient: Counters: 5

    12/12/16 01:27:59 INFO mapred.JobClient:   Job Counters

    12/12/16 01:27:59 INFO mapred.JobClient:     Launched map tasks=4

    12/12/16 01:27:59 INFO mapred.JobClient:   FileSystemCounters

    12/12/16 01:27:59 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=274

    12/12/16 01:27:59 INFO mapred.JobClient:   Map-Reduce Framework

    12/12/16 01:27:59 INFO mapred.JobClient:     Map input records=5

    12/12/16 01:27:59 INFO mapred.JobClient:     Spilled Records=0

    12/12/16 01:27:59 INFO mapred.JobClient:     Map output records=5

    12/12/16 01:27:59 INFO mapreduce.ImportJobBase: Transferred 274 bytes in 40.4297 seconds (6.7772 bytes/sec)

    12/12/16 01:27:59 INFO mapreduce.ImportJobBase: Retrieved 5 records.

     

    View the imported data in HDFS. (Although -m 3 was requested, the job launched four map tasks: as the warnings above indicate, TextSplitter generated the splits from the textual SEQUENCE_NAME column, and it produced four of them.)

    [admin@server1 bin]$ hadoop dfs -cat SEQUENCE_TABLE/part*

    org.apache.hadoop.hive.metastore.model.MColumnDescriptor,16

    org.apache.hadoop.hive.metastore.model.MDatabase,6

    org.apache.hadoop.hive.metastore.model.MSerDeInfo,16

    org.apache.hadoop.hive.metastore.model.MStorageDescriptor,16

    org.apache.hadoop.hive.metastore.model.MTable,16
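The TextSplitter warnings in the log above come from splitting on the textual primary key SEQUENCE_NAME. Sqoop's --split-by option lets you name an integral split column explicitly, which avoids the risk of partial or duplicate imports on case-insensitive databases. A sketch reusing the connection details above (NEXT_VAL is the table's integer column; -P prompts for the password instead of passing it on the command line, as the log itself recommends):

```shell
sqoop import --connect jdbc:mysql://server1/hive \
  --username hive -P \
  --table SEQUENCE_TABLE --split-by NEXT_VAL -m 3
```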

     

    5. Export from HDFS to MySQL

     

    HDFS contains a file test.txt with the following contents:

    [admin@server1 ~]$ hadoop dfs -cat /test.txt

    aaaa,111

    bbbb,222

    cccc,333

    dddd,444
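If you need to create this file yourself, it can be written locally and then pushed to HDFS; the upload step requires a running Hadoop cluster and is shown as a comment for completeness:

```shell
# Write the four sample records to a local file
printf 'aaaa,111\nbbbb,222\ncccc,333\ndddd,444\n' > test.txt

# Upload to the HDFS root (requires a running cluster):
# hadoop dfs -put test.txt /test.txt
```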

     

    Create table test in MySQL:

     

    mysql> CREATE TABLE test(str VARCHAR(10), num INT);

    Query OK, 0 rows affected (0.01 sec)

     

    Run the export:

    [admin@server1 ~]$ sqoop export --connect jdbc:mysql://server1/hive --username hive --password hive  --table test --export-dir /test.txt  -m 3;

    12/12/16 01:51:50 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

    12/12/16 01:51:50 INFO tool.CodeGenTool: Beginning code generation

    12/12/16 01:51:50 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1

    12/12/16 01:51:50 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1

    12/12/16 01:51:50 INFO orm.CompilationManager: HADOOP_HOME is /home/admin/hadoop-0.20.2/bin/..

    12/12/16 01:51:50 INFO orm.CompilationManager: Found hadoop core jar at: /home/admin/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar

    12/12/16 01:51:52 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-admin/compile/bebfa051f0e18c14ad2c466547b23c92/test.jar

    12/12/16 01:51:52 INFO mapreduce.ExportJobBase: Beginning export of test

    12/12/16 01:51:52 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1

    12/12/16 01:51:53 INFO input.FileInputFormat: Total input paths to process : 1

    12/12/16 01:51:53 INFO input.FileInputFormat: Total input paths to process : 1

    12/12/16 01:51:53 INFO mapred.JobClient: Running job: job_201212152320_0005

    12/12/16 01:51:54 INFO mapred.JobClient:  map 0% reduce 0%

    12/12/16 01:52:01 INFO mapred.JobClient:  map 100% reduce 0%

    12/12/16 01:52:03 INFO mapred.JobClient: Job complete: job_201212152320_0005

    12/12/16 01:52:03 INFO mapred.JobClient: Counters: 6

    12/12/16 01:52:03 INFO mapred.JobClient:   Job Counters

    12/12/16 01:52:03 INFO mapred.JobClient:     Rack-local map tasks=1

    12/12/16 01:52:03 INFO mapred.JobClient:     Launched map tasks=1

    12/12/16 01:52:03 INFO mapred.JobClient:   FileSystemCounters

    12/12/16 01:52:03 INFO mapred.JobClient:     HDFS_BYTES_READ=42

    12/12/16 01:52:03 INFO mapred.JobClient:   Map-Reduce Framework

    12/12/16 01:52:03 INFO mapred.JobClient:     Map input records=4

    12/12/16 01:52:03 INFO mapred.JobClient:     Spilled Records=0

    12/12/16 01:52:03 INFO mapred.JobClient:     Map output records=4

    12/12/16 01:52:03 INFO mapreduce.ExportJobBase: Transferred 42 bytes in 10.7834 seconds (3.8949 bytes/sec)

    12/12/16 01:52:03 INFO mapreduce.ExportJobBase: Exported 4 records.

     

    View the records of table test in MySQL:

     

    mysql> select * from test;

    +------+------+

    | str  | num  |

    +------+------+

    | aaaa |  111 |

    | bbbb |  222 |

    | cccc |  333 |

    | dddd |  444 |

    +------+------+

    4 rows in set (0.00 sec)

     

  • Original post: https://www.cnblogs.com/leeeee/p/7276653.html