[Hive] Using Sqoop to Move Data Between HDFS, MySQL, and Hive
Big Data Collaboration Frameworks
"Big data collaboration frameworks" is an umbrella term covering four frameworks:
• Data transfer tool: Sqoop
• File collection framework: Flume
• Job scheduling framework: Oozie
• Big data web UI: Hue

What Sqoop Does
Sqoop extracts data from a table in a relational database into Hadoop's HDFS file system; under the hood it runs MapReduce.
It relies on MapReduce to speed up data transfer, and it moves the data as a batch job.
It can also export file data on HDFS, or data in a Hive table, into a table in a relational database.
     
HDFS → RDBMS
sqoop export \
--connect jdbc:mysql://xxx:3306/xxx \
--username xxx \
--password xxx \
--table xxx \
--export-dir xxx
RDBMS → Hive
sqoop import \
--connect jdbc:mysql://xxx:3306/xxx \
--username xxx \
--password xxx \
--fields-terminated-by " " \
--table xxx \
--hive-import \
--hive-table xxx
Hive → RDBMS
sqoop export \
--connect jdbc:mysql://xxx:3306/xxx \
--username xxx \
--password xxx \
--table xxx \
--export-dir xxx \
--input-fields-terminated-by ' '
RDBMS → HDFS
sqoop import \
--connect jdbc:mysql://xxx:3306/xxx \
--username xxx \
--password xxx \
--table xxx \
--target-dir xxx
Patterns:
1. Moving data from an RDBMS into HDFS or Hive always uses import; moving data from HDFS or Hive out to an RDBMS always uses export. Take HDFS/Hive as the reference point and choose the keyword by the direction of data flow.
2. The connect, username, password, and table parameters are required for every transfer. The connect parameter always takes the form --connect jdbc:mysql://hostname:3306/database (when using MySQL); table names the table in MySQL.
3. The export-dir parameter is used only when exporting data to an RDBMS; it is the path where the table's data is stored in HDFS.
Differences:
• HDFS → RDBMS: table names the MySQL table, which you must create in advance; export-dir is the path of the data in HDFS.
• RDBMS → Hive: fields-terminated-by sets the field delimiter of the data as stored in Hive (when the target is Hive you can think of it as the encoding format; when the target is an RDBMS, as the decoding format); table names the MySQL table; hive-import marks this as an import into Hive; hive-table names the Hive table. Note: table must not share a name with a directory that already exists under your home directory, because Sqoop first imports the data onto HDFS, then loads it into Hive, and finally deletes that staging directory (see the cleanup sketch after this list).
• Hive → RDBMS: table names the MySQL table; export-dir is the path where the Hive table is stored in HDFS; hive-table names the Hive table.
• RDBMS → HDFS: table names the MySQL table; target-dir is the HDFS directory in which to store the data.
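On the staging note above: if a Hive import dies partway through, the staging directory can be left behind and block a retry. A minimal cleanup sketch, assuming the default staging location under the user's HDFS home directory (Sqoop names the directory after the table; the exact path here is an assumption):

# Check for a leftover staging directory named after the MySQL table (hypothetical path)
hdfs dfs -ls /user/hadoop/my_user
# Remove it before re-running the import, if present
hdfs dfs -rm -r /user/hadoop/my_user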
     
Installing Sqoop
Configuring Sqoop 1.x
In the conf directory, edit 【sqoop-env-template.sh】 (conventionally copied to sqoop-env.sh):
• export HADOOP_COMMON_HOME=<Hadoop directory>
• export HADOOP_MAPRED_HOME=<Hadoop directory>
• export HIVE_HOME=<Hive directory>
• export ZOOCFGDIR=<Zookeeper directory>
Copy the MySQL JDBC driver jar into Sqoop's lib directory.
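For concreteness, a sketch of a filled-in sqoop-env.sh. The Hadoop path matches the HADOOP_MAPRED_HOME printed in the logs later in this post; the Hive, Zookeeper, and driver-jar paths are assumptions for illustration only:

# sqoop-env.sh -- example values; adjust paths to your installation
export HADOOP_COMMON_HOME=/opt/modules/hadoop-2.5.0-cdh5.3.6_Hive
export HADOOP_MAPRED_HOME=/opt/modules/hadoop-2.5.0-cdh5.3.6_Hive
export HIVE_HOME=/opt/modules/hive-0.13.1-cdh5.3.6           # assumed path
export ZOOCFGDIR=/opt/modules/zookeeper-3.4.5-cdh5.3.6/conf  # assumed path

# Copy the MySQL JDBC driver into Sqoop's lib directory (jar name is an assumption)
cp mysql-connector-java-5.1.27-bin.jar /opt/modules/sqoop-1.4.5-cdh5.3.6/lib/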
Test Sqoop:
bin/sqoop list-databases \
--connect jdbc:mysql://hostname:3306 \
--username root \
--password 123456
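Sqoop will warn that passing the password on the command line is insecure (the warning appears in the logs below). A variant that prompts for the password interactively with -P instead:

bin/sqoop list-databases \
--connect jdbc:mysql://hostname:3306 \
--username root \
-P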
Inspect the local MySQL database:
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| metastore          |
| mysql              |
| test               |
+--------------------+
4 rows in set (0.00 sec)

mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| my_user        |
+----------------+
1 row in set (0.00 sec)

mysql> desc my_user;
+---------+--------------+------+-----+---------+----------------+
| Field   | Type         | Null | Key | Default | Extra          |
+---------+--------------+------+-----+---------+----------------+
| id      | tinyint(4)   | NO   | PRI | NULL    | auto_increment |
| account | varchar(255) | YES  |     | NULL    |                |
| passwd  | varchar(255) | YES  |     | NULL    |                |
+---------+--------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

mysql> select * from my_user;
+----+----------+----------+
| id | account  | passwd   |
+----+----------+----------+
|  1 | admin    | admin    |
|  2 | johnny   | 123456   |
|  3 | zhangsan | zhangsan |
|  4 | lisi     | lisi     |
|  5 | test     | test     |
|  6 | qiqi     | qiqi     |
|  7 | hangzhou | hangzhou |
+----+----------+----------+
7 rows in set (0.00 sec)
Create an empty table in Hive with the same structure:
hive (test)> create table h_user(
          > id int,
          > account string,
          > passwd string
          > )row format delimited fields terminated by ' ';
OK
Time taken: 0.113 seconds
hive (test)> desc h_user;
OK
col_name    data_type    comment
id          int
account     string
passwd      string
Time taken: 0.228 seconds, Fetched: 3 row(s)
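The table's delimiter must match what the Sqoop import below writes with --fields-terminated-by " ". A quick way to double-check it, sketched with hive -e (the test database name comes from the session above):

# field.delim in the storage description should be a single space
hive -e "DESC FORMATTED test.h_user;"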
Import data from the local MySQL into Hive:
bin/sqoop import \
--connect jdbc:mysql://cdaisuke:3306/test \
--username root \
--password 123456 \
--table my_user \
--num-mappers 1 \
--delete-target-dir \
--fields-terminated-by " " \
--hive-database test \
--hive-import \
--hive-table h_user

hive (test)> select * from h_user;
OK
h_user.id    h_user.account    h_user.passwd
1    admin       admin
2    johnny      123456
3    zhangsan    zhangsan
4    lisi        lisi
5    test        test
6    qiqi        qiqi
7    hangzhou    hangzhou
Time taken: 0.061 seconds, Fetched: 7 row(s)
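After the import, the data files should sit under the Hive warehouse directory; the path below is the same one the Hive-to-MySQL export later in this post reads from. A quick verification sketch (the part-m-00000 file name assumes the single-mapper import above):

hdfs dfs -ls /user/hive/warehouse/test.db/h_user
hdfs dfs -cat /user/hive/warehouse/test.db/h_user/part-m-00000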
Import from MySQL into HDFS:
bin/sqoop import \
--connect jdbc:mysql://cdaisuke:3306/test \
--username root \
--password 123456 \
--table my_user \
--num-mappers 3 \
--target-dir /user/hadoop/ \
--delete-target-dir \
--fields-terminated-by " "
------------------------------------------------------------
[hadoop@cdaisuke sqoop-1.4.5-cdh5.3.6]$ bin/sqoop import \
> --connect jdbc:mysql://cdaisuke:3306/test \
> --username root \
> --password 123456 \
> --table my_user \
> --num-mappers 3 \
> --target-dir /user/hadoop/ \
> --delete-target-dir \
> --fields-terminated-by " "
18/08/14 00:02:11 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.3.6
18/08/14 00:02:11 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/08/14 00:02:12 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/08/14 00:02:12 INFO tool.CodeGenTool: Beginning code generation
18/08/14 00:02:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `my_user` AS t LIMIT 1
18/08/14 00:02:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `my_user` AS t LIMIT 1
18/08/14 00:02:13 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/modules/hadoop-2.5.0-cdh5.3.6_Hive
Note: /tmp/sqoop-hadoop/compile/7c8bdb7cd3df7b2f4b48700704f46f65/my_user.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/08/14 00:02:18 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/7c8bdb7cd3df7b2f4b48700704f46f65/my_user.jar
18/08/14 00:02:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/08/14 00:02:22 INFO tool.ImportTool: Destination directory /user/hadoop is not present, hence not deleting.
18/08/14 00:02:22 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/08/14 00:02:22 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/08/14 00:02:22 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/08/14 00:02:22 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/08/14 00:02:22 INFO mapreduce.ImportJobBase: Beginning import of my_user
18/08/14 00:02:22 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/08/14 00:02:22 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/08/14 00:02:23 INFO client.RMProxy: Connecting to ResourceManager at slave01/192.168.79.140:8032
18/08/14 00:02:28 INFO db.DBInputFormat: Using read commited transaction isolation
18/08/14 00:02:28 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `my_user`
18/08/14 00:02:28 INFO mapreduce.JobSubmitter: number of splits:3
18/08/14 00:02:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1533652222364_0078
18/08/14 00:02:29 INFO impl.YarnClientImpl: Submitted application application_1533652222364_0078
18/08/14 00:02:29 INFO mapreduce.Job: The url to track the job: http://slave01:8088/proxy/application_1533652222364_0078/
18/08/14 00:02:29 INFO mapreduce.Job: Running job: job_1533652222364_0078
18/08/14 00:02:50 INFO mapreduce.Job: Job job_1533652222364_0078 running in uber mode : false
18/08/14 00:02:50 INFO mapreduce.Job: map 0% reduce 0%
18/08/14 00:03:00 INFO mapreduce.Job: map 33% reduce 0%
18/08/14 00:03:01 INFO mapreduce.Job: map 67% reduce 0%
18/08/14 00:03:02 INFO mapreduce.Job: map 100% reduce 0%
18/08/14 00:03:02 INFO mapreduce.Job: Job job_1533652222364_0078 completed successfully
18/08/14 00:03:02 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=394707
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=295
        HDFS: Number of bytes written=106
        HDFS: Number of read operations=12
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=6
    Job Counters
        Launched map tasks=3
        Other local map tasks=3
        Total time spent by all maps in occupied slots (ms)=25213
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=25213
        Total vcore-seconds taken by all map tasks=25213
        Total megabyte-seconds taken by all map tasks=25818112
    Map-Reduce Framework
        Map input records=7
        Map output records=7
        Input split bytes=295
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=352
        CPU time spent (ms)=3600
        Physical memory (bytes) snapshot=316162048
        Virtual memory (bytes) snapshot=2523156480
        Total committed heap usage (bytes)=77766656
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=106
18/08/14 00:03:02 INFO mapreduce.ImportJobBase: Transferred 106 bytes in 40.004 seconds (2.6497 bytes/sec)
18/08/14 00:03:02 INFO mapreduce.ImportJobBase: Retrieved 7 records.
Run three map tasks:
--num-mappers 3
Set the HDFS target directory:
--target-dir /user/hadoop/
Delete the target directory first if it already exists:
--delete-target-dir
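With three mappers, Sqoop splits the table on its primary key (see the BoundingValsQuery on `id` in the log above) and each mapper writes its own part file. A quick check of the result, sketched (part-m-NNNNN is MapReduce's standard output naming):

hdfs dfs -ls /user/hadoop/
# Expect part-m-00000 through part-m-00002, one per mapper
hdfs dfs -cat /user/hadoop/part-m-*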
Export from Hive to MySQL
Create a new table in MySQL:
create table user_export(
id tinyint(4) not null auto_increment,
account varchar(255) default null,
passwd varchar(255) default null,
primary key(id)
);
Export the data with Sqoop:
bin/sqoop export \
--connect jdbc:mysql://cdaisuke:3306/test \
--username root \
--password 123456 \
--table user_export \
--num-mappers 1 \
--fields-terminated-by " " \
--export-dir /user/hive/warehouse/test.db/h_user
----------------------------------------------------
[hadoop@cdaisuke sqoop-1.4.5-cdh5.3.6]$ bin/sqoop export \
> --connect jdbc:mysql://cdaisuke:3306/test \
> --username root \
> --password 123456 \
> --table user_export \
> --num-mappers 1 \
> --fields-terminated-by " " \
> --export-dir /user/hive/warehouse/test.db/h_user
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
18/08/14 00:16:32 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.3.6
18/08/14 00:16:32 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/08/14 00:16:33 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/08/14 00:16:33 INFO tool.CodeGenTool: Beginning code generation
18/08/14 00:16:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user_export` AS t LIMIT 1
18/08/14 00:16:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user_export` AS t LIMIT 1
18/08/14 00:16:34 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/modules/hadoop-2.5.0-cdh5.3.6_Hive
Note: /tmp/sqoop-hadoop/compile/6823ffae505b34f7ae8b9881bae4b898/user_export.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/08/14 00:16:39 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/6823ffae505b34f7ae8b9881bae4b898/user_export.jar
18/08/14 00:16:39 INFO mapreduce.ExportJobBase: Beginning export of user_export
18/08/14 00:16:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/08/14 00:16:40 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/08/14 00:16:43 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
18/08/14 00:16:43 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
18/08/14 00:16:43 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/08/14 00:16:43 INFO client.RMProxy: Connecting to ResourceManager at slave01/192.168.79.140:8032
18/08/14 00:16:48 INFO input.FileInputFormat: Total input paths to process : 1
18/08/14 00:16:48 INFO input.FileInputFormat: Total input paths to process : 1
18/08/14 00:16:48 INFO mapreduce.JobSubmitter: number of splits:1
18/08/14 00:16:48 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
18/08/14 00:16:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1533652222364_0079
18/08/14 00:16:50 INFO impl.YarnClientImpl: Submitted application application_1533652222364_0079
18/08/14 00:16:50 INFO mapreduce.Job: The url to track the job: http://slave01:8088/proxy/application_1533652222364_0079/
18/08/14 00:16:50 INFO mapreduce.Job: Running job: job_1533652222364_0079
18/08/14 00:17:11 INFO mapreduce.Job: Job job_1533652222364_0079 running in uber mode : false
18/08/14 00:17:11 INFO mapreduce.Job: map 0% reduce 0%
18/08/14 00:17:27 INFO mapreduce.Job: map 100% reduce 0%
18/08/14 00:17:27 INFO mapreduce.Job: Job job_1533652222364_0079 completed successfully
18/08/14 00:17:27 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=131287
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=258
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters
        Launched map tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=13426
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=13426
        Total vcore-seconds taken by all map tasks=13426
        Total megabyte-seconds taken by all map tasks=13748224
    Map-Reduce Framework
        Map input records=7
        Map output records=7
        Input split bytes=149
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=73
        CPU time spent (ms)=1230
        Physical memory (bytes) snapshot=113061888
        Virtual memory (bytes) snapshot=838946816
        Total committed heap usage (bytes)=45613056
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
18/08/14 00:17:27 INFO mapreduce.ExportJobBase: Transferred 258 bytes in 44.2695 seconds (5.8279 bytes/sec)
18/08/14 00:17:27 INFO mapreduce.ExportJobBase: Exported 7 records.
-----------------------------------------------------------------
mysql> select * from user_export;
+----+----------+----------+
| id | account  | passwd   |
+----+----------+----------+
|  1 | admin    | admin    |
|  2 | johnny   | 123456   |
|  3 | zhangsan | zhangsan |
|  4 | lisi     | lisi     |
|  5 | test     | test     |
|  6 | qiqi     | qiqi     |
|  7 | hangzhou | hangzhou |
+----+----------+----------+
7 rows in set (0.00 sec)
Export from HDFS to MySQL
Create a new table in MySQL:
     
create table my_user2(
id tinyint(4) not null auto_increment,
account varchar(255) default null,
passwd varchar(255) default null,
primary key (id)
);
---------------------------------------------------------
[hadoop@cdaisuke sqoop-1.4.5-cdh5.3.6]$ bin/sqoop export \
> --connect jdbc:mysql://cdaisuke:3306/test \
> --username root \
> --password 123456 \
> --table my_user2 \
> --num-mappers 1 \
> --fields-terminated-by " " \
> --export-dir /user/hadoop
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
18/08/14 00:39:51 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.3.6
18/08/14 00:39:51 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/08/14 00:39:52 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/08/14 00:39:52 INFO tool.CodeGenTool: Beginning code generation
18/08/14 00:39:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `my_user2` AS t LIMIT 1
18/08/14 00:39:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `my_user2` AS t LIMIT 1
18/08/14 00:39:53 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/modules/hadoop-2.5.0-cdh5.3.6_Hive
Note: /tmp/sqoop-hadoop/compile/7222f42cd6507a21fdcef7600bd14a20/my_user2.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/08/14 00:39:59 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/7222f42cd6507a21fdcef7600bd14a20/my_user2.jar
18/08/14 00:39:59 INFO mapreduce.ExportJobBase: Beginning export of my_user2
18/08/14 00:40:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/08/14 00:40:00 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/08/14 00:40:04 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
18/08/14 00:40:04 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
18/08/14 00:40:04 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/08/14 00:40:04 INFO client.RMProxy: Connecting to ResourceManager at slave01/192.168.79.140:8032
18/08/14 00:40:09 INFO input.FileInputFormat: Total input paths to process : 3
18/08/14 00:40:09 INFO input.FileInputFormat: Total input paths to process : 3
18/08/14 00:40:09 INFO mapreduce.JobSubmitter: number of splits:1
18/08/14 00:40:09 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
18/08/14 00:40:10 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1533652222364_0084
18/08/14 00:40:11 INFO impl.YarnClientImpl: Submitted application application_1533652222364_0084
18/08/14 00:40:11 INFO mapreduce.Job: The url to track the job: http://slave01:8088/proxy/application_1533652222364_0084/
18/08/14 00:40:11 INFO mapreduce.Job: Running job: job_1533652222364_0084
18/08/14 00:40:30 INFO mapreduce.Job: Job job_1533652222364_0084 running in uber mode : false
18/08/14 00:40:30 INFO mapreduce.Job: map 0% reduce 0%
18/08/14 00:40:46 INFO mapreduce.Job: map 100% reduce 0%
18/08/14 00:40:46 INFO mapreduce.Job: Job job_1533652222364_0084 completed successfully
18/08/14 00:40:46 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=131229
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=365
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=10
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters
        Launched map tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=13670
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=13670
        Total vcore-seconds taken by all map tasks=13670
        Total megabyte-seconds taken by all map tasks=13998080
    Map-Reduce Framework
        Map input records=7
        Map output records=7
        Input split bytes=250
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=89
        CPU time spent (ms)=1670
        Physical memory (bytes) snapshot=115961856
        Virtual memory (bytes) snapshot=838946816
        Total committed heap usage (bytes)=45613056
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
18/08/14 00:40:46 INFO mapreduce.ExportJobBase: Transferred 365 bytes in 42.3534 seconds (8.618 bytes/sec)
18/08/14 00:40:46 INFO mapreduce.ExportJobBase: Exported 7 records.
------------------------------------------------------------------------
mysql> select * from my_user2;
+----+----------+----------+
| id | account  | passwd   |
+----+----------+----------+
|  1 | admin    | admin    |
|  2 | johnny   | 123456   |
|  3 | zhangsan | zhangsan |
|  4 | lisi     | lisi     |
|  5 | test     | test     |
|  6 | qiqi     | qiqi     |
|  7 | hangzhou | hangzhou |
+----+----------+----------+
7 rows in set (0.00 sec)
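The Hive → RDBMS template near the top of this post uses --input-fields-terminated-by, which describes how the export should parse its input files, while the runs above pass --fields-terminated-by. A sketch of the same export written with the input-parsing form of the flag (behavior should be equivalent here, since the files are space-delimited either way):

bin/sqoop export \
--connect jdbc:mysql://cdaisuke:3306/test \
--username root \
--password 123456 \
--table my_user2 \
--num-mappers 1 \
--input-fields-terminated-by " " \
--export-dir /user/hadoop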
     