Importing data into Hive with Sqoop

1.1 The hive-import parameter

Passing --hive-import tells Sqoop to import the data into Hive. However, the following command fails with the error shown below it:

sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password 123456 --table person -m 1 --hive-import
16/07/22 02:22:58 ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://192.168.223.129:9000/user/root/person already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:562)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)

The job fails because a person directory already exists under the user's HDFS home directory (/user/root).

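The underlying reason is that a Sqoop import into Hive runs in three steps: it first writes the data to a directory on HDFS, then loads that data into Hive, and finally deletes the directory. If the directory already exists when the job starts, the import fails.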

1.2 Using the target-dir parameter to specify a temporary directory

To fix the problem above, either delete the existing person directory or use --target-dir to point the import at a different temporary directory:

sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password 123456 --table person -m 1 --hive-import --target-dir temp

Once the job has finished, the table is visible in Hive:

hive> select * from person;
OK
1    zhangsan
2    LISI
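
The other option mentioned above, deleting the leftover directory, could be scripted roughly as follows. This is a minimal sketch rather than part of the original walkthrough: `remove_if_exists` is a hypothetical helper name, and the relative path `person` is assumed to resolve to /user/root/person as in the error message.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sqoop or Hadoop): remove a leftover
# HDFS directory so that a later import does not fail on it.

remove_if_exists() {
    dir=$1
    # `hdfs dfs -test -d` exits with status 0 when the path is an existing directory.
    if hdfs dfs -test -d "$dir"; then
        hdfs dfs -rm -r "$dir"
    else
        echo "nothing to remove: $dir"
    fi
}

# Usage (needs a running HDFS):
#   remove_if_exists person
```

Checking first keeps the script quiet when the directory is already gone, so it can be run before every import.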
1.3 The hive-overwrite parameter

If the command above is run several times, each run appends another copy of the data to the table.

After three runs, the table in Hive contains:

hive> select * from person;
OK
1    zhangsan
2    LISI
1    zhangsan
2    LISI
1    zhangsan
2    LISI
Time taken: 2.079 seconds, Fetched: 6 row(s)

On HDFS this shows up as three data files:

hive> dfs -ls /user/hive/warehouse/person;
Found 3 items
-rwxrwxrwt   3 18232184201 supergroup         18 2016-07-22 17:48 /user/hive/warehouse/person/part-m-00000
-rwxrwxrwt   3 18232184201 supergroup         18 2016-07-22 17:51 /user/hive/warehouse/person/part-m-00000_copy_1
-rwxrwxrwt   3 18232184201 supergroup         18 2016-07-22 17:52 /user/hive/warehouse/person/part-m-00000_copy_2

To replace the table's existing data instead of appending to it, use the --hive-overwrite parameter:

sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password 123456 --table person --hive-import --target-dir temp -m 1 --hive-overwrite
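
For an import that is re-run regularly, the choice between appending and overwriting can be wrapped in a small script. The sketch below is only an illustration: `import_person` and its `append`/`overwrite` argument are hypothetical names, and the connection settings are the ones used throughout this article.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Sqoop): builds the import command used
# in this article and adds --hive-overwrite only when asked to.

import_person() {
    mode=$1   # "overwrite" replaces the table data; anything else appends

    # Build the argument list in the function's positional parameters.
    set -- import \
        --connect jdbc:mysql://localhost:3306/test \
        --username root --password 123456 \
        --table person -m 1 \
        --hive-import --target-dir temp

    if [ "$mode" = "overwrite" ]; then
        set -- "$@" --hive-overwrite
    fi

    sqoop "$@"
}

# Usage (needs a working Sqoop and Hive installation):
#   import_person overwrite
```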

1.4 The fields-terminated-by parameter

When data is imported from MySQL into plain HDFS files, the default field delimiter is a comma.

When data is imported into Hive, Hive's own default field delimiter is used instead:

Storage Desc Params:
    field.delim             \u0001
    line.delim              \n
    serialization.format    \u0001

To use a different delimiter, pass the --fields-terminated-by parameter.

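The parameter takes effect when the Hive table is first created by an import; that is the moment the table's default delimiter is fixed.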

So drop the table in Hive first, then run the import again:

sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password 123456 --table person -m 1 --hive-import --fields-terminated-by "|"

Check the Hive table's delimiter again:

Storage Desc Params:
    field.delim             |
    line.delim              \n
    serialization.format    |
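
Hive's default delimiter is the non-printing byte 0x01 (Ctrl-A), so it is easy to miss when looking at the data files. The following local sketch makes the difference between the two delimiters visible; it needs no Hadoop installation, `soh_to_tab` is a hypothetical helper name, and the sample rows are the two person rows shown earlier.

```shell
#!/bin/sh
# Local sketch: the sample rows are the two person rows from this article.

# Replace Hive's default field delimiter (byte 0x01, Ctrl-A) with a tab
# so that the field boundary becomes visible in a terminal.
soh_to_tab() {
    tr '\001' '\t'
}

# Rows as written with Hive's default delimiter.
printf '1\001zhangsan\n2\001LISI\n' | soh_to_tab

# The same rows as written after importing with --fields-terminated-by "|".
printf '1|zhangsan\n2|LISI\n'
```

Against a real table, the same filter could be applied to a data file, for example `hdfs dfs -cat /user/hive/warehouse/person/part-m-00000 | soh_to_tab`.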

Original article: https://www.cnblogs.com/dongdone/p/5696233.html