Sqoop usage and common problems

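1. HDFS file permission problems

Analysis and solution:

The error message points to an HDFS file permission problem: the command runs on the cluster as user null, while the HDFS files are owned by hdfs.

Either run the command as user hdfs, or change the permissions on the HDFS files. Since I only use one of the HDFS files, for now I chose to run the command as user hdfs.

Add the following to ~/.bash_profile: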

export HADOOP_USER_NAME=hdfs
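
If changing the login profile for every session is more than you need, the same variable can be set for a single command. The following is a minimal sketch, assuming a cluster that uses simple (non-Kerberos) authentication, which is the only mode in which HADOOP_USER_NAME takes effect. The helper name run_as_hdfs is made up for illustration.

```shell
# Hypothetical helper: run one command with HADOOP_USER_NAME=hdfs,
# leaving the rest of the shell session untouched.
run_as_hdfs() {
  env HADOOP_USER_NAME=hdfs "$@"
}

# Quick check that the variable reaches the child process.
run_as_hdfs sh -c 'echo "effective Hadoop user: $HADOOP_USER_NAME"'
```

With this in place, something like `run_as_hdfs hdfs dfs -ls /` or `run_as_hdfs sqoop import ...` would act as the hdfs user for that one command only.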

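2. File format problems

Loading a file into a Hive SequenceFile table fails with the error: FAILED: SemanticException Unable to load data to destination table. Error: The file that you are trying to load does not match the file format of the destination table.

Cause

LOAD only moves files into the table's directory without converting them, so a table stored as SequenceFile cannot take a plain text file this way; only data that is already in SequenceFile format can be loaded.

Solution
- First create a temporary table (stored as textfile) and load the data into it,
- then insert from the temporary table into the target table: insert into table test_sq select * from test_tex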
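
The two steps above can be sketched as a small HiveQL script. This is only an illustration under assumptions: the table names test_sq and test_tex come from the text, but the columns (id, name), the tab delimiter, and the input path /tmp/test_data.txt are invented placeholders. The final line runs the script only if the hive client is installed.

```shell
# Write the HiveQL to a file; the quoted 'EOF' keeps '\t' literal.
cat > load_test_sq.hql <<'EOF'
-- Staging table in text format (columns are hypothetical).
CREATE TABLE IF NOT EXISTS test_tex (id INT, name STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  STORED AS TEXTFILE;

-- Target table stored as SequenceFile.
CREATE TABLE IF NOT EXISTS test_sq (id INT, name STRING)
  STORED AS SEQUENCEFILE;

-- LOAD works here because the file format matches the staging table.
LOAD DATA LOCAL INPATH '/tmp/test_data.txt' INTO TABLE test_tex;

-- INSERT ... SELECT rewrites the rows in the target table's format.
INSERT INTO TABLE test_sq SELECT * FROM test_tex;
EOF

# Run the script only when the hive client is available.
if command -v hive >/dev/null 2>&1; then
  hive -f load_test_sq.hql
fi
```

The conversion happens in the INSERT ... SELECT step, because Hive writes the selected rows out in the storage format of test_sq.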

3. Error: ERROR tool.ImportTool: Error during import: No primary key could be found for table TRANS_GJJY02. Please specify one with --split-by or perform a sequential import with '-m 1'.

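The error message tells us that the table has no primary key. There are two ways to deal with this:

      Option 1: define a primary key on the table and rerun the import; the error goes away.

      Option 2: some data cannot have a primary key, for example monitoring records with no unique value. For such data, follow the hint in the error message and use one of these two methods: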

     (1) Set the number of map tasks to 1 (Sqoop's default is 4):

             -m 1

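      (2) Use --split-by followed by a column name, so that the rows can be divided among the map tasks. Sqoop splits the range between that column's minimum and maximum values, so pick a column whose values are spread fairly evenly (a numeric column works best):

          --split-by column1

      Of the two, method (2) is recommended: method (1) uses a single map task, which is slow, whereas method (2) lets you set the number of map tasks yourself and is more efficient.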
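
Put together, method (2) might look like the sketch below. The connection string and user are borrowed from the MySQL example in this post, column1 is the placeholder column name used above, and the mapper count of 4 is just Sqoop's default written out. As an added safety assumption, the sketch only prints the command unless the SQOOP variable is set to the real binary.

```shell
# Dry run by default: prints the command. Set SQOOP=sqoop to execute it.
SQOOP="${SQOOP:-echo sqoop}"

import_with_split() {
  # -P prompts for the password instead of putting it on the command line.
  $SQOOP import \
    --connect jdbc:mysql://ip:3306/test \
    --username root \
    -P \
    --table TRANS_GJJY02 \
    --split-by column1 \
    --num-mappers 4
}

import_with_split
```

Running it as `SQOOP=sqoop sh script.sh` would launch the actual import with four map tasks, each handling one slice of the column1 range.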

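4. "Output directory already exists" error

Add the option --delete-target-dir, which deletes the target directory before the import if it already exists.

5. Example: importing data from MySQL into Hive with Sqoop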

sqoop import \
--connect jdbc:mysql://ip:3306/test \
--username root \
--password 123456 \
--table users \
--fields-terminated-by ' ' \
--delete-target-dir \
--num-mappers 1 \
--hive-import \
--hive-database sqoop \
--hive-table users

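Parameter explanations:

import: import data from MySQL into HDFS

--connect: JDBC connection string for the database

--username: database user name

--password: database password

--table: database table name

--columns: columns to import

--where: filter condition

--query: SQL query to import from

--delete-target-dir: delete the HDFS target directory, if it already exists, before importing

--num-mappers 1: set the number of map tasks to 1; the short form is -m 1

--hive-import: import into Hive

--hive-database sqoop: the Hive database

--hive-table users: the Hive table

--hive-partition-key: partition column

--hive-partition-value: partition value

--hive-overwrite: overwrite existing data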
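
The options that the example command does not use (--columns, --where, and the partition options) could be combined as in the sketch below. The column names id and name, the filter, the partition column dt, and its value are all invented for illustration. As an added assumption, the command is only printed unless SQOOP is set to the real binary.

```shell
# Dry run by default: prints the command. Set SQOOP=sqoop to execute it.
SQOOP="${SQOOP:-echo sqoop}"

import_users_partition() {
  # Imports selected columns and rows into one Hive partition,
  # replacing whatever that partition already holds.
  $SQOOP import \
    --connect jdbc:mysql://ip:3306/test \
    --username root \
    -P \
    --table users \
    --columns "id,name" \
    --where "id > 100" \
    --num-mappers 1 \
    --delete-target-dir \
    --hive-import \
    --hive-database sqoop \
    --hive-table users \
    --hive-overwrite \
    --hive-partition-key dt \
    --hive-partition-value 2020-04-19
}

import_users_partition
```

Note that --table with --columns/--where and the free-form --query are alternatives; a single import uses one or the other.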

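Under the hood, the import command still runs a MapReduce job to copy the data from MySQL to HDFS. Once the map phase has finished, Sqoop runs a Hive LOAD DATA statement to move the files into the Hive table.

If the source database is MySQL, you can add the --direct option. With it, pulling data out of the MySQL table is somewhat faster, because Sqoop uses MySQL's own dump tool (mysqldump) instead of going through JDBC.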
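
A direct-mode import could look like the following sketch. The target directory /tmp/sqoop/users is a made-up path, and the sketch assumes mysqldump is installed on the nodes that run the map tasks, which direct mode requires. As before, the command is only printed unless SQOOP is set to the real binary.

```shell
# Dry run by default: prints the command. Set SQOOP=sqoop to execute it.
SQOOP="${SQOOP:-echo sqoop}"

import_users_direct() {
  # --direct switches the MySQL connector from JDBC reads to mysqldump.
  $SQOOP import \
    --connect jdbc:mysql://ip:3306/test \
    --username root \
    -P \
    --table users \
    --target-dir /tmp/sqoop/users \
    --delete-target-dir \
    --num-mappers 1 \
    --direct
}

import_users_direct
```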

Original article: https://www.cnblogs.com/qfdy123/p/12734570.html
