Sqoop data import commands (SQL ---> HDFS)

mysql------->hdfs

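Sqoop import workflow:

Sqoop submits the job to Hadoop ------> Hadoop starts MapReduce ------> MapReduce reads the rows to import from the table according to the given parameters ------> MapReduce writes those rows to HDFS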

1. Full-table import:

mysql----->hdfs

sqoop import --connect jdbc:mysql://192.168.122.15:3306/company --username hivee --password 123456 --table card -m 1

# -m 1 runs the import with a single map task
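
To make the effect of -m more concrete, here is a simplified sketch of how an integer key range might be divided among map tasks. It mirrors the example in the Sqoop user guide (minimum 0, maximum 1000, 4 tasks); the function name split_ranges and the column name id are assumptions for illustration, and Sqoop's real splitter handles more cases than this.

```shell
# Print one WHERE range per map task for an integer split column.
# Simplified illustration only; "id" is an assumed column name.
split_ranges() {
    min=$1
    max=$2
    tasks=$3
    step=$(( (max - min) / tasks ))
    i=0
    lo=$min
    while [ "$i" -lt "$tasks" ]; do
        if [ "$i" -eq $(( tasks - 1 )) ]; then
            # The last task also picks up the maximum value itself.
            hi=$(( max + 1 ))
        else
            hi=$(( lo + step ))
        fi
        echo "id >= $lo AND id < $hi"
        lo=$hi
        i=$(( i + 1 ))
    done
}

split_ranges 0 1000 4
```

With a task count of 1 the function prints a single range covering the whole table, which is what -m 1 amounts to.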

2. Importing a restricted range of data:

mysql----->hdfs

sqoop import --connect jdbc:mysql://192.168.122.15:3306/company --username hivee --password 123456 --table card --columns 'id,name' --where 'id>20' -m 1

# --columns <column names> selects the columns to import; --where 'condition' filters the rows
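
A --table import with --columns and --where selects the same rows as a free-form query, which is the form used in item 3. The sketch below builds that query string from the three pieces; the helper name as_free_form_query is hypothetical and the output is only an approximation of the SQL Sqoop generates internally.

```shell
# Build a free-form query string from a table name, a column list and a filter.
# Parentheses keep the filter intact once $CONDITIONS is appended.
as_free_form_query() {
    table=$1
    columns=$2
    filter=$3
    # $CONDITIONS sits inside single quotes, so the shell prints it literally.
    printf 'select %s from %s where (%s) and $CONDITIONS\n' "$columns" "$table" "$filter"
}

as_free_form_query card 'id,name' 'id>20'
```

The printed string could be passed to --query in place of --table, --columns and --where.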

3. Free-form SQL import (import the result of a SQL query):

mysql----->hdfs

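sqoop import --connect jdbc:mysql://192.168.122.15:3306/company --username hivee --password 123456 --target-dir '/input' --query 'select id,name from input where id>20 and $CONDITIONS' -m 1

# --query cannot be combined with --table, so --table is omitted here; the query must contain $CONDITIONS, and --target-dir is required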
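
The quoting around the query matters because Sqoop needs to receive the literal text $CONDITIONS. The sketch below shows what the shell does with each quoting style; the variable names are illustrative only, and CONDITIONS is set to an empty string to stand for a shell where no such variable is defined.

```shell
# Single quotes: the shell passes $CONDITIONS through untouched.
query_single='select id,name from input where id>20 and $CONDITIONS'

# Double quotes: the dollar sign must be escaped to survive.
query_double="select id,name from input where id>20 and \$CONDITIONS"

# Double quotes without the escape: the shell expands the variable
# before Sqoop sees the query, so the placeholder is lost.
CONDITIONS=''
query_broken="select id,name from input where id>20 and $CONDITIONS"

printf '%s\n' "$query_single" "$query_double" "$query_broken"
```

The first two lines printed are identical; the third ends after "and", which Sqoop would reject because the placeholder is missing.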

4. Incremental import: keep importing newly added source data into the target location

mysql----->hdfs

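# Core parameters: --check-column (the column to check), --last-value (the check-column value reached by the previous import), --incremental (the import mode)

1> append mode: only appends new rows; changes to existing rows are not picked up

sqoop import --connect jdbc:mysql://192.168.122.15:3306/company --username hivee --password 123456 --table card --target-dir '/input' --check-column id --last-value 264  --incremental append -m 1

2> lastmodified mode: for source data whose existing rows are updated; to collect changed rows, the table must record the modification time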

sqoop import --connect jdbc:mysql://192.168.122.15:3306/company --username hivee --password 123456 --table card --target-dir '/input' --check-column last_mod --last-value '2018-02-02 21:35:01' --incremental lastmodified -m 1 --append

# --last-value is the largest last_mod timestamp imported so far
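
As a small illustration of picking that value, the sketch below reads timestamps (one per line) and prints the latest one. The helper name next_last_value is hypothetical, and how the timestamps are obtained (for example from a query on the last_mod column) is left as an assumption. It relies on the 'YYYY-MM-DD HH:MM:SS' format sorting chronologically when sorted as plain text.

```shell
# Read one timestamp per line on stdin and print the latest one.
# Works because 'YYYY-MM-DD HH:MM:SS' strings sort in time order.
next_last_value() {
    LC_ALL=C sort | tail -n 1
}

printf '%s\n' '2018-02-02 21:35:01' '2018-01-15 08:00:00' '2018-02-03 00:00:10' | next_last_value
```

The printed timestamp is the one to pass as --last-value on the next lastmodified run.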

mysql------>hive

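Common Sqoop commands:

Full-table import:

1. Create a job:

# Explanation: this job imports a MySQL table into Hive. The flow is: mysql ---> HDFS ---> hive

--password-file hdfs://user/mnt/.password.file points to a file in an HDFS directory

--password-file file:///home/.test points to a file on the local Linux server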
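
Sqoop reads the entire content of the password file, including any trailing newline, so the file should hold the password and nothing else. The sketch below is one way to create such a file with owner-only read permission; the helper name write_password_file is hypothetical, and the path and password are placeholders supplied by the caller.

```shell
# Write a password file with no trailing newline and mode 400.
# Both arguments are placeholders supplied by the caller.
write_password_file() {
    path=$1
    password=$2
    # printf '%s' writes the password without appending a newline;
    # the restrictive umask keeps the file private while it is created.
    ( umask 077 && printf '%s' "$password" > "$path" ) && chmod 400 "$path"
}

# Example call (not executed here):
#   write_password_file /home/.test 123456
```

A file created this way can then be referenced with --password-file file:///home/.test as in the job below.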

[root@hdoop2 hadoop]# sqoop job --create erp4 -- import --connect jdbc:mysql://192.168.18.72:3306/erp_product --username hive --password-file file:///home/.test --table erp_project_obversion_detail --target-dir /test3 --hive-import --hive-table erp_project_obversion_detail --hive-overwrite -m 1

2. List the jobs:

[root@hdoop2 hadoop]# sqoop job --list

3. Show the details of a job:

[root@hdoop2 hadoop]# sqoop job --show erp4

# erp4 is the name of the job

4. Delete a job:

[root@hdoop2 hadoop]# sqoop job --delete erp4

5. Execute a job:

[root@hdoop2 hadoop]# sqoop job --exec erp4
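
When these commands are scripted, it can help to check that a job exists before executing it. The sketch below scans the text printed by sqoop job --list for an exact job name. The helper name job_exists is hypothetical, and the assumed output layout (an "Available jobs:" header followed by one indented name per line) should be verified against the installed Sqoop version.

```shell
# Succeed if the job name given as $1 appears as a whole line
# (ignoring surrounding whitespace) in the listing read from stdin.
job_exists() {
    name=$1
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | grep -F -q -x -- "$name"
}

# Example use (not executed here):
#   sqoop job --list | job_exists erp4 && sqoop job --exec erp4
```

Matching the whole line avoids treating erp as a match for erp4.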

Incremental import: keep importing newly added source data into the target location

mysql ---> hive

1. Create a job:

[root@hdoop2 hadoop]# sqoop job --create insert1 -- import --connect jdbc:mysql://192.168.18.72:3306/erp_product --username hive --password-file file:///home/.test --table erp_project_obversion_detail --target-dir /test4 --hive-import --hive-table erp_project_obversion_detail --check-column id --last-value 264 --incremental append -m 1

Free-form query import:

1. Create a job:

[root@hdoop2 hadoop]# sqoop job --create erp1 -- import --connect jdbc:mysql://192.168.18.72:3306/erp_product --username hive --password-file file:///home/.test --target-dir /test10 --hive-import --hive-table erp_project_obversion_detail --hive-overwrite --query 'select * from erp_project_obversion_detail where id < 265 and $CONDITIONS' -m 1

Original article: https://www.cnblogs.com/byfboke/p/9999480.html