
Sqoop Cheat Sheet

Based on Sqoop 1.x

Scenarios

Import flow

graph LR
  A[RDBMS] -->|Sqoop| B(Hive)

Export flow

graph LR
  A[Hive] -->|Sqoop| B(RDBMS)

Field definitions

Field	MySQL type	Hive type
id	int	int
name	varchar(100)	string
desc	varchar(255)	string

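Import

Regular tables

The CREATE TABLE statements for the three tables (text, Parquet, ORC) are similar; only the storage format changes.

CREATE TABLE user_parquet(
   id   int,
   name string,
   desc string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
STORED AS parquet;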
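
Because the three definitions differ only in the STORED AS clause, they can be generated from one template. The following is a minimal sketch: `make_ddl` is a hypothetical helper, the table names `user_text`, `user_parquet`, and `user_orc` are the ones used by the commands later in this post, and `textfile` is assumed as the storage format of the text table. It only prints HiveQL and does not connect to Hive.

```shell
#!/usr/bin/env bash
# Print a CREATE TABLE statement for one storage format.
# make_ddl is a hypothetical helper, not part of Hive or Sqoop.
make_ddl() {
  local table="$1" format="$2"
  cat <<EOF
CREATE TABLE ${table}(
   id   int,
   name string,
   desc string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
STORED AS ${format};
EOF
}

# One statement per table; only the name and the format differ.
make_ddl user_text textfile
make_ddl user_parquet parquet
make_ddl user_orc orc
```

The printed statements could be saved to a file and run with `hive -f`.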

Text format

sqoop import \
 --connect 'jdbc:mysql://10.252.165.54:15025/test?useUnicode=true&characterEncoding=utf-8' \
 --username hdp \
 --password 'hdp!QAZxCDE#' \
 --table user1 \
 --fields-terminated-by '\001' \
 --hive-import \
 --delete-target-dir \
 -m 1 \
 --hive-database test \
 --hive-table user_text

Note: for the text format, --hive-database can be omitted; pass the table to --hive-table in database.tablename form instead.
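
For the text format, the note above means the database can be folded into the table name. A sketch of the equivalent command follows. The command is held in an array and printed rather than executed so it can be reviewed first; the array name `import_args` is only illustrative.

```shell
#!/usr/bin/env bash
# Same text import as above, but the Hive database is given as part of
# --hive-table (database.tablename) instead of via --hive-database.
import_args=(
  sqoop import
  --connect 'jdbc:mysql://10.252.165.54:15025/test?useUnicode=true&characterEncoding=utf-8'
  --username hdp
  --password 'hdp!QAZxCDE#'
  --table user1
  --fields-terminated-by '\001'
  --hive-import
  --delete-target-dir
  -m 1
  --hive-table test.user_text
)

# Print the command for review; run it with: "${import_args[@]}"
printf '%s\n' "${import_args[*]}"
```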

Parquet format

sqoop import \
 --connect 'jdbc:mysql://10.252.165.54:15025/test?useUnicode=true&characterEncoding=utf-8' \
 --username hdp \
 --password 'hdp!QAZxCDE#' \
 --table user1 \
 --fields-terminated-by '\001' \
 --hive-import \
 --delete-target-dir \
 -m 1 \
 --hive-database test \
 --hive-table user_parquet \
 --as-parquetfile

Note: for the Parquet format, the Sqoop command needs the --hive-database and --as-parquetfile options.

ORC format

sqoop import \
 --connect 'jdbc:mysql://10.252.165.54:15025/test?useUnicode=true&characterEncoding=utf-8' \
 --username hdp \
 --password 'hdp!QAZxCDE#' \
 --table user1 \
 --fields-terminated-by '\001' \
 --delete-target-dir \
 -m 1 \
 --hcatalog-database test \
 --hcatalog-table user_orc

Note: ORC imports must go through the --hcatalog-database and --hcatalog-table options.

Partitioned tables

CREATE TABLE user_parquet_p(
   id   int,
   name string,
   desc string
)
PARTITIONED BY (part_dt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
STORED AS parquet;

Text format

sqoop import \
 --connect 'jdbc:mysql://10.252.165.54:15025/test?useUnicode=true&characterEncoding=utf-8' \
 --username hdp \
 --password 'hdp!QAZxCDE#' \
 --table user1 \
 --fields-terminated-by '\001' \
 --hive-import \
 --delete-target-dir \
 -m 1 \
 --hive-database test \
 --hive-table user_text_p \
 --hive-partition-key part_dt \
 --hive-partition-value '20190314'

Note: a partitioned table needs --hive-partition-key and --hive-partition-value to name the target partition; multi-key partitions are not supported this way.

The import can also go through HCatalog, as is done for ORC.

Parquet format

No working import method has been found so far.

ORC format

sqoop import \
 --connect 'jdbc:mysql://10.252.165.54:15025/test?useUnicode=true&characterEncoding=utf-8' \
 --username hdp \
 --password 'hdp!QAZxCDE#' \
 --table user1 \
 --fields-terminated-by '\001' \
 --delete-target-dir \
 -m 1 \
 --hcatalog-database test \
 --hcatalog-table user_orc_p \
 --hive-partition-key 'part_dt' \
 --hive-partition-value '20190314'

Or as follows:

sqoop import \
 --connect 'jdbc:mysql://10.252.165.54:15025/test?useUnicode=true&characterEncoding=utf-8' \
 --username hdp \
 --password 'hdp!QAZxCDE#' \
 --table user1 \
 --fields-terminated-by '\001' \
 --delete-target-dir \
 -m 1 \
 --hcatalog-database test \
 --hcatalog-table user_orc_p \
 --hcatalog-partition-keys 'part_dt' \
 --hcatalog-partition-values '20190314'

Note: use the four options --hcatalog-database, --hcatalog-table, --hive-partition-key, and --hive-partition-value to import into a single partition, or use --hcatalog-partition-keys and --hcatalog-partition-values to specify multiple partition keys (comma-separated).
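
For a table partitioned by more than one key, the comma-separated form could be assembled as below. This is a sketch under assumptions: the table `user_orc_p2` and its partition keys `part_year` and `part_month` are hypothetical and do not appear elsewhere in this post, and `join_by_comma` is an illustrative helper. The command is printed, not executed.

```shell
#!/usr/bin/env bash
# Join all arguments with commas (illustrative helper).
join_by_comma() {
  local IFS=','
  printf '%s' "$*"
}

# Hypothetical two-level partition: keys and values must line up by position.
part_keys=(part_year part_month)
part_values=(2019 03)

import_args=(
  sqoop import
  --connect 'jdbc:mysql://10.252.165.54:15025/test?useUnicode=true&characterEncoding=utf-8'
  --username hdp
  --password 'hdp!QAZxCDE#'
  --table user1
  -m 1
  --hcatalog-database test
  --hcatalog-table user_orc_p2
  --hcatalog-partition-keys "$(join_by_comma "${part_keys[@]}")"
  --hcatalog-partition-values "$(join_by_comma "${part_values[@]}")"
)

# Print the command for review; run it with: "${import_args[@]}"
printf '%s\n' "${import_args[*]}"
```

Keeping keys and values in two parallel arrays makes a mismatch in their counts easy to spot before the job is submitted.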

Export

Regular tables

Text format

sqoop export \
 --connect 'jdbc:mysql://10.252.165.54:15025/test?useUnicode=true&characterEncoding=utf-8' \
 --username hdp \
 --password 'hdp!QAZxCDE#' \
 --table user1 \
 --export-dir /apps/hive/warehouse/test.db/user_text \
 --input-fields-terminated-by '\001'

Or as follows:

sqoop export \
 --connect 'jdbc:mysql://10.252.165.54:15025/test?useUnicode=true&characterEncoding=utf-8' \
 --username hdp \
 --password 'hdp!QAZxCDE#' \
 --table user1 \
 --hcatalog-database test \
 --hcatalog-table user_text

Note: --export-dir is the HDFS storage path of the Hive table. The --hcatalog-database and --hcatalog-table options turn out to work as well.

Parquet format

sqoop export \
 --connect 'jdbc:mysql://10.252.165.54:15025/test?useUnicode=true&characterEncoding=utf-8' \
 --username hdp \
 --password 'hdp!QAZxCDE#' \
 --table user1 \
 --hcatalog-database test \
 --hcatalog-table user_parquet

Note: exporting by pointing --export-dir at the Hive table's HDFS path does not work; use the --hcatalog-database and --hcatalog-table options instead.

ORC format

sqoop export \
 --connect 'jdbc:mysql://10.252.165.54:15025/test?useUnicode=true&characterEncoding=utf-8' \
 --username hdp \
 --password 'hdp!QAZxCDE#' \
 --table user1 \
 --hcatalog-database test \
 --hcatalog-table user_orc

Note: exporting by pointing --export-dir at the Hive table's HDFS path does not work; use the --hcatalog-database and --hcatalog-table options instead.

Partitioned tables

Text format

sqoop export \
 --connect 'jdbc:mysql://10.252.165.54:15025/test?useUnicode=true&characterEncoding=utf-8' \
 --username hdp \
 --password 'hdp!QAZxCDE#' \
 --table user1 \
 --export-dir /apps/hive/warehouse/test.db/user_text_p/part_dt=20190314 \
 --input-fields-terminated-by '\001'

Alternatively, export the data of all partitions as follows:

sqoop export \
 --connect 'jdbc:mysql://10.252.165.54:15025/test?useUnicode=true&characterEncoding=utf-8' \
 --username hdp \
 --password 'hdp!QAZxCDE#' \
 --table user1 \
 --hcatalog-database test \
 --hcatalog-table user_text_p

Note: when --export-dir points at the Hive table's HDFS path, it must include the partition directory, so only one partition can be exported. With --hcatalog-database and --hcatalog-table, the data of all partitions is exported.
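
If the --export-dir form is preferred, several partitions can be exported by running one export per partition directory. The loop below is a sketch: the partition values in `partitions` are examples (only `20190314` appears in this post), `print_export` is a hypothetical helper, and each command is printed instead of executed.

```shell
#!/usr/bin/env bash
# Warehouse path of the partitioned text table, as used above.
table_dir=/apps/hive/warehouse/test.db/user_text_p

# Example partition values; only 20190314 comes from this post.
partitions=(20190314 20190315)

# Print one export command for a single partition directory.
print_export() {
  local part_dt="$1"
  local export_args=(
    sqoop export
    --connect 'jdbc:mysql://10.252.165.54:15025/test?useUnicode=true&characterEncoding=utf-8'
    --username hdp
    --password 'hdp!QAZxCDE#'
    --table user1
    --export-dir "${table_dir}/part_dt=${part_dt}"
    --input-fields-terminated-by '\001'
  )
  printf '%s\n' "${export_args[*]}"
}

# One export per partition directory.
for part_dt in "${partitions[@]}"; do
  print_export "$part_dt"
done
```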

Parquet format

sqoop export \
 --connect 'jdbc:mysql://10.252.165.54:15025/test?useUnicode=true&characterEncoding=utf-8' \
 --username hdp \
 --password 'hdp!QAZxCDE#' \
 --table user1 \
 --hcatalog-database test \
 --hcatalog-table user_parquet_p

Note: even with the partition directory included, --export-dir cannot export the data. Only the --hcatalog-database and --hcatalog-table options work, and they export the data of all partitions. No field delimiter needs to be specified in this case.

ORC format

sqoop export \
 --connect 'jdbc:mysql://10.252.165.54:15025/test?useUnicode=true&characterEncoding=utf-8' \
 --username hdp \
 --password 'hdp!QAZxCDE#' \
 --table user1 \
 --hcatalog-database test \
 --hcatalog-table user_orc_p

Other

Export is less flexible than import: it does not accept --query or --where, but the --columns option can restrict which columns are exported.
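
As an illustration of --columns, the sketch below exports only `id` and `name` from the text table. It assumes that the remaining MySQL column (`desc`) allows NULL or has a default value, and that the listed columns are the leading fields of each exported record. The command is printed rather than executed.

```shell
#!/usr/bin/env bash
# Export only a subset of columns. Unlisted MySQL columns are assumed to
# allow NULL or to have a default value.
export_columns='id,name'

export_args=(
  sqoop export
  --connect 'jdbc:mysql://10.252.165.54:15025/test?useUnicode=true&characterEncoding=utf-8'
  --username hdp
  --password 'hdp!QAZxCDE#'
  --table user1
  --columns "$export_columns"
  --export-dir /apps/hive/warehouse/test.db/user_text
  --input-fields-terminated-by '\001'
)

# Print the command for review; run it with: "${export_args[@]}"
printf '%s\n' "${export_args[*]}"
```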

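Issues

Multi-character delimiters

Sqoop does not support multi-character delimiters. If one is specified, only its first character is used as the delimiter.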
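
A small guard in a wrapper script can make this truncation visible before a job starts. The sketch below is plain shell and does not call Sqoop; `check_delimiter` is a hypothetical helper that reports the character that would effectively be used according to the behavior described above. It treats its argument as literal text and does not interpret escape sequences such as `\001`.

```shell
#!/usr/bin/env bash
# Warn on stderr when a delimiter is longer than one character, and print
# the single character that would effectively be used.
check_delimiter() {
  local delim="$1"
  if [ "${#delim}" -gt 1 ]; then
    echo "warning: '${delim}' has ${#delim} characters; only '${delim:0:1}' would be used" >&2
  fi
  printf '%s\n' "${delim:0:1}"
}

check_delimiter '||'   # two characters: triggers the warning
check_delimiter ','    # one character: no warning
```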

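Importing into multiple partitions

A Sqoop import supports only a single partition; importing into multiple partitions is not supported.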

Original source: https://www.cnblogs.com/bener/p/10608439.html