关于sqoop导入数据的时候添加--split-by配置项对sqoop的导入速度的影响。

zoukankan html css js c++ java

关于sqoop导入数据的时候添加--split-by配置项对sqoop的导入速度的影响。
最近在搞sqoop的导入导出操作。但是今天遇到一个表数据量特别大。我们想通过sqoop的导入功能对数据进行导入，但是从oracle当中导入数据的时候，如果是需要平行导入的话必须使用--split-by，也就是设置map的数量。

一种就是不指定--split-by（切分的字段）直接使用一个map的形式就行导入操作。

我这张表的数据是40G，我将其用10个map进行导入，然后按照其中一个number类型的字段进行对数据进行切分。然后导入，导入的脚本如下：
#!/bin/bash url="jdbc:oracle:thin:@172.16.250.10:1521:stupor" database="XD_CORE" tables=("report_residual_money_detail_fields") tables_num=${#tables[@]} username="qry_read" password="****" for((i=0;i<tables_num;i++)); do sqoop import --connect ${url} --username ${username} --password ${password} --query 'SELECT residual_pact_money,loan_id,cur_date FROM XD_CORE.report_residual_money_detail where 1=1 and $CONDITIONS' ---这里是查询的字段 --target-dir /user/gxg/test1 --这个标签一定要指定，这里是导入数据的临时目录 --fields-terminated-by --split-by loan_id ---这个地方是按照某个字段进行切分的，一般都是整型的数据类型进行切分。 -m 10 --这里定义切分的map的个数是10个。 --hive-import --create-hive-table --hive-database test --hive-table ${tables[i]} --null-non-string '\N' --这里是到导入的空值进行处理。 --null-string '\N' --verbose done
下面是我执行真个脚本的时候导入花费的时间。做了一个对比。

这里可以看出，原来是4个map导入数据，后面换成10个map导入数据。这里的导入时间虽然没有减少很多，但是时间还是减少了一些。

具体的原理参考下面的连接，这位老哥说的很不错：
https://blog.csdn.net/weixin_40137479/article/details/79117358
查看全文

相关阅读:
论文阅读 | Coherent Comments Generation for Chinese Articles with a Graph-to-Sequence Model
论文阅读 | Multi-Relational Script Learning for Discourse Relations
论文阅读 | Do you know that Florence is packed with visitors？Evaluating state-of-the-art models of speaker commitment
论文阅读 | Employing the Correspondence of Relations and Connectives to Identify Implicit Discourse Relations via Label Embeddings
论文阅读 | A Unified Linear-Time Framework for Sentence-Level Discourse Parsing
论文阅读 | Revisiting Joint Modeling of Cross-document Entity and Event Coreference Resolution
论文阅读 | Using Automatically Extracted Minimum Spans to DisentangleCoreference Evaluation from Boundary Detection
论文阅读 | A Just and Comprehensive Strategy for Using NLP to Address Online Abuse
论文阅读 | What Does BERT Learn about the Structure of Language?
jQuery 包装集选择器

原文地址：https://www.cnblogs.com/gxgd/p/9720705.html