  Errors and fixes when using Sqoop to extract data from Oracle or MySQL into HDFS

     

    I. References

    1. https://www.rittmanmead.com/blog/2014/03/using-sqoop-for-loading-oracle-data-into-hadoop-on-the-bigdatalite-vm/

    2. http://www.cnblogs.com/bjtu-leefon/archive/2013/06/28/3160549.html

    II. Scripts used
    ---- sqoop import: zdsd


    Notes on usage:

    1. On the node that runs the job, add the JDBC driver jar for the source database to $SQOOP_HOME/lib.
    2. -m sets the number of map tasks started in parallel; -m 1 starts a single map. When -m > 1, you must add --split-by to name a split column,
    and that column must satisfy at least one of the following: 1) the table has a primary key, or 2) the column is a numeric or date type, because Sqoop runs min(split_column) and max(split_column) to decide how to split the ranges;
    otherwise multiple maps cannot be used.
    3. Do not set the number of maps too high, or it will increase the load on the source database.

    4. To run a complex SQL statement, use the --query option (see the sketch after this list).
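
    As a concrete sketch of notes 1 and 4 (the jar path, column names, and WHERE filter below are assumptions for illustration, not values from the original post): copy the Oracle JDBC driver into Sqoop's lib directory, then import a free-form query with --query, which requires the $CONDITIONS placeholder in the WHERE clause and, when -m > 1, an explicit --split-by column.

    # copy the Oracle JDBC driver to the node that runs the Sqoop job (jar name/path assumed)
    cp /tmp/ojdbc7.jar $SQOOP_HOME/lib/

    # free-form query import: $CONDITIONS is mandatory in the WHERE clause;
    # with -m > 1 a --split-by column must also be given
    sqoop import \
      --connect jdbc:oracle:thin:@10.**.**.**:1521/jwy \
      --username ** --password ** \
      --query 'SELECT t.ID, t.NAME FROM ZHZY.ZDSF t WHERE t.STATUS = 1 AND $CONDITIONS' \
      --split-by t.ID \
      --target-dir '/hawq_external/zdsd_query_sketch' \
      -m 4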


    sqoop import --connect jdbc:oracle:thin:@10.**.**.**:1521/jwy --username ** --password ** --table ZHZY.ZDSF --target-dir '/hawq_external/zdsd_sqoop_test1' -m 1


    ---- kkxx: with -m > 1 the table must have a primary key; if it has none, a split column must be specified with --split-by.
    ---- If the chosen split column is a text type, Sqoop reports: "Generating splits for a textual index column allowed only in case of '-Dorg.apache.sqoop.splitter.allow_text_splitter=true' property passed as a parameter". Sqoop computes the minimum and maximum of the split column, so the split column must not be a text type (unless that property is passed).
    sqoop import --connect jdbc:oracle:thin:@10.**.**.**:1521/jwy --username ** --password ** --table ZHZY.CLD_TFC_PASS171201171215 --target-dir '/hawq_external/hg_sqoop_kkxx' -m 20 --split-by ZHKRKSJ
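
    If the split column really is a text type, the property named in the error message can be passed as a generic Hadoop argument immediately after sqoop import. A sketch based on the command above (splitting on a textual column is generally discouraged, since the ranges are derived from string min/max values):

    sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
      --connect jdbc:oracle:thin:@10.**.**.**:1521/jwy \
      --username ** --password ** \
      --table ZHZY.CLD_TFC_PASS171201171215 \
      --target-dir '/hawq_external/hg_sqoop_kkxx' \
      -m 20 --split-by ZHKRKSJ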


    ---- ZHZY.B_XDRY


    sqoop import --connect jdbc:oracle:thin:@10.**.**.**:1521/jwy --username ** --password ** --table table_name --target-dir '/hawq_external/hg_xdry_sqoop_test1' -m 1


    ---- ZHZY.V_QS_JJ_KKXX (a view)


    sqoop import --connect jdbc:oracle:thin:@10.**.**.**:1521/jwy --username ** --password ** --table ZHZY.V_QS_JJ_KKXX --target-dir '/hawq_external/hg_v_kkxx_sqoop_test' -m 10
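
    Since a view has no primary key, by note 2 above an import with -m > 1 normally also needs an explicit split column. A sketch (the column name NUM_COL is an assumption; pick a numeric or date column that actually exists in the view):

    sqoop import --connect jdbc:oracle:thin:@10.**.**.**:1521/jwy --username ** --password ** --table ZHZY.V_QS_JJ_KKXX --split-by NUM_COL --target-dir '/hawq_external/hg_v_kkxx_sqoop_test' -m 10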






    III. Errors encountered during use


    1. Running the built-in Sqoop Import job entry in Kettle
     
    2017/12/15 15:16:23 - Spoon - Starting job...
    2017/12/15 15:16:23 - sqoop_kk_zhugandao - Start of job execution
    2017/12/15 15:16:23 - sqoop_kk_zhugandao - Starting entry [Sqoop Import]
    2017/12/15 15:16:23 - Sqoop Import - 2017/12/15 15:16:23 - fs.default.name is deprecated. Instead, use fs.defaultFS
    2017/12/15 15:16:23 - Sqoop Import - 2017/12/15 15:16:23 - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
    2017/12/15 15:16:23 - Sqoop Import - 2017/12/15 15:16:23 - Running Sqoop version: 1.4.6.2.5.3.0-37
    2017/12/15 15:16:23 - Sqoop Import - 2017/12/15 15:16:23 - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
    2017/12/15 15:16:23 - Sqoop Import - 2017/12/15 15:16:23 - Data Connector for Oracle and Hadoop is disabled.
    2017/12/15 15:16:23 - Sqoop Import - ERROR (version 6.1.0.1-196, build 1 from 2016-04-07 12.08.49 by buildguy) : Error running Sqoop
    2017/12/15 15:16:23 - Sqoop Import - ERROR (version 6.1.0.1-196, build 1 from 2016-04-07 12.08.49 by buildguy) : java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
    2017/12/15 15:16:23 - Sqoop Import -  at org.apache.sqoop.manager.DefaultManagerFactory.accept(DefaultManagerFactory.java:66)
    2017/12/15 15:16:23 - Sqoop Import -  at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:184)
    2017/12/15 15:16:23 - Sqoop Import -  at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:282)
    2017/12/15 15:16:23 - Sqoop Import -  at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:89)
    2017/12/15 15:16:23 - Sqoop Import -  at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:610)
    2017/12/15 15:16:23 - Sqoop Import -  at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
    2017/12/15 15:16:23 - Sqoop Import -  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    2017/12/15 15:16:23 - Sqoop Import -  at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
    2017/12/15 15:16:23 - Sqoop Import -  at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
    2017/12/15 15:16:23 - Sqoop Import -  at org.pentaho.hadoop.shim.common.ClassPathModifyingSqoopShim$1.call(ClassPathModifyingSqoopShim.java:81)
    2017/12/15 15:16:23 - Sqoop Import -  at org.pentaho.hadoop.shim.common.ClassPathModifyingSqoopShim$1.call(ClassPathModifyingSqoopShim.java:1)
    2017/12/15 15:16:23 - Sqoop Import -  at org.pentaho.hadoop.shim.common.ClassPathModifyingSqoopShim.runWithModifiedClassPathProperty(ClassPathModifyingSqoopShim.java:62)
    2017/12/15 15:16:23 - Sqoop Import -  at org.pentaho.hadoop.shim.common.ClassPathModifyingSqoopShim.runTool(ClassPathModifyingSqoopShim.java:75)
    2017/12/15 15:16:23 - Sqoop Import -  at org.pentaho.hadoop.shim.common.delegating.DelegatingSqoopShim.runTool(DelegatingSqoopShim.java:41)
    2017/12/15 15:16:23 - Sqoop Import -  at org.pentaho.big.data.impl.shim.sqoop.SqoopServiceImpl.runTool(SqoopServiceImpl.java:62)
    2017/12/15 15:16:23 - Sqoop Import -  at org.pentaho.big.data.kettle.plugins.sqoop.AbstractSqoopJobEntry.executeSqoop(AbstractSqoopJobEntry.java:302)
    2017/12/15 15:16:23 - Sqoop Import -  at org.pentaho.big.data.kettle.plugins.sqoop.AbstractSqoopJobEntry$1.run(AbstractSqoopJobEntry.java:273)
    2017/12/15 15:16:23 - Sqoop Import -  at java.lang.Thread.run(Thread.java:745)
    2017/12/15 15:16:23 - Sqoop Import - Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
    2017/12/15 15:16:23 - Sqoop Import -  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    2017/12/15 15:16:23 - Sqoop Import -  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    2017/12/15 15:16:23 - Sqoop Import -  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    2017/12/15 15:16:23 - Sqoop Import -  at org.pentaho.di.core.plugins.KettleURLClassLoader.loadClassFromParent(KettleURLClassLoader.java:89)
    2017/12/15 15:16:23 - Sqoop Import -  at org.pentaho.di.core.plugins.KettleURLClassLoader.loadClass(KettleURLClassLoader.java:108)
    2017/12/15 15:16:23 - Sqoop Import -  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    2017/12/15 15:16:23 - Sqoop Import -  at java.lang.Class.forName0(Native Method)
    2017/12/15 15:16:23 - Sqoop Import -  at java.lang.Class.forName(Class.java:348)
    2017/12/15 15:16:23 - Sqoop Import -  at org.pentaho.hadoop.shim.HadoopConfigurationClassLoader.loadClass(HadoopConfigurationClassLoader.java:99)
    2017/12/15 15:16:23 - Sqoop Import -  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    2017/12/15 15:16:23 - Sqoop Import -  ... 18 more
    2017/12/15 15:16:23 - sqoop_kk_zhugandao - Finished job entry [Sqoop Import] (result=[false])
    2017/12/15 15:16:23 - sqoop_kk_zhugandao - Job execution finished
    2017/12/15 15:16:23 - Spoon - The job has ended.
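
    The missing class org.apache.avro.LogicalType was only added in Avro 1.8, so this error usually means the Avro jar that Kettle's Hadoop shim puts on the classpath is older than the one Sqoop expects. A quick way to check which Avro jars are in play (the installation paths below are assumptions; adjust them to your Kettle and Sqoop installs):

    # list the Avro jars visible to Kettle and to Sqoop (paths assumed)
    find /opt/pentaho/data-integration -name 'avro-*.jar'
    find $SQOOP_HOME/lib -name 'avro-*.jar'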
      
    2. The source table has no primary key, which produces the error:

    ERROR tool.ImportTool: Error during import: No primary key could be found for table xxx. Please specify one with --split-by or perform a sequential import with '-m 1'

    The message is clear: no primary key was found on table xxx, so either use --split-by to specify a column as the split field or add '-m 1' to the command line. To see why this error appears at all, we need to understand Sqoop's parallel import mechanism:

    By default, Sqoop starts 4 parallel tasks to import the data at the same time.

    If the table's primary key is id and the degree of parallelism is 4, Sqoop first runs a query like the following:

    select min(id) as min, max(id) as max from table [where <where clause, if one was specified>];

    This query returns the minimum and maximum values of the split column (id); suppose they are 1 and 1000.

    Sqoop then splits the range according to the requested degree of parallelism. In this example the parallel import is split into the following 4 SQL statements, executed concurrently:

    select * from table where id >= 1 and id < 250;

    select * from table where id >= 250 and id < 500;

    select * from table where id >= 500 and id < 750;

    select * from table where id >= 750 and id <= 1000;

    Note that the split column needs to be a numeric (integer) type.

    If the table to be imported has no primary key, we should manually pick a suitable split column.

    First check which columns the table has, e.g. for the table student: desc student;

    The table has two columns, id and name, so we can pick id as the split column; when importing the table into Hive, adding --split-by id to the command makes the error go away.
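
    For example, a minimal sketch of such an import (the MySQL connection string, credentials, and database name are assumptions for illustration):

    sqoop import --connect jdbc:mysql://localhost/testdb --username user --password pass \
      --table student --split-by id -m 4 \
      --hive-import --hive-table student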

    Reference: http://www.cnblogs.com/gpcuster/archive/2011/03/01/1968027.html

    3. Sqoop reports "Hive exited with status 1"

    When importing data from MySQL into Hive, running:

     sqoop import --connect jdbc:mysql://localhost/hive --username hive --password hive --table dept_InnoDB --hive-table dept_InnoDB --hive-import --split-by deptno

    produces the following error:

    13/06/27 18:35:05 INFO hive.HiveImport: Exception in thread "main" java.lang.NoSuchMethodError: org.apache.thrift.EncodingUtils.setBit(BIZ)B
    13/06/27 18:35:10 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Hive exited with status 1

    A quick search shows that this is caused by incompatible versions of Hive and HBase installed on the machine; specifically, Hive and HBase depend on different versions of Thrift. When the jars of both Hive and HBase are added to the CLASSPATH, only one version of Thrift gets loaded when Sqoop runs, which often makes the Hive step fail.

    Running:

    locate *thrift*.jar

    confirms that Hive and HBase indeed reference different versions of Thrift.

    This problem is also very easy to fix: set HBASE_HOME to empty so that Sqoop cannot load HBase's version of Thrift, and the import runs fine.
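
    A minimal sketch of the workaround, reusing the command above (blanking the variable only affects the current shell session):

    # blank out HBASE_HOME so Sqoop does not put HBase's Thrift jar on the classpath
    export HBASE_HOME=
    sqoop import --connect jdbc:mysql://localhost/hive --username hive --password hive \
      --table dept_InnoDB --hive-table dept_InnoDB --hive-import --split-by deptno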


    This document is still a work in progress.
