RHadoop and CDH Integration Example (Part 2): rmr2 and RJDBC

    3. Installing and Testing rmr2

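    rmr2 is the core package that runs map/reduce programs, and it relies on hadoop-streaming-XXX.jar. rmr2 also depends on two libraries, bitops and caTools. Neither of them could be obtained directly with R's install.packages(), so download them from CRAN first and then install them. Likewise, rmr2 itself can be fetched with wget from https://github.com/RevolutionAnalytics/rmr2.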

     

    > cd ~/$INS_TMP
    > wget https://cran.r-project.org/src/contrib/Archive/bitops/bitops_1.0-5.tar.gz 
    > sudo R CMD INSTALL bitops_1.0-5.tar.gz
    > wget https://cran.r-project.org/src/contrib/Archive/caTools/caTools_1.14.tar.gz 
    > sudo R CMD INSTALL caTools_1.14.tar.gz
    > sudo R CMD INSTALL rmr2_3.3.1.tar.gz 
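The installs above have to run in dependency order (bitops and caTools before rmr2), and a failed download should stop the sequence. A minimal shell sketch of the same steps follows. The helper names `cran_archive_url` and `install_archived` and the `DRY_RUN` switch are hypothetical, and the URL pattern is simply the one used in the commands above.

```shell
# Sketch: download and install rmr2's CRAN-archive dependencies in order.
# Assumption: package/version pairs and the Archive URL layout are the ones
# used in the commands above; helper names are hypothetical.

# Print the CRAN archive URL for <pkg> <version>.
cran_archive_url() {
  printf 'https://cran.r-project.org/src/contrib/Archive/%s/%s_%s.tar.gz\n' "$1" "$1" "$2"
}

# Download and install one archived package; stop at the first failure.
# With DRY_RUN=1 the commands are only printed, nothing is downloaded.
install_archived() {
  pkg=$1
  ver=$2
  url=$(cran_archive_url "$pkg" "$ver")
  tarball="${pkg}_${ver}.tar.gz"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "wget $url"
    echo "sudo R CMD INSTALL $tarball"
    return 0
  fi
  wget "$url" && sudo R CMD INSTALL "$tarball"
}
```

For example: `cd ~/$INS_TMP && install_archived bitops 1.0-5 && install_archived caTools 1.14 && sudo R CMD INSTALL rmr2_3.3.1.tar.gz`. Running `DRY_RUN=1 install_archived bitops 1.0-5` first shows what would be executed.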

    To test rmr2, start R:

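    library(rmr2)                  # load the rmr2 library
    small.ints <- to.dfs(1:10)     # write 1..10 to a temporary file on HDFS
    out <- mapreduce(input = small.ints, map = function(k, v) cbind(v, v^2))
    from.dfs(out)                  # read the (v, v^2) results back into R
    q()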
    

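    rmr2 accesses data through HDFS: to.dfs(1:10) stores the data in HDFS, which is what lets the following mapreduce() call process small.ints directly. If the job succeeds and prints output like that shown below, mapreduce is working and the test is complete. Job load and other details can be viewed in the YARN ResourceManager web UI (http://XXX:8088/proxy/application_XXX/).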
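rmr2 finds Hadoop through the `HADOOP_CMD` and `HADOOP_STREAMING` environment variables, the latter pointing at the hadoop-streaming jar mentioned at the start of this section. If the test above cannot launch a job at all, a sketch like the following can set them before R is started. The function names are hypothetical, and the jar's location under the CDH parcel's `lib/hadoop-mapreduce` directory is an assumption to verify on your cluster.

```shell
# Sketch: locate the Hadoop streaming jar and export the variables rmr2 reads.
# Assumption: the jar sits somewhere under the directory passed in
# (e.g. the CDH parcel's lib/hadoop-mapreduce); names are hypothetical.

# Print the first hadoop-streaming*.jar under a directory tree, if any.
# -H follows the starting directory if it is itself a symlink.
find_streaming_jar() {
  find -H "$1" -name 'hadoop-streaming*.jar' 2>/dev/null | sort | head -n 1
}

# Export HADOOP_CMD and HADOOP_STREAMING; fail if no jar is found.
# Assumption: fall back to /usr/bin/hadoop when hadoop is not on PATH.
export_rmr2_env() {
  jar=$(find_streaming_jar "$1")
  if [ -z "$jar" ]; then
    echo "no hadoop-streaming jar under $1" >&2
    return 1
  fi
  HADOOP_CMD=$(command -v hadoop || echo /usr/bin/hadoop)
  HADOOP_STREAMING=$jar
  export HADOOP_CMD HADOOP_STREAMING
}
```

Source the file so the exports reach the current shell, run `export_rmr2_env /opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/hadoop-mapreduce`, then start R from that same shell.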

    Possible problems:

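          (1) After calling mapreduce(), the job is submitted successfully but never runs, failing with exitCode: -1000 due to: Application application_XXXX initialization failed (exitCode=139). The fix is to switch Kerberos tickets: the current ticket is allowed to access HDFS but not to run MapReduce jobs. Run kinit hive and rerun the R program as the hive user.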
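Before switching, it helps to confirm which principal the cached ticket belongs to. The sketch below assumes MIT Kerberos, whose `klist` prints a `Default principal:` line; `default_principal` and `switch_to_hive` are hypothetical helper names, and whether `hive` is the right principal depends on how permissions are set up on your cluster.

```shell
# Sketch: inspect the cached Kerberos principal and switch to hive if needed.
# Assumption: MIT Kerberos klist output ("Default principal: user@REALM").

# Read klist output on stdin and print the default principal (empty if none).
default_principal() {
  sed -n 's/^Default principal: *//p'
}

# Obtain a hive ticket unless the cache already belongs to hive@REALM.
# kinit prompts for the password (or use a keytab with kinit -kt).
switch_to_hive() {
  current=$(klist 2>/dev/null | default_principal)
  case $current in
    hive@*) echo "already using $current" ;;
    *) kinit hive ;;
  esac
}
```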

     

    4. Installing and Testing RJDBC

    RJDBC is installed the same way as the other R dependency libraries installed earlier: start R and install it directly with install.packages().

    sudo R   
    install.packages("RJDBC")    
    q()    
    

      The following test code accesses Hive from R through RJDBC. Before running it, set JDBC_DRIVER_PATH correctly and add the corresponding jars to the classpath.

    # Make sure the following paths are correct for your cluster
    HIVE_CLASS_PATH   <- "/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/hive/lib"
    HADOOP_LIB_PATH   <- "/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/hadoop/lib"
    HADOOP_CLASS_PATH <- "/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/hadoop"
    MAPRED_CLASS_PATH <- "/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/hadoop-mapreduce"
    MAPRED_CORE_PATH  <- "/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/hadoop-mapreduce/hadoop-mapreduce-client-core.jar"
    
    JDBC_DRIVER_PATH <- "/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/hive/lib/hive-jdbc.jar"
    # Kerberos-secured HiveServer2; principal is the server's hive service principal
    CONNECT_URL <- "jdbc:hive2://bj1-241-centos169:10000/default;principal=hive/bj1-241-centos169@XXX.COM"
    
    # JVM options only take effect if set before the JVM is started
    options(java.parameters = "-Xmx8g")
    
    library("DBI")
    library("rJava")
    library("RJDBC")
    
    # Collect every .jar file in each directory (non-recursive)
    hive.class.path   <- list.files(path = HIVE_CLASS_PATH,   pattern = "\\.jar$", full.names = TRUE)
    hadoop.lib.path   <- list.files(path = HADOOP_LIB_PATH,   pattern = "\\.jar$", full.names = TRUE)
    hadoop.class.path <- list.files(path = HADOOP_CLASS_PATH, pattern = "\\.jar$", full.names = TRUE)
    mapred.class.path <- list.files(path = MAPRED_CLASS_PATH, pattern = "\\.jar$", full.names = TRUE)
    
    cp <- c(hive.class.path, hadoop.lib.path, hadoop.class.path, mapred.class.path, MAPRED_CORE_PATH)
    .jinit(classpath = cp)
    
    drv  <- JDBC("org.apache.hive.jdbc.HiveDriver", JDBC_DRIVER_PATH)
    conn <- dbConnect(drv, CONNECT_URL, "", "")   # empty user/password: auth comes from the Kerberos ticket
    dbListTables(conn)
    dbDisconnect(conn)
    q()
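A wrong path is an easy way for this script to fail quietly, because `list.files()` returns an empty vector for a directory that does not exist. A small shell check run before starting R can catch that. This is a sketch with hypothetical function names; pass it the same directories as `HIVE_CLASS_PATH`, `HADOOP_LIB_PATH`, `HADOOP_CLASS_PATH` and `MAPRED_CLASS_PATH`, and note that `-maxdepth` assumes GNU or BSD `find`.

```shell
# Sketch: confirm each classpath directory exists and contains jars.
# Assumption: directories are the ones used in the R script; names hypothetical.

# Print "<dir>: <n> jars" (top level only, like list.files); fail if n is 0.
count_jars() {
  n=$(find -H "$1" -maxdepth 1 -name '*.jar' 2>/dev/null | wc -l | tr -d ' ')
  echo "$1: $n jars"
  [ "$n" -gt 0 ]
}

# Check every directory argument, reporting all of them before failing.
check_classpath_dirs() {
  status=0
  for d in "$@"; do
    count_jars "$d" || status=1
  done
  return $status
}
```

For example: `check_classpath_dirs /opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/hive/lib /opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/hadoop/lib`.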
    

     The output of a successful call to Hive through RJDBC is shown below.
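If `dbConnect()` fails, trying the same JDBC URL with Hive's `beeline` client can separate a Kerberos or HiveServer2 problem from an R classpath problem. A sketch, assuming `beeline` is on the PATH and a valid ticket is already cached; `hive2_url` and `beeline_check` are hypothetical names, and the arguments are the placeholders from `CONNECT_URL`.

```shell
# Sketch: build the HiveServer2 JDBC URL used as CONNECT_URL and test it
# with beeline. Assumption: Kerberos auth via ";principal=", as in the R code.

# Usage: hive2_url HOST PORT DATABASE PRINCIPAL
hive2_url() {
  printf 'jdbc:hive2://%s:%s/%s;principal=%s\n' "$1" "$2" "$3" "$4"
}

# Run a trivial query through beeline with the same URL.
beeline_check() {
  beeline -u "$(hive2_url "$@")" -e 'show tables;'
}
```

For example: `beeline_check bj1-241-centos169 10000 default hive/bj1-241-centos169@XXX.COM`.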

      

Original article: https://www.cnblogs.com/cassie-huang/p/5065257.html