  • Installing and testing R, RHive, rJava, and RHadoop on CDH 5.5.6

    Deployment machines
    NameNode1
    NameNode2
    DataNode1
    DataNode2
    DataNode3

    R install directory
    /usr/local/lib64/R
    RStudio Server install directory
    /usr/lib/rstudio-server


    Installing R
    1. Before compiling, make sure the following packages are installed; run this on every machine:
    yum install gcc-gfortran gcc gcc-c++ libXt-devel openssl-devel readline-devel glibc-headers
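A quick way to confirm a node really has all of these before compiling is to query rpm for each package. A minimal sketch (package names are taken from the yum line above):

```shell
# Report any of the build prerequisites that are not installed on this node.
# Package names are the ones from the yum line above.
check_prereqs() {
  for pkg in "$@"; do
    rpm -q "$pkg" >/dev/null 2>&1 || echo "missing: $pkg"
  done
}
check_prereqs gcc-gfortran gcc gcc-c++ libXt-devel openssl-devel readline-devel glibc-headers
```

Run it on each node; any "missing:" line means that node still needs the yum install.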

    2. Install R (on every node)
    Unpack:
    tar -zxvf R-3.2.0.tar.gz
    Build:
    cd R-3.2.0
    ./configure --prefix=/usr/local --disable-nls --enable-R-shlib  # --disable-nls and --enable-R-shlib are needed for the RHive install; omit them if you are not installing RHive
    make
    make install
    readline-devel and libXt-devel are needed when compiling R, and --enable-R-shlib builds R's shared library, which RStudio requires.
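A missing libR.so only surfaces much later, when RStudio fails to start, so it is worth checking right after make install that the shared library was actually built. A small sketch (the path is the R install directory used in this guide):

```shell
# Check that --enable-R-shlib produced R's shared library, which RStudio needs.
# The argument is R's home directory; this guide installs to /usr/local/lib64/R.
check_r_shlib() {
  if [ -f "$1/lib/libR.so" ]; then
    echo "OK: $1/lib/libR.so"
  else
    echo "MISSING: rebuild with ./configure --enable-R-shlib"
  fi
}
check_r_shlib /usr/local/lib64/R
```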

    3. Confirm the Java environment variables
    RHadoop depends on the rJava package. Before installing rJava, confirm that the Java environment variables are configured, then let R set up its link to the JVM:
    R CMD javareconf

    4. Install the rJava, Rserve, and RHive packages
    R CMD INSTALL rJava_0.9-6.tar.gz
    R CMD INSTALL Rserve_1.8-3.tar.gz
    R CMD INSTALL RHive_2.0-0.10.tar.gz

    5. Configure RHive
    Create the RHive data directory (on the local filesystem, not HDFS).
    Here it is /www/store/rhive/data:
    mkdir -p /www/store/rhive/data

    Create an Rserv.conf file containing the line "remote enable" and save it in a directory of your choice.
    Here it is stored at /www/cloud/R/Rserv.conf:
    mkdir -p /www/cloud/R
    echo 'remote enable' > /www/cloud/R/Rserv.conf

    On every node, including the master, add this environment variable to /etc/profile:
    export RHIVE_DATA=/www/store/rhive/data

    Upload everything under R's lib directory to /rhive/lib on HDFS (create the directory first if it does not exist):
    hadoop fs -mkdir -p /rhive/lib
    cd /usr/local/lib64/R/lib
    hadoop fs -put ./* /rhive/lib

    6. Start the services
    On every node, including the master, run:
    R CMD Rserve --RS-conf /www/cloud/R/Rserv.conf
    telnet NameNode1 6311
    telnet NameNode2 6311
    telnet DataNode1 6311
    telnet DataNode2 6311
    telnet DataNode3 6311

    If telnet is unavailable, install it with:
    yum install telnet-server    # telnet server
    yum install telnet.*    # telnet client

    Then telnet from the master to every slave node; a response of Rsrv0103QAP1 means the connection succeeded.
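The five telnet checks can be collapsed into one loop. A sketch using bash's /dev/tcp pseudo-device in place of interactive telnet (host names are the ones above; timeout is from coreutils):

```shell
# Probe Rserve's port 6311 on each node and print UP or DOWN per host.
# bash's /dev/tcp redirection replaces an interactive telnet session here.
check_rserve() {
  for host in "$@"; do
    if timeout 2 bash -c "exec 3<>/dev/tcp/$host/6311" 2>/dev/null; then
      echo "$host UP"
    else
      echo "$host DOWN"
    fi
  done
}
check_rserve NameNode1 NameNode2 DataNode1 DataNode2 DataNode3
```

Any DOWN line means Rserve is not reachable on that node and needs to be (re)started there.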

    Start the Hive remote service: RHive talks to HiveServer over Thrift, so the Thrift service must be running in the background. Start it from the Hive client (skip this step if it is already running):
    nohup hive --service hiveserver &

    7. Testing RHive
    library(RHive)
    rhive.init
    Initialization fails; unresolved. (Note: rhive.init without parentheses does not actually call the function; it only prints the function body, which is what appears below. The real call is rhive.init().)
    function (hiveHome = NULL, hiveLib = NULL, hadoopHome = NULL,
        hadoopConf = NULL, hadoopLib = NULL, verbose = FALSE)
    {
        tryCatch({
            .rhive.init(hiveHome = hiveHome, hiveLib = hiveLib, hadoopHome = hadoopHome,
                hadoopConf = hadoopConf, hadoopLib = hadoopLib, verbose = verbose)
        }, error = function(e) {
            .handleErr(e)
        })
    }
    <environment: namespace:RHive>

    rhive.connect(host = "172.16.9.32")
    The connection raises a warning; unresolved:
    Warning:
    +----------------------------------------------------------+
    + / hiveServer2 argument has not been provided correctly. +
    + / RHive will use a default value: hiveServer2=TRUE. +
    +----------------------------------------------------------+

    Reading data succeeds nonetheless:
    d <- rhive.query('select * from src.v_mzdm limit 1000')

    In RStudio Server the environment variables must be set manually:
    Sys.setenv("HIVE_HOME"="/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hive")
    Sys.setenv("HADOOP_HOME"="/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hadoop")


    8. Installing the RHadoop dependencies (install in this order; they depend on one another)
    R CMD INSTALL Rcpp_0.12.17.tar.gz
    R CMD INSTALL plyr_1.8.3.tar.gz
    R CMD INSTALL stringi_1.2.3.tar.gz
    R CMD INSTALL glue_1.2.0.tar.gz
    R CMD INSTALL magrittr_1.5.tar.gz
    R CMD INSTALL stringr_1.3.0.tar.gz
    R CMD INSTALL reshape2_1.4.2.tar.gz
    R CMD INSTALL iterators_1.0.9.tar.gz
    R CMD INSTALL itertools_0.1-1.tar.gz
    R CMD INSTALL digest_0.6.14.tar.gz
    R CMD INSTALL RJSONIO_1.2-0.2.tar.gz
    R CMD INSTALL functional_0.4.tar.gz
    R CMD INSTALL bitops_1.0-5.tar.gz
    R CMD INSTALL caTools_1.17.tar.gz
    R CMD INSTALL Cairo_1.5-10.tar.gz    # run yum -y install cairo* libxt* first

    The dependency tarballs can be downloaded from https://cran.r-project.org/src/contrib/Archive/

    9. Installing the RHadoop packages

    First add the following variables to the environment:

    vi /etc/profile
    export HADOOP_CMD=/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/bin/hadoop
    export HADOOP_STREAMING=/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/jars/hadoop-streaming-2.6.0-cdh5.5.6.jar
    export JAVA_LIBRARY_PATH=/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hadoop/lib/native
    source /etc/profile    # apply the changes
    Install:
    R CMD INSTALL rhdfs_1.0.8.tar.gz
    R CMD INSTALL rmr2_3.3.1.tar.gz    # install on every node
    This errors out; reports online blame a build problem in rmr2_3.3.1.tar.gz, still unresolved:
    Copying libs into local build directory
    find: `/usr/lib/hadoop': No such file or directory
    ls: cannot access /opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hadoop/hadoop-*-core.jar: No such file or directory
    ls: cannot access /opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hadoop/hadoop-core-*.jar: No such file or directory
    Cannot find hadoop-core jar file in hadoop home
    cp: cannot stat `build/dist/*': No such file or directory
    can't build hbase IO classes, skipping
    installing to /usr/local/lib64/R/library/rmr2/libs
    ** R
    ** byte-compile and prepare package for lazy loading
    Warning in library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
    there is no package called ‘quickcheck’
    Note: no visible binding for '<<-' assignment to '.Last'
    Note: no visible binding for '<<-' assignment to '.Last'
    ** help
    *** installing help indices
    ** building package indices
    ** testing if installed package can be loaded
    Warning: S3 methods ‘gorder.default’, ‘gorder.factor’, ‘gorder.data.frame’, ‘gorder.matrix’, ‘gorder.raw’ were declared in NAMESPACE but not found
    * DONE (rmr2)

    A workaround posted online:
    http://www.dataguru.cn/thread-135199-1-1.html


    Then copy libhadoop.so and libhadoop.so.1.0.0 from the native directory to /usr/lib64:
    cd /opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hadoop/lib/native
    cp libhadoop.so /usr/lib64/
    cp libhadoop.so.1.0.0 /usr/lib64/

    Verify that rhdfs and rmr2 work.

    Testing rhdfs:
    library(rhdfs)
    hdfs.init()
    hdfs.ls("/")

    rmr2 does not work; the errors during its installation were never resolved.


    For reference, the complete R-related block in /etc/profile:
    #R
    export R_HOME=/usr/local/lib64/R
    export HADOOP_CMD=/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/bin/hadoop
    export HADOOP_STREAMING=/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/jars/hadoop-streaming-2.6.0-cdh5.5.6.jar
    export JAVA_LIBRARY_PATH=/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hadoop/lib/native
    export RHIVE_DATA=/www/store/rhive/data
    export HIVE_HOME=/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hive
    export HADOOP_HOME=/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hadoop
    export PATH=$PATH:$JAVA_HOME/bin:$ANT_HOME/bin:$R_HOME/bin
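After sourcing /etc/profile it is easy to miss a variable on one node. A small sketch that reports any of the variables above that are still unset in the current shell:

```shell
# Report any of the environment variables from the profile block above
# that are unset or empty in the current shell.
check_env() {
  for var in "$@"; do
    eval "val=\${$var:-}"
    [ -n "$val" ] || echo "unset: $var"
  done
}
check_env R_HOME HADOOP_CMD HADOOP_STREAMING JAVA_LIBRARY_PATH RHIVE_DATA HIVE_HOME HADOOP_HOME
```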


    Installing RStudio Server
    yum install --nogpgcheck rstudio-server-rhel-1.1.456-x86_64.rpm
    cd /usr/lib/rstudio-server/bin
    ./rstudio-server start
    Then visit ip:8787 in a browser.

    System configuration
    There are two main configuration files; neither exists by default:
    /etc/rstudio/rserver.conf
    /etc/rstudio/rsession.conf

    Set the port and IP access control:
    vi /etc/rstudio/rserver.conf
    www-port=8080    # listening port
    www-address=127.0.0.0    # IP addresses allowed to connect; the default is 0.0.0.0
    Restart the server for the changes to take effect:
    rstudio-server restart

    Session configuration
    vi /etc/rstudio/rsession.conf
    session-timeout-minutes=30    # session timeout
    r-cran-repos=http://ftp.ctex.org/mirrors/CRAN    # CRAN mirror

    Service management

    rstudio-server start    # start
    rstudio-server stop    # stop
    rstudio-server restart    # restart

    List the running R sessions:
    rstudio-server active-sessions
    Suspend a running R session by PID:
    rstudio-server suspend-session <pid>
    Suspend all running R sessions:
    rstudio-server suspend-all
    Force-suspend running R sessions (highest priority, takes effect immediately):
    rstudio-server force-suspend-session <pid>
    rstudio-server force-suspend-all
    Take RStudio Server offline temporarily (web access is refused with a friendly message):
    rstudio-server offline
    Bring RStudio Server back online:
    rstudio-server online

    Only ordinary (non-root) users can log in.
    Create a user and password:
    useradd -d /home/r -m r
    passwd r

    A quick test:
    x <- c(1,2,5,7,9)
    y <- c(2,4,7,8,10)
    library(Cairo)
    CairoPNG(file="pic_plot.png", width=640, height=480)
    plot(x,y)

    RStudio Server does not see the shell environment variables, so set them yourself:
    Sys.setenv("HADOOP_CMD"="/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/bin/hadoop")

    Reading data from HDFS:
    library(rJava)
    library(rhdfs)
    hdfs.init()
    hdfs.ls("/")
    hdfs.cat("/user/kjxydata/src/V_MZDM/v_mzdm.txt")


    Testing rmr2
    1. A MapReduce program in R:
    small.ints = to.dfs(1:10)

    mapreduce(input = small.ints, map = function(k, v) cbind(v, v^2))
    This fails; probably because rmr2 did not install cleanly.

    from.dfs("/tmp/RtmpWnzxl4/file5deb791fcbd5")

    MapReduce can only access the HDFS filesystem, so the data must first be written to HDFS with to.dfs; the job's result is then fetched back from HDFS with from.dfs.


    2. The standard rmr example is wordcount, which counts the words in a file:
    input<- '/user/kjxydata/src/V_MZDM/v_mzdm.txt'

    wordcount = function(input, output = NULL, pattern = " ") {
        wc.map = function(., lines) {
            keyval(unlist(strsplit(x = lines, split = pattern)), 1)
        }
        wc.reduce = function(word, counts) {
            keyval(word, sum(counts))
        }
        mapreduce(input = input, output = output, input.format = "text",
                  map = wc.map, reduce = wc.reduce, combine = T)
    }

    wordcount(input)
    This also fails; probably because rmr2 did not install cleanly.

    from.dfs("/tmp/RtmpfZUFEa/file6cac626aa4a7")

    Installation references:
    https://www.cnblogs.com/end/archive/2013/02/18/2916105.html
    https://www.cnblogs.com/hunttown/p/5470652.html
    https://www.cnblogs.com/hunttown/p/5470805.html
    https://blog.csdn.net/youngqj/article/details/46819625

  • Original post: https://www.cnblogs.com/liquan-anran/p/9429376.html