  • Building and installing Spark on CDH5 [spark-1.0.2 hadoop2.3.0 cdh5.1.0]

    Prerequisite: Hadoop must already be installed. My version is hadoop-2.3.0-cdh5.1.0.

    1. Download the Maven package

    2. Set the M2_HOME environment variable and add Maven's bin directory to PATH
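
    Step 2 might look like the following in a shell profile. This is only a minimal sketch: /opt/apache-maven is a hypothetical install location that the original post does not give, so replace it with wherever Maven was actually unpacked.

```shell
# Hypothetical install location; replace with the directory where Maven was unpacked.
export M2_HOME="${M2_HOME:-/opt/apache-maven}"
# Put Maven's bin directory at the front of PATH so `mvn` resolves to this install.
export PATH="$M2_HOME/bin:$PATH"

# Report whether the shell can now find mvn.
if command -v mvn >/dev/null 2>&1; then
  echo "mvn found at $(command -v mvn)"
else
  echo "mvn not found; check M2_HOME=$M2_HOME" >&2
fi
```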

    3. Give Maven enough memory for the build: export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"

    4. Download the spark-1.0.2.tgz source archive from the official site and extract it

    5. Change into the extracted Spark directory

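    6. Run ./make-distribution.sh --hadoop 2.3.0-cdh5.1.0 --with-yarn --tgz

    7. Wait; the build takes a long time

    8. When it finishes, spark-1.0.2-bin-2.3.0-cdh5.1.0.tgz is generated in the current directory

    9. Copy it to the installation directory and extract it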
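
    Step 9 could be wrapped in a small helper like the one below. It is a sketch, not the author's exact commands: install_spark is a made-up function name, and the target /home/hadoop/spark in the usage comment is inferred from the path used in step 14.

```shell
# Unpack a Spark distribution tarball into a target directory.
# $1: path to the .tgz produced by make-distribution.sh
# $2: installation directory (created if missing)
# --strip-components=1 drops the tarball's single top-level directory,
# so the files land directly in the installation directory.
install_spark() {
  tarball=$1
  dest=$2
  mkdir -p "$dest" && tar -xzf "$tarball" -C "$dest" --strip-components=1
}

# Example usage:
# install_spark spark-1.0.2-bin-2.3.0-cdh5.1.0.tgz /home/hadoop/spark
```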

    10. Edit the configuration files under conf

    cp spark-env.sh.template spark-env.sh

    vim spark-env.sh

    Set the parameters to match your own environment:

    export JAVA_HOME=/home/hadoop/jdk
    export HADOOP_HOME=/home/hadoop/hadoop-2.3.0-cdh5.1.0
    export HADOOP_CONF_DIR=/home/hadoop/hadoop-2.3.0-cdh5.1.0/etc/hadoop
    export SPARK_YARN_APP_NAME=spark-on-yarn
    export SPARK_EXECUTOR_INSTANCES=1
    export SPARK_EXECUTOR_CORES=2
    export SPARK_EXECUTOR_MEMORY=3500m
    export SPARK_DRIVER_MEMORY=3500m
    export SPARK_MASTER_IP=master
    export SPARK_MASTER_PORT=7077
    export SPARK_WORKER_CORES=2
    export SPARK_WORKER_MEMORY=3500m
    export SPARK_WORKER_INSTANCES=1
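
    Since the paths above differ per machine, a quick sanity check before starting anything can save a failed launch. The helper below is a hypothetical sketch (check_dirs is not part of Spark); it only verifies that each named variable points at an existing directory.

```shell
# Report any of the named variables whose value is not an existing directory.
# Returns non-zero if at least one check fails.
check_dirs() {
  status=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -d "$value" ]; then
      echo "ok: $name=$value"
    else
      echo "missing: $name=$value" >&2
      status=1
    fi
  done
  return $status
}

# Example usage: load the settings, then check the three path variables.
# . conf/spark-env.sh && check_dirs JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR
```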

    11. Configure slaves (one worker hostname per line)

    slave01
    slave02
    slave03
    slave04
    slave05

    12. Distribute

    Copy the Spark installation directory to every slave node

    13. Start
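
    The copy can be scripted by looping over the slaves file. The sketch below assumes passwordless SSH to each host and the same directory layout on every node; neither is stated in the original, and distribute_spark and DRY_RUN are names invented for this example.

```shell
# Copy a Spark installation to every host listed in a slaves file.
# $1: slaves file (one hostname per line; blank lines and # comments are skipped)
# $2: local Spark directory, copied into the same parent directory on each host
# Set DRY_RUN=1 to print the scp commands instead of running them.
distribute_spark() {
  slaves_file=$1
  spark_dir=$2
  parent_dir=$(dirname "$spark_dir")
  while IFS= read -r host || [ -n "$host" ]; do
    case $host in ''|'#'*) continue ;; esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "scp -r $spark_dir $host:$parent_dir/"
    else
      # Redirect stdin so scp cannot consume the rest of the slaves file.
      scp -r "$spark_dir" "$host:$parent_dir/" </dev/null
    fi
  done < "$slaves_file"
}

# Example usage:
# DRY_RUN=1 distribute_spark conf/slaves /home/hadoop/spark
```

    Running it once with DRY_RUN=1 shows exactly which hosts and paths would be touched before anything is copied.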

    sbin/start-all.sh

    14. Run an example

    $SPARK_HOME/bin/spark-submit --class org.apache.spark.examples.SparkPi \
        --master yarn-client \
        --num-executors 3 \
        --driver-memory 4g \
        --executor-memory 2g \
        --executor-cores 1 \
        /home/hadoop/spark/lib/spark-examples-1.0.2-hadoop2.3.0-cdh5.1.0.jar \
        100

    15. The submitted example unexpectedly failed

    Opening the logs from the YARN monitoring UI showed a stream of errors like these:

    INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s).

    INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s).

    INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s).

    INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s).

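    16. Solving the problem

    I copied the Spark core jar from the lib directory under the Spark installation to my local machine and found a yarn-default.xml file inside it. Opening it showed: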

      <!-- Resource Manager Configs -->
      <property>
        <description>The hostname of the RM.</description>
        <name>yarn.resourcemanager.hostname</name>
        <value>0.0.0.0</value>
      </property> 

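    As you would expect, with this default the process looks for the ResourceManager on the local host (port 8030 in the log is the ResourceManager scheduler address, whose default is built from this hostname). If it is not running on the node that hosts YARN's ResourceManager, it cannot possibly find it.

    17. Change the setting as follows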
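
    The bundled default can also be checked without unpacking the whole jar. This is a rough sketch: rm_hostname is a made-up helper, the jar name in the usage comment is an assumption modeled on the examples jar from step 14, and the parsing relies on <name> and <value> sitting on consecutive lines as in the excerpt above.

```shell
# Read Hadoop-style configuration XML on stdin and print the <value> on the
# line that follows the yarn.resourcemanager.hostname <name> line.
rm_hostname() {
  grep -A1 '<name>yarn.resourcemanager.hostname</name>' |
    sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Example usage (assumed jar name):
# unzip -p /home/hadoop/spark/lib/spark-assembly-1.0.2-hadoop2.3.0-cdh5.1.0.jar \
#   yarn-default.xml | rm_hostname
```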

      <!-- Resource Manager Configs -->
      <property>
        <description>The hostname of the RM.</description>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
      </property> 
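
    One way to apply this change is to pull the file out of the jar, edit it, and write it back. The sketch below is an assumption about how that could be done, not the author's exact commands: set_rm_hostname is a hypothetical helper, the jar name is guessed from the examples jar in step 14, and the sed script expects <value> on the line right after <name>.

```shell
# Rewrite the <value> on the line after the yarn.resourcemanager.hostname
# <name> line in file $1, setting it to hostname $2.
# Assumes a plain hostname (no ':' or '&', which sed would treat specially).
set_rm_hostname() {
  file=$1
  host=$2
  tmp=$(mktemp) || return 1
  sed "/<name>yarn.resourcemanager.hostname<\/name>/{
n
s:<value>.*</value>:<value>$host</value>:
}" "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return $status
}

# Example usage (assumed jar name), run from a scratch directory:
# JAR=/home/hadoop/spark/lib/spark-assembly-1.0.2-hadoop2.3.0-cdh5.1.0.jar
# unzip "$JAR" yarn-default.xml          # extract just this one file
# set_rm_hostname yarn-default.xml master
# jar uf "$JAR" yarn-default.xml         # write the edited file back into the jar
```

    Patching the jar is the approach described here. Values in yarn-site.xml normally override yarn-default.xml, so it may also be worth checking that file under HADOOP_CONF_DIR on each node.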

    18. Repackage Spark and redistribute it to every node

  • Original article: https://www.cnblogs.com/ningbj/p/3939888.html