  • Spark on YARN

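    Overview

    This post records how to submit Spark jobs and run them on YARN.

    Hadoop configuration

    The main file to configure is yarn-site.xml. We currently use only mapreduce_shuffle, while some companies also add spark_shuffle.

    • Using mapreduce_shuffle only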

      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
      
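      <!-- Optional in this variant: YARN loads this class only when spark_shuffle is listed in yarn.nodemanager.aux-services. -->
      <property>
        <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
        <value>org.apache.spark.network.yarn.YarnShuffleService</value>
      </property>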
      
    • Using both mapreduce_shuffle and spark_shuffle

      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle,spark_shuffle</value>
      </property>
      
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
      
      <property>
        <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
        <value>org.apache.spark.network.yarn.YarnShuffleService</value>
      </property>
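
    To confirm which of the two variants a host actually has, the aux-services value can be read back from yarn-site.xml. This is a minimal sketch, not a real XML parser: aux_services is a hypothetical helper, and it assumes each <name> and its <value> sit on separate, consecutive lines, as in the fragments above.

```shell
# aux_services FILE: print the value of yarn.nodemanager.aux-services.
# Hypothetical helper; assumes <name> and <value> are on separate,
# consecutive lines, as in the yarn-site.xml fragments above.
aux_services() {
  awk '
    /<name>yarn\.nodemanager\.aux-services<\/name>/ { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}
```

    It could be called as aux_services "$HADOOP_CONF_DIR/yarn-site.xml"; if spark_shuffle does not appear in the output, YARN will not load the Spark shuffle class.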
      

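    Hadoop MapReduce jobs use mapreduce_shuffle, and Spark jobs can use spark_shuffle (Spark only does so when spark.shuffle.service.enabled is set to true). My personal impression is that spark_shuffle performs only moderately, and shuffle remains a major bottleneck. Also, if you use spark_shuffle, you must copy spark-yarn_2.10-1.4.1.jar into HADOOP_HOME/share/hadoop/lib; otherwise Hadoop fails at run time with a ClassNotFoundException.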
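
    The jar copy step can be scripted so that it is safe to repeat. A minimal sketch, assuming the jar name and target directory mentioned above; install_shuffle_jar is a hypothetical helper, not part of Hadoop or Spark.

```shell
# install_shuffle_jar JAR LIB_DIR: copy JAR into LIB_DIR unless a file with
# the same name is already there. Hypothetical helper for illustration.
install_shuffle_jar() {
  jar="$1"
  lib_dir="$2"
  [ -f "$jar" ] || { echo "jar not found: $jar" >&2; return 1; }
  [ -d "$lib_dir" ] || { echo "directory not found: $lib_dir" >&2; return 1; }
  target="$lib_dir/$(basename "$jar")"
  if [ -f "$target" ]; then
    echo "already installed: $target"
  else
    cp "$jar" "$target" && echo "installed: $target"
  fi
}
```

    The first argument would be wherever spark-yarn_2.10-1.4.1.jar lives in your Spark build, and the second "$HADOOP_HOME/share/hadoop/lib". The jar has to be present on every NodeManager host, and a NodeManager only picks it up after a restart.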

    Spark configuration

    $SPARK_HOME/conf/spark-env.sh

    export YARN_CONF_DIR=/home/cluster/apps/hadoop/etc/hadoop
    
    export JAVA_HOME=/home/cluster/share/java1.7
    export SCALA_HOME=/home/cluster/share/scala-2.10.5
    export HADOOP_HOME=/home/cluster/apps/hadoop
    export HADOOP_CONF_DIR=/home/cluster/apps/hadoop/etc/hadoop
    export SPARK_MASTER_IP=master
    
    export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/home/cluster/apps/hadoop/lib/native
    export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/cluster/apps/hadoop/share/hadoop/yarn/*:/home/cluster/apps/hadoop/share/hadoop/yarn/lib/*:/home/cluster/apps/hadoop/share/hadoop/common/*:/home/cluster/apps/hadoop/share/hadoop/common/lib/*:/home/cluster/apps/hadoop/share/hadoop/hdfs/*:/home/cluster/apps/hadoop/share/hadoop/hdfs/lib/*:/home/cluster/apps/hadoop/share/hadoop/mapreduce/*:/home/cluster/apps/hadoop/share/hadoop/mapreduce/lib/*:/home/cluster/apps/hadoop/share/hadoop/tools/lib/*:/home/cluster/apps/spark/spark-1.4.1/lib/*
    
    SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://master:8020/var/log/spark"
    

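    Parameter notes:
    YARN_CONF_DIR: the directory that holds the YARN configuration. If you do not add this line to spark-env.sh, run the following in the shell before submitting a job: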

    export YARN_CONF_DIR=/home/cluster/apps/hadoop/etc/hadoop
    

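    HADOOP_HOME: the Hadoop root directory
    HADOOP_CONF_DIR: the Hadoop configuration directory; Spark reads it when, for example, it accesses HDFS
    SPARK_LIBRARY_PATH: tells Spark where to load native .so libraries from
    SPARK_CLASSPATH: the jars Spark needs to load
    SPARK_HISTORY_OPTS: options used when starting the Spark history server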
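
    Because spark-submit finds the cluster through YARN_CONF_DIR or HADOOP_CONF_DIR, a quick check before submitting can catch a wrong path early. A minimal sketch; has_yarn_conf and report_conf_dirs are hypothetical helpers introduced here.

```shell
# has_yarn_conf DIR: succeed only if DIR is a directory containing yarn-site.xml.
has_yarn_conf() {
  [ -n "$1" ] && [ -d "$1" ] && [ -f "$1/yarn-site.xml" ]
}

# report_conf_dirs: print each of YARN_CONF_DIR and HADOOP_CONF_DIR that is
# usable; return non-zero if neither one is.
report_conf_dirs() {
  ok=1
  for dir in "${YARN_CONF_DIR:-}" "${HADOOP_CONF_DIR:-}"; do
    if has_yarn_conf "$dir"; then
      echo "ok: $dir"
      ok=0
    fi
  done
  [ "$ok" -eq 0 ] || echo "no yarn-site.xml found via YARN_CONF_DIR or HADOOP_CONF_DIR" >&2
  return "$ok"
}
```

    Running report_conf_dirs in the same shell that will call spark-submit shows which variable, if any, points at a directory with yarn-site.xml.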

    Prerequisites

    To access HDFS, the NameNode and DataNode must be running,
    along with the YARN services: ResourceManager and NodeManager.

     /home/cluster/apps$ jps
    29368 MainGenericRunner
    29510 Jps
    22885 Main
    29210 NodeManager
    28952 NameNode
    29158 ResourceManager
    29023 DataNode
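
    The same check can be automated by scanning the jps output for the four daemons named above. A minimal sketch; missing_daemons is a hypothetical helper that takes the jps output as its argument so it can be tried without a running cluster.

```shell
# missing_daemons JPS_OUTPUT: print each required daemon that does not
# appear in the given jps output (one "<pid> <name>" pair per line).
missing_daemons() {
  jps_output="$1"
  for daemon in NameNode DataNode ResourceManager NodeManager; do
    # Compare the whole second field, so SecondaryNameNode does not count as NameNode.
    if ! printf '%s\n' "$jps_output" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      printf '%s\n' "$daemon"
    fi
  done
}
```

    On a live host it could be called as missing_daemons "$(jps)"; empty output means all four daemons are running.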
    

    Submitting jobs

    1. Pi:
    • yarn-cluster mode:

      /home/cluster/apps/spark/spark-1.4.1/bin/spark-submit --master yarn-cluster --executor-memory 3g   --driver-memory 1g  --class org.apache.spark.examples.SparkPi /home/cluster/apps/spark/spark-1.4.1/examples/target/scala-2.10/spark-examples-1.4.1-hadoop2.3.0-cdh5.1.0.jar  10
      
    • yarn-client mode:

      /home/cluster/apps/spark/spark-1.4.1/bin/spark-submit --master yarn-client --executor-memory 3g   --driver-memory 1g  --class org.apache.spark.examples.SparkPi /home/cluster/apps/spark/spark-1.4.1/examples/target/scala-2.10/spark-examples-1.4.1-hadoop2.3.0-cdh5.1.0.jar  10
      
    2. wordcount:
    • yarn-cluster mode:

      /home/cluster/apps/spark/spark-1.4.1/bin/spark-submit --master yarn-cluster --executor-memory 3g   --driver-memory 1g  --class org.apache.spark.examples.JavaWordCount /home/cluster/apps/spark/spark-1.4.1/examples/target/scala-2.10/spark-examples-1.4.1-hadoop2.3.0-cdh5.1.0.jar /data/hadoop/wordcount/
      
    • yarn-client mode:

      /home/cluster/apps/spark/spark-1.4.1/bin/spark-submit --master yarn-client --executor-memory 3g   --driver-memory 1g  --class org.apache.spark.examples.JavaWordCount /home/cluster/apps/spark/spark-1.4.1/examples/target/scala-2.10/spark-examples-1.4.1-hadoop2.3.0-cdh5.1.0.jar /data/hadoop/wordcount/
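
    The four commands above differ only in the --master value, the class, and the trailing arguments, so they can be wrapped in one function. A minimal sketch; run_example and DRY_RUN are hypothetical names introduced here, and the paths and memory settings are the ones used above.

```shell
SPARK_HOME=/home/cluster/apps/spark/spark-1.4.1
EXAMPLES_JAR=$SPARK_HOME/examples/target/scala-2.10/spark-examples-1.4.1-hadoop2.3.0-cdh5.1.0.jar

# run_example MODE CLASS [ARGS...]: submit one example class to YARN.
# MODE is yarn-cluster or yarn-client. With DRY_RUN=1 the command is only
# printed, which is useful for checking it before touching the cluster.
run_example() {
  mode="$1"
  class="$2"
  shift 2
  # Rebuild the positional parameters as the full spark-submit command line.
  set -- "$SPARK_HOME/bin/spark-submit" --master "$mode" \
    --executor-memory 3g --driver-memory 1g \
    --class "$class" "$EXAMPLES_JAR" "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

    For example, run_example yarn-cluster org.apache.spark.examples.SparkPi 10 submits the Pi job, and run_example yarn-client org.apache.spark.examples.JavaWordCount /data/hadoop/wordcount/ submits wordcount in client mode.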
      

    Result screenshot

    [screenshot]
    Reading the four records from bottom to top: Pi in yarn-cluster mode, Pi in yarn-client mode, wordcount in yarn-cluster mode, and wordcount in yarn-client mode.

    Please respect original work; reposting is not permitted.
    http://blog.csdn.net/stark_summer/article/details/48661317

  • Original post: https://www.cnblogs.com/stark-summer/p/4830472.html