  • Fixing slow job submission with Spark on YARN

    Versions: Spark 2.0.0, Hadoop 2.7.2.

    When submitting jobs in Spark on YARN mode, submission was extremely slow and took several minutes.

    The job was submitted in cluster mode:
    ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
        --master yarn \
        --deploy-mode cluster \
        --driver-memory 4g \
        --executor-memory 2g \
        --executor-cores 1 \
        --queue thequeue \
        examples/jars/spark-examples*.jar \
        10

    The following warning appeared:

    17/02/08 18:26:23 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
    17/02/08 18:26:29 INFO yarn.Client: Uploading resource file:/tmp/spark-91508860-fdda-4203-b733-e19625ef23a0/__spark_libs__4918922933506017904.zip -> hdfs://dbmtimehadoop/user/fuxin.zhao/.sparkStaging/application_1486451708427_0392/__spark_libs__4918922933506017904.zip
    
    

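    After this log line, the client uploads the jars the application depends on. That takes about 30 s and makes job submission painfully slow. The official documentation describes a fix: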

    To make Spark runtime jars accessible from YARN side, you can specify spark.yarn.archive or spark.yarn.jars. 
    For details please refer to Spark Properties. If neither spark.yarn.archive nor spark.yarn.jars is specified, 
    Spark will create a zip file with all jars under $SPARK_HOME/jars and upload it to the distributed cache.
    

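    In short: to make Spark's runtime jars accessible from the YARN side (the YARN nodes), you need to set spark.yarn.archive or spark.yarn.jars. If neither is set, Spark zips all the jars under $SPARK_HOME/jars/ and uploads them to the distributed cache. That is why job submission was so slow.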
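
As a quick check before changing anything, a small shell helper can report whether a spark-defaults.conf already sets either property. This is only a sketch: the function name has_yarn_libs_setting is made up for this example, and it assumes the settings live in the conf file rather than being passed with --conf on the command line.

```shell
# Hypothetical helper (not part of Spark): succeeds if the given
# spark-defaults.conf sets spark.yarn.jars or spark.yarn.archive.
# Commented-out lines (starting with #) are ignored because the pattern
# only allows whitespace before the property name.
has_yarn_libs_setting() {
    conf_file="$1"
    grep -Eq '^[[:space:]]*spark\.yarn\.(jars|archive)[[:space:]=:]' "$conf_file"
}
```

It could be called as has_yarn_libs_setting "$SPARK_HOME/conf/spark-defaults.conf" and combined with || to print a reminder when neither property is set.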

    The fix:
    Upload the jars Spark needs at runtime, found under $SPARK_HOME/jars/, to HDFS.

    hadoop fs -mkdir -p hdfs://dbmtimehadoop/tmp/spark/lib_jars/
    hadoop fs -put $SPARK_HOME/jars/* hdfs://dbmtimehadoop/tmp/spark/lib_jars/
    

    vi $SPARK_HOME/conf/spark-defaults.conf
    Add the following line:
    spark.yarn.jars hdfs://dbmtimehadoop/tmp/spark/lib_jars/

    Submitting the job again then failed with the following exception:

    Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
    	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
    	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
    	at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
    
    

    The ResourceManager logs show the underlying error: http://db-namenode01.host-mtime.com:19888/jobhistory/logs/db-datanode03.host-mtime.com:34545/container_e08_1486451708427_0346_02_000001/

    Log Length: 191
    
    Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
    Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher
    

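    This shows that the earlier configuration was wrong: the Spark jars were not loaded, apparently because the bare directory path matched no jar files. After some experimentation, the following settings all work: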

    # Works:
    spark.yarn.jars                  hdfs://dbmtimehadoop/tmp/spark/lib_jars/*.jar
    # Also works:
    #spark.yarn.jars                  hdfs://dbmtimehadoop/tmp/spark/lib_jars/*
    # Listing individual jars, separated by commas, also works:
    #spark.yarn.jars                 hdfs://dbmtimehadoop/tmp/spark/lib_jars/activation-1.1.1.jar,hdfs://dbmtimehadoop/tmp/spark/lib_jars/antlr-2.7.7.jar,hdfs://dbmtimehadoop/tmp/spark/lib_jars/antlr4-runtime-4.5.3.jar,hdfs://dbmtimehadoop/tmp/spark/lib_jars/antlr-runtime-3.4.jar
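
The difference between the failing and working values can be captured in a rough sanity check. The sketch below is a heuristic, not something Spark provides: the name check_yarn_jars_value is hypothetical, and it only encodes what was observed here (entries containing a glob or naming an explicit .jar worked, a bare directory did not).

```shell
# Hypothetical sanity check (a heuristic, not part of Spark): every
# comma-separated entry of a spark.yarn.jars value should either contain
# a glob (*) or end in .jar. A bare directory path is flagged.
check_yarn_jars_value() {
    rest="$1"
    while [ -n "$rest" ]; do
        entry="${rest%%,*}"              # text before the first comma
        case "$rest" in
            *,*) rest="${rest#*,}" ;;    # drop the entry just examined
            *)   rest="" ;;              # that was the last entry
        esac
        case "$entry" in
            *'*'*|*.jar) ;;              # glob or explicit jar: looks fine
            *)
                echo "suspicious entry (no glob, not a .jar): $entry" >&2
                return 1
                ;;
        esac
    done
    return 0
}
```

Passing the original value hdfs://dbmtimehadoop/tmp/spark/lib_jars/ makes the function return non-zero, while the three working forms above pass.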
                                                                   
    

    After resubmitting, the job ran successfully.
    Log lines like the following confirm that the jars on HDFS are being used:

    17/02/08 19:28:21 INFO yarn.Client: Source and destination file systems are the same. Not copying hdfs://dbmtimehadoop/tmp/spark/lib_jars/spark-mllib-local_2.11-2.0.0.jar
    17/02/08 19:28:21 INFO yarn.Client: Source and destination file systems are the same. Not copying hdfs://dbmtimehadoop/tmp/spark/lib_jars/spark-mllib_2.11-2.0.0.jar
    17/02/08 19:28:21 INFO yarn.Client: Source and destination file systems are the same. Not copying hdfs://dbmtimehadoop/tmp/spark/lib_jars/spark-network-common_2.11-2.0.0.jar
    17/02/08 19:28:21 INFO yarn.Client: Source and destination file systems are the same. Not copying hdfs://dbmtimehadoop/tmp/spark/lib_jars/spark-network-shuffle_2.11-2.0.0.jar
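
To check this without reading the log by eye, the two yarn.Client messages quoted in this post can be searched for in a saved client log. This is a minimal sketch under assumptions: classify_submit_log is a made-up name, the log is assumed to have been saved to a file, and the message texts are those of Spark 2.0.0 as shown above (other versions may word them differently).

```shell
# Hypothetical helper: classify a saved spark-submit client log using the
# two yarn.Client messages quoted in this post (Spark 2.0.0 wording).
classify_submit_log() {
    log_file="$1"
    if grep -qF 'Neither spark.yarn.jars nor spark.yarn.archive is set' "$log_file"; then
        echo "fallback-upload"       # libraries are zipped and uploaded
    elif grep -qF 'Source and destination file systems are the same. Not copying' "$log_file"; then
        echo "reused-hdfs-jars"      # jars already on HDFS are used in place
    else
        echo "unknown"
    fi
}
```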
    
    
  • Original article: https://www.cnblogs.com/honeybee/p/6379599.html