  • Using spark.yarn.jar and spark.yarn.archive

    When a Spark job is launched on YARN without spark.yarn.archive or spark.yarn.jars configured, the client uploads the Spark jars for every submission, which is very time-consuming. Configuring spark.yarn.archive greatly reduces job startup time. The whole procedure is described below; a minimal submit command that exhibits the upload behavior is sketched right after this paragraph.
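    As a rough illustration (the main class and application jar names here are placeholders, not from the original post), this is the kind of submission where the repeated jar upload shows up in the client log when neither property is set:

    # Neither spark.yarn.archive nor spark.yarn.jars is set, so the client
    # uploads everything under $SPARK_HOME/jars on every submission.
    # (com.example.MyApp and my-app.jar are hypothetical names.)
    spark-submit --master yarn --deploy-mode cluster \
      --class com.example.MyApp my-app.jar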

    1. Create a zip file of the Spark jars locally

    hzlishuming@hadoop691:~/env/spark$ cd jars/
    hzlishuming@hadoop691:~/env/spark/jars$ zip spark2.1.1-hadoop2.7.3.zip ./*
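    The zip is created from inside the jars/ directory so that the jar files sit at the top level of the archive. As an optional sanity check (not part of the original steps), the contents can be listed before uploading:

    hzlishuming@hadoop691:~/env/spark/jars$ unzip -l spark2.1.1-hadoop2.7.3.zip | head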

    2. Upload it to HDFS and adjust the permissions

    hzlishuming@hadoop691:~/env/spark$ hdfs dfs -mkdir /tmp/spark-archive
    hzlishuming@hadoop691:~/env/spark$ hdfs dfs -put ./spark2.1.1-hadoop2.7.3.zip /tmp/spark-archive
    hzlishuming@hadoop691:~/env/spark$ hdfs dfs -chmod 775 /tmp/spark-archive/spark2.1.1-hadoop2.7.3.zip
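    It may be worth confirming the upload and permissions afterwards (an optional check, not in the original steps):

    hzlishuming@hadoop691:~/env/spark$ hdfs dfs -ls /tmp/spark-archive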

    3. Configure spark-defaults.conf

      spark.yarn.archive hdfs:///tmp/spark-archive/spark2.1.1-hadoop2.7.3.zip
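    The spark.yarn.jars property mentioned earlier is the alternative to the archive: instead of a single zip it takes a list of jar locations, and globs are allowed. A sketch of that variant, assuming the jars were uploaded individually to a hypothetical /tmp/spark-jars directory:

      spark.yarn.jars hdfs:///tmp/spark-jars/*.jar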

    For reference, the client log then looks like this; note the line reporting that the archive is not copied again because the source and destination file systems are the same:

    17/08/10 14:59:27 INFO Client: To enable the AM to login from keytab, credentials are being copied over to the AM via the YARN Secure Distributed Cache.
    17/08/10 14:59:27 INFO Client: Uploading resource file:/etc/security/keytabs/hive.service.keytab -> hdfs://hz-test-01/user/hive/.sparkStaging/application_1500533600435_2825/hive.service.keytab
    17/08/10 14:59:27 INFO Client: Source and destination file systems are the same. Not copying hdfs:/tmp/spark-archive/spark2.1.1-hadoop2.7.3.zip
    17/08/10 14:59:27 INFO Client: Uploading resource file:/home/hzlishuming/env/spark-2.1.1/local/spark-6606333c-1e5b-462c-ad39-aaf75251c246/__spark_conf__2962372142699552959.zip -> hdfs://hz-test-01/user/hive/.sparkStaging/application_1500533600435_2825/__spark_conf__.zip
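    The same setting can also be passed per job on the command line instead of editing spark-defaults.conf, for example (the main class and application jar names are placeholders):

    spark-submit --master yarn --deploy-mode cluster \
      --conf spark.yarn.archive=hdfs:///tmp/spark-archive/spark2.1.1-hadoop2.7.3.zip \
      --class com.example.MyApp my-app.jar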
  • Original post: https://www.cnblogs.com/itboys/p/10041463.html