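Spark run modes and their common parameters

yarn cluster mode

Routine (scheduled) jobs are usually run this way.

Specifying a fixed number of executors

The parameters a job commonly needs are all specified in this command; the later scripts omit them.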

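# The arguments live in a bash array so that each one can carry a comment.
submit_args=(
    --master yarn                           # "yarn-cluster" is deprecated since Spark 2.0
    --deploy-mode cluster                   # cluster deploy mode
    --name "wordcount_${date}"              # job name
    --queue production.group.yanghao        # YARN queue
    --conf spark.default.parallelism=1000   # parallelism: default partition count after a shuffle
    --conf spark.network.timeout=1800s
    --conf spark.yarn.executor.memoryOverhead=1024    # off-heap overhead per executor, in MiB (spark.executor.memoryOverhead from Spark 2.3 on)
    --conf spark.scheduler.executorTaskBlacklistTime=30000
    --conf spark.core.connection.ack.wait.timeout=300s
    --num-executors 200                     # number of executors
    --executor-memory 4G                    # heap memory per executor
    --executor-cores 2                      # cores per executor; set this above 1
    --driver-memory 2G                      # driver memory; does not need to be large
    --class "${main_class}"                 # main class
    "${jar_path}"                           # path to the application jar
    param_list                              # arguments passed to the main class
)
spark-submit "${submit_args[@]}"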
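
To make the size of this request concrete, the following sketch adds up what the command above asks YARN for. It assumes one core per task (Spark's default `spark.task.cpus=1`) and leaves out the driver container. YARN may round each container request up to its allocation increment, so the memory figures are best read as a lower bound.

```shell
#!/bin/sh
# Values copied from the spark-submit command above.
num_executors=200
executor_memory_mb=4096      # --executor-memory 4G
memory_overhead_mb=1024      # spark.yarn.executor.memoryOverhead
executor_cores=2             # --executor-cores
default_parallelism=1000     # spark.default.parallelism

# Memory requested per executor container: heap plus overhead.
container_mb=$((executor_memory_mb + memory_overhead_mb))
# Memory requested for all executors (the driver is not included).
total_mb=$((num_executors * container_mb))
total_gb=$((total_mb / 1024))
# Tasks that can run at the same time, assuming one core per task.
total_cores=$((num_executors * executor_cores))
# Waves of tasks needed to cover default_parallelism partitions, rounded up.
waves=$(( (default_parallelism + total_cores - 1) / total_cores ))

echo "per-executor container: ${container_mb} MiB"
echo "all executors: ${total_mb} MiB (${total_gb} GiB)"
echo "concurrent task slots: ${total_cores}"
echo "task waves per shuffle stage: ${waves}"
```

With these settings a stage of 1000 partitions needs three waves on 400 task slots, and the last wave is only half full. Whether that matters depends on how evenly the partitions are sized.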

Dynamically adjusting the number of executors

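# The arguments live in a bash array so that each one can carry a comment.
submit_args=(
    --master yarn                                    # "yarn-cluster" is deprecated since Spark 2.0
    --deploy-mode cluster
    --name "wordcount_${date}"
    --queue production.group.yanghao
    --conf spark.dynamicAllocation.enabled=true      # enable dynamic allocation
    --conf spark.shuffle.service.enabled=true        # external shuffle service: keeps shuffle files when an executor is removed
    --conf spark.dynamicAllocation.minExecutors=200  # minimum number of executors
    --conf spark.dynamicAllocation.maxExecutors=500  # maximum number of executors
    --class "${main_class}"
    "${jar_path}"
    param_list
)
spark-submit "${submit_args[@]}"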
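
The four dynamic-allocation settings above only make sense together, and the minimum must not exceed the maximum. A minimal sketch of a wrapper that emits them as a unit is shown below. The function name `dyn_alloc_conf` is hypothetical and not part of Spark; only the four property names come from the command above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): prints the dynamic-allocation
# flags for a given executor range, one per line, or fails when the
# range is invalid.
dyn_alloc_conf() {
    min="$1"
    max="$2"
    if [ "$min" -gt "$max" ]; then
        echo "minExecutors ($min) must not exceed maxExecutors ($max)" >&2
        return 1
    fi
    echo "--conf spark.dynamicAllocation.enabled=true"
    echo "--conf spark.shuffle.service.enabled=true"
    echo "--conf spark.dynamicAllocation.minExecutors=$min"
    echo "--conf spark.dynamicAllocation.maxExecutors=$max"
}

# Print the flags for the range used in the command above.
dyn_alloc_conf 200 500
```

One way to use it is an unquoted command substitution, for example `spark-submit --master yarn --deploy-mode cluster $(dyn_alloc_conf 200 500) ...`, so that the shell splits the output back into separate arguments.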

yarn client mode

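# The arguments live in a bash array so that each one can carry a comment.
shell_args=(
    --master yarn                          # "yarn-client" is deprecated since Spark 2.0
    --deploy-mode client                   # client deploy mode
    --queue production.group.yanghao       # YARN queue
    --num-executors 200                    # number of executors
    --executor-memory 4G                   # heap memory per executor
    --executor-cores 2                     # cores per executor; set this above 1
    --driver-memory 2G                     # driver memory; does not need to be large
    --jars "${jar_path}"                   # path to the jar to load
)
spark-shell "${shell_args[@]}"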

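yarn cluster mode vs yarn client mode

yarn cluster mode: the Spark driver runs inside the application master, on a node of the cluster.
yarn client mode: the Spark driver runs in the client process on the submitting machine, which is why interactive shells are supported.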
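
The rule of thumb in this comparison can be sketched as a small lookup: scheduled jobs go to cluster mode, interactive sessions to client mode. The function name `deploy_mode_for` and the workload labels `scheduled` and `interactive` are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: maps a kind of workload to a YARN deploy mode.
deploy_mode_for() {
    case "$1" in
        scheduled)   echo cluster ;;  # driver runs inside the application master
        interactive) echo client ;;   # driver runs next to the user's shell
        *) echo "unknown workload: $1" >&2; return 1 ;;
    esac
}

# Show the command prefix each kind of workload would use.
echo "spark-submit --master yarn --deploy-mode $(deploy_mode_for scheduled)"
echo "spark-shell --master yarn --deploy-mode $(deploy_mode_for interactive)"
```

In cluster mode the submitting machine can disconnect once the job is accepted, which suits routine jobs. In client mode the driver stops when the local process does.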

Original article: https://www.cnblogs.com/xzjf/p/10944275.html