  • 1.2 Cluster management

    Cluster modes

      1. local: simulates a Spark cluster inside a single JVM

      2. standalone: runs Spark's own master and worker processes

      3. yarn

      4. mesos
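
Each mode is selected by the value passed to --master. The sketch below maps the four mode names to a master URL and prints the matching spark-shell command. s101:7077 is the standalone master used elsewhere in these notes; the Mesos host and port are placeholders, not taken from the notes.

```shell
#!/usr/bin/env bash
# Map a cluster mode name to the value passed to --master.
master_url() {
  case "$1" in
    local)      echo 'local[*]' ;;                 # one JVM, one thread per logical core
    standalone) echo 'spark://s101:7077' ;;        # standalone master from these notes
    yarn)       echo 'yarn' ;;                     # cluster is located via HADOOP_CONF_DIR
    mesos)      echo 'mesos://mesos-host:5050' ;;  # hypothetical Mesos master address
    *)          echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Print (not run) the shell command for every mode.
for mode in local standalone yarn mesos; do
  echo "spark-shell --master $(master_url "$mode")"
done
```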

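    Cluster management commands

    [Start]
    start-all.sh                        //start the master and all workers
    start-master.sh                     //start the master
    start-slaves.sh                     //start all workers
    start-slave.sh spark://s101:7077    //start a single worker on this node
    
    [Stop]
    stop-all.sh
    stop-master.sh
    stop-slaves.sh
    stop-slave.sh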
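
These scripts live in the sbin directory of the Spark installation, and the *-all.sh scripts are thin wrappers around the master and worker scripts. A minimal sketch of that relationship follows; the default SPARK_HOME of /soft/spark is an assumption modelled on the other /soft paths in these notes, and the function only prints the script paths instead of running them.

```shell
#!/usr/bin/env bash
# Assumed install location; override SPARK_HOME to match the real one.
SPARK_HOME="${SPARK_HOME:-/soft/spark}"

# Print, in order, the sbin scripts that a cluster-wide start or stop runs.
cluster_steps() {
  case "$1" in
    start) printf '%s\n' "$SPARK_HOME/sbin/start-master.sh" "$SPARK_HOME/sbin/start-slaves.sh" ;;
    stop)  printf '%s\n' "$SPARK_HOME/sbin/stop-slaves.sh" "$SPARK_HOME/sbin/stop-master.sh" ;;
    *)     echo "usage: cluster_steps start|stop" >&2; return 1 ;;
  esac
}

cluster_steps start
cluster_steps stop
```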

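    Configuring YARN mode:

    1. Stop the standalone Spark cluster started earlier

    2. Start the YARN cluster

    3. In [spark-env.sh], set the HADOOP_CONF_DIR environment variable by adding

    export HADOOP_CONF_DIR=/soft/hadoop/etc/hadoop

    The resulting file: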

    #!/usr/bin/env bash
    export JAVA_HOME=/soft/jdk
    
    
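    # Cluster resource settings
    # Number of cores each worker may use
    export SPARK_WORKER_CORES=2
    # Amount of memory each worker may use
    export SPARK_WORKER_MEMORY=2g
    # Number of worker processes to start on each node
    export SPARK_WORKER_INSTANCES=2
    # Memory for the master and worker daemons themselves
    export SPARK_DAEMON_MEMORY=200m
    
    # Spark on YARN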
    export HADOOP_CONF_DIR=/soft/hadoop/etc/hadoop
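
The SPARK_WORKER_* values apply to standalone mode, where they multiply: with two worker instances per node, each node offers twice the per-worker cores and memory. A small worked calculation with the values from the file above (memory written as a plain number of gigabytes so the shell can do the arithmetic):

```shell
#!/usr/bin/env bash
# Values copied from spark-env.sh.
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY_GB=2     # 2g
SPARK_WORKER_INSTANCES=2

node_cores=$(( SPARK_WORKER_CORES * SPARK_WORKER_INSTANCES ))
node_mem_gb=$(( SPARK_WORKER_MEMORY_GB * SPARK_WORKER_INSTANCES ))
echo "per node: ${node_cores} cores, ${node_mem_gb}g available to executors"
# prints: per node: 4 cores, 4g available to executors
```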

    4. Start the shell on YARN

    spark-shell --master yarn --num-executors 4 --executor-cores 2 --executor-memory 1g
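
To get a feel for what this command asks YARN for, the sketch below adds up the executor request. It assumes Spark's default memory overhead of max(384 MB, 10% of executor memory) per executor; YARN may round each container up further, and the application master needs its own container on top of this.

```shell
#!/usr/bin/env bash
num_executors=4
executor_cores=2
executor_mem_mb=1024         # --executor-memory 1g

# Assumed default overhead: max(384 MB, 10% of executor memory).
overhead_mb=$(( executor_mem_mb / 10 ))
if [ "$overhead_mb" -lt 384 ]; then overhead_mb=384; fi

container_mb=$(( executor_mem_mb + overhead_mb ))
total_cores=$(( num_executors * executor_cores ))
total_mb=$(( num_executors * container_mb ))
echo "each executor container asks for ${container_mb} MB"
echo "all executors together: ${total_cores} cores, ${total_mb} MB"
# prints: each executor container asks for 1408 MB
# prints: all executors together: 8 cores, 5632 MB
```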

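    On startup, Spark creates .sparkStaging under /user/centos in HDFS and uploads all of its libraries there as an archive. Move that archive to a fixed location and set its path in spark-defaults.conf, so that Spark on YARN no longer uploads the Spark resource files on every launch.

    4.1 Move the archive to a fixed location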

    hdfs dfs -mv /user/centos/.sparkStaging/application_1539329759117_0001/__spark_libs__7750395513258869587.zip /user/centos/myspark/__spark_libs.zip

    4.2 Configure spark-defaults.conf

    spark.yarn.archive hdfs://mycluster/user/centos/myspark/__spark_libs.zip
    #spark.dynamicAllocation.enabled true
    #spark.shuffle.service.enabled true
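
Before relaunching, it can help to read the configured value back and confirm that the archive really is at that path. A minimal sketch: yarn_archive is a hypothetical helper, it assumes keys and values are separated by whitespace as in the file above, and the sample file uses the destination from step 4.1.

```shell
#!/usr/bin/env bash
# Print the value of spark.yarn.archive from a spark-defaults.conf file.
# Commented-out lines are skipped because their first field starts with '#'.
yarn_archive() {
  awk '$1 == "spark.yarn.archive" { print $2; exit }' "$1"
}

# Try it on a throwaway copy of the settings.
conf=$(mktemp)
cat > "$conf" <<'EOF'
spark.yarn.archive hdfs://mycluster/user/centos/myspark/__spark_libs.zip
#spark.dynamicAllocation.enabled true
#spark.shuffle.service.enabled true
EOF
archive=$(yarn_archive "$conf")
echo "configured archive: $archive"
rm -f "$conf"

# On a cluster node the path can then be verified with:
#   hdfs dfs -test -e "$archive" && echo "archive exists"
```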

    Problem: starting the shell above throws an exception because the virtual memory limit is exceeded

    18/10/12 15:27:27 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
    18/10/12 15:27:27 WARN metrics.MetricsSystem: Stopping a MetricsSystem that is not running
    org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
      at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
      at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
      at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
      at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
      at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
      at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
      at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
      at scala.Option.getOrElse(Option.scala:121)
      at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
      at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)
      ... 47 elided

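    Fix:

    Edit yarn-site.xml in the Hadoop configuration, distribute it to every node, and restart YARN

    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>  <!-- skip the virtual memory check -->
    </property>

     The relevant defaults (shipped in yarn-default.xml and overridden in yarn-site.xml) are: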

      <property>
        <description>Whether virtual memory limits will be enforced for
        containers.</description>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>true</value>
      </property>
    
      <property>
        <description>Ratio between virtual memory to physical memory when
        setting memory limits for containers. Container allocations are
        expressed in terms of physical memory, and virtual memory usage
        is allowed to exceed this allocation by this ratio.
        </description>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>2.1</value>
      </property>
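
With the check enabled, the virtual memory ceiling for a container is its physical allocation multiplied by yarn.nodemanager.vmem-pmem-ratio. The sketch below works that out for a 1 GB container, first with the default ratio and then with a ratio of 4; raising the ratio is a possible alternative to switching the check off. vmem_limit_mb is a hypothetical helper, and the ratio of 4 is only an illustration.

```shell
#!/usr/bin/env bash
# Virtual memory ceiling (MB) = physical allocation (MB) * vmem-pmem-ratio.
# The second argument is optional and defaults to YARN's 2.1.
vmem_limit_mb() {
  awk -v pmem="$1" -v ratio="${2:-2.1}" 'BEGIN { printf "%.0f\n", pmem * ratio }'
}

vmem_limit_mb 1024      # default ratio, prints 2150
vmem_limit_mb 1024 4    # ratio raised to 4, prints 4096
```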

     Deployment test:

    //cluster mode
    spark-submit --class b_执行.a --master yarn --deploy-mode cluster hdfs://s101/user/centos/myspark/myspark.jar myspark/wc
    //client mode (the default deploy mode)
    spark-submit --class b_执行.a --master yarn myspark.jar myspark/wc
  • Original article: https://www.cnblogs.com/lybpy/p/9775009.html