  • Configuring the Spark HistoryServer under Hadoop HA

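    My Spark cluster is deployed on YARN. The Hadoop cluster underneath was originally a simple fully distributed deployment, but it was later upgraded to HA mode, with one active NameNode (NN) and one standby NN. The Spark HistoryServer configuration has to be changed accordingly; if it is left as is, the server fails with this error: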

    Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:184)
        at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
    Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
        at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1719)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1350)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4132)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:838)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:821)

    1 Spark HistoryServer configuration without Hadoop HA

    1.1 Configure spark-defaults.conf

    spark.eventLog.enabled             true
    spark.eventLog.dir                 hdfs://1421-0002:9000/spark/sparklogs 
    spark.yarn.historyServer.address   1421-0002:18080
    spark.serializer                   org.apache.spark.serializer.KryoSerializer
    spark.executor.instances           4

    This sets the HDFS path where the event logs are stored, as well as the address of the Spark History Server.
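    As far as I know, Spark does not create the spark.eventLog.dir directory by itself: if it is missing, applications with event logging enabled fail at startup, and (at least in the 1.x releases) the History Server also refuses to start. A minimal sketch for creating it up front, assuming the hdfs client is on the PATH and you run it as a user allowed to write under /spark:

```shell
#!/usr/bin/env bash
# Sketch: pre-create the directory named in spark.eventLog.dir.
# Assumption: the hdfs CLI is installed on this node and the current user
# may write under /spark on HDFS.
EVENT_LOG_DIR="hdfs://1421-0002:9000/spark/sparklogs"

if command -v hdfs >/dev/null 2>&1; then
  # -p creates missing parents and does not fail if the directory exists
  hdfs dfs -mkdir -p "$EVENT_LOG_DIR"
  # list it to confirm the directory is reachable
  hdfs dfs -ls "$EVENT_LOG_DIR"
else
  echo "hdfs CLI not found; run this on a node with the Hadoop client installed" >&2
fi
```

    The same directory is later read by the History Server, so it only needs to be created once.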

    1.2 Configure spark-env.sh

    # Set logDirectory here so it does not have to be passed explicitly to start-history-server.sh
    export SPARK_HISTORY_OPTS="-Dspark.history.retainedApplications=10 -Dspark.history.fs.logDirectory=hdfs://1421-0002:9000/spark/sparklogs"
    

    This sets the HDFS path of the event logs once, so the parameter does not have to be passed every time the server is started.

    1.3 Start the Spark History Server

    start-history-server.sh 
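    To confirm the server actually came up, rather than dying with an exception like the one above, a quick check against its HTTP port helps. A sketch, assuming SPARK_HOME is set (/opt/spark is only a hypothetical fallback) and that your Spark version has the monitoring REST API under /api/v1, which was added in Spark 1.4:

```shell
#!/usr/bin/env bash
# Sketch: start the History Server and check that it answers HTTP requests.
# Assumptions: SPARK_HOME points at the Spark install (/opt/spark is a
# hypothetical fallback) and the UI listens on the default port 18080,
# matching spark.yarn.historyServer.address above.
HISTORY_URL="http://1421-0002:18080"
START_SCRIPT="${SPARK_HOME:-/opt/spark}/sbin/start-history-server.sh"

if [ -x "$START_SCRIPT" ]; then
  "$START_SCRIPT"
  sleep 5  # give the JVM a moment to bind the port
  # /api/v1/applications exists in Spark 1.4 and later
  if curl -fsS --max-time 10 "$HISTORY_URL/api/v1/applications" >/dev/null; then
    echo "History Server is up at $HISTORY_URL"
  else
    echo "No answer from $HISTORY_URL; check the HistoryServer log in the Spark logs directory" >&2
  fi
else
  echo "start-history-server.sh not found at $START_SCRIPT" >&2
fi
```

    If the check fails, the HistoryServer log usually contains the full exception, such as the StandbyException shown at the top of this post.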

    2 Spark HistoryServer configuration under Hadoop HA

    2.1 Modify spark-env.sh

    # Set logDirectory here so it does not have to be passed explicitly to start-history-server.sh
    export SPARK_HISTORY_OPTS="-Dspark.history.retainedApplications=10 -Dspark.history.fs.logDirectory=hdfs://hadoop-cluster/spark/sparklogs"

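    Only the HDFS path changes. Before, there was a single NN; in HA mode there are two, and the old URI pins the client to 1421-0002, so whenever that NameNode is in standby every read fails with the StandbyException shown above. The fix is to replace 1421-0002:9000 with hadoop-cluster (the value of dfs.nameservices in hadoop/etc/hadoop/hdfs-site.xml). No port is needed.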
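    Rather than hard-coding the nameservice, spark-env.sh can look it up from the Hadoop configuration. A minimal sketch, assuming a single nameservice (no federation), that the hdfs CLI on this node reads the cluster's hdfs-site.xml, and that HADOOP_CONF_DIR is visible to Spark so it can resolve the nameservice; "hadoop-cluster" is used only as a fallback:

```shell
#!/usr/bin/env bash
# Sketch for spark-env.sh: derive the HA log directory from dfs.nameservices.
# Assumptions: exactly one nameservice is configured, and the hdfs CLI reads
# the same hdfs-site.xml as the cluster. "hadoop-cluster" is only a fallback.
if command -v hdfs >/dev/null 2>&1; then
  NAMESERVICE="$(hdfs getconf -confKey dfs.nameservices 2>/dev/null)"
fi
NAMESERVICE="${NAMESERVICE:-hadoop-cluster}"

# No port: the HDFS client maps the nameservice to whichever NameNode is
# active, using the failover proxy provider configured in hdfs-site.xml.
LOG_DIR="hdfs://${NAMESERVICE}/spark/sparklogs"
export SPARK_HISTORY_OPTS="-Dspark.history.retainedApplications=10 -Dspark.history.fs.logDirectory=${LOG_DIR}"
```

    The setup above only changes spark-env.sh. It is probably also worth pointing spark.eventLog.dir in spark-defaults.conf at the same hdfs://hadoop-cluster/spark/sparklogs, because applications that still write to 1421-0002:9000 would hit the same StandbyException whenever that NameNode is in standby.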

  • Original article: https://www.cnblogs.com/liuchangchun/p/4656273.html