  • Zeppelin 0.6.2 using Spark's yarn-client mode

    Zeppelin version 0.6.2


    1. Export SPARK_HOME

    In conf/zeppelin-env.sh, export the SPARK_HOME environment variable with your Spark installation path.

    You can optionally export HADOOP_CONF_DIR and SPARK_SUBMIT_OPTIONS as well.

    export SPARK_HOME=/usr/crh/4.9.2.5-1051/spark
    export HADOOP_CONF_DIR=/etc/hadoop/conf
    export JAVA_HOME=/opt/jdk1.7.0_79

    Even though SPARK_HOME was added here, the required jars still could not be found later when the interpreter was used.

    2. Set master in Interpreter menu

    After starting Zeppelin, go to the Interpreter menu and edit the master property in your Spark interpreter setting. The value may vary depending on your Spark cluster deployment type.

    Set the Spark interpreter's master property to yarn-client mode.

    FAQ

    1.

    ERROR [2016-07-26 16:46:15,999] ({pool-2-thread-2} Job.java[run]:189) - Job failed
    java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
    	at org.apache.spark.repl.SparkILoop.<init>(SparkILoop.scala:936)
    	at org.apache.spark.repl.SparkILoop.<init>(SparkILoop.scala:70)
    	at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:765)
    	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
    	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
    	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:341)
    	at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
    	at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    	at java.lang.Thread.run(Thread.java:745)

    Solution

    Copy all the jar files from the SPARK_HOME/lib directory into Zeppelin's lib directory.

    2.

    %spark.sql
    show tables

    org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=root, access=WRITE, inode="/user/root/.sparkStaging/application_1481857320971_0028":hdfs:hdfs:drwxr-xr-x
    	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
    	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
    	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
    	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
    	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1771)
    	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1755)
    	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1738)
    	at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71)
    	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3905)
    	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1048)
    	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:622)
    	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
    	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at javax.security.auth.Subject.doAs(Subject.java:415)
    	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
    
    	at org.apache.hadoop.ipc.Client.call(Client.java:1427)
    	at org.apache.hadoop.ipc.Client.call(Client.java:1358)
    	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    	at com.sun.proxy.$Proxy24.mkdirs(Unknown Source)
    	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:558)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:606)
    	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
    	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
    	at com.sun.proxy.$Proxy25.mkdirs(Unknown Source)
    	at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3018)
    	at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2988)
    	at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1057)
    	at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1053)
    	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1053)
    	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1046)
    	at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1877)
    	at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:598)
    	at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:281)
    	at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:634)
    	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:123)
    	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
    	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
    	at org.apache.spark.SparkContext.<init>(SparkContext.scala:523)
    	at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:339)
    	at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:145)
    	at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:465)
    	at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
    	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
    	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
    	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300)
    	at org.apache.zeppelin.scheduler.Job.run(Job.java:169)
    	at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:134)
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    	at java.lang.Thread.run(Thread.java:745)

    Solution

    hadoop fs -chown root:hdfs /user/root

    3.

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{DataFrame, Row, SQLContext}
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.ml.feature.RFormula
    import org.apache.spark.ml.regression.LinearRegression
    conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@6a79f5df
    sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@59b2aabc
    spark: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@129d0b9b
    org.apache.spark.sql.AnalysisException: Specifying database name or other qualifiers are not allowed for temporary tables. If the table name has dots (.) in it, please quote the table name with backticks (`).;
        at org.apache.spark.sql.catalyst.analysis.Catalog$class.checkTableIdentifier(Catalog.scala:97)
        at org.apache.spark.sql.catalyst.analysis.SimpleCatalog.checkTableIdentifier(Catalog.scala:104)
        at org.apache.spark.sql.catalyst.analysis.SimpleCatalog.lookupRelation(Catalog.scala:134)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:257)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:268)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:264)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:56)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:54)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:54)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:249)
    
    val dataset = spark.sql("select knife_dish_power,penetration,knife_dish_torque,total_propulsion,knife_dish_speed_readings,propulsion_speed1 from `tbm.tbm_test` where knife_dish_power!=0 and penetration!=0")

    As shown in the SQL above, wrap the database and table name in backticks (``).

    Then the following error was reported:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{DataFrame, Row, SQLContext}
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.ml.feature.RFormula
    import org.apache.spark.ml.regression.LinearRegression
    conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@4dd69db0
    sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@4072dd9
    spark: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@238ac654
    java.lang.RuntimeException: Table Not Found: tbm.tbm_test
    	at scala.sys.package$.error(package.scala:27)
    	at org.apache.spark.sql.catalyst.analysis.SimpleCatalog.lookupRelation(Catalog.scala:139)
    	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:257)
    	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:268)

    Cause: I was using the org.apache.spark.sql.SQLContext object `spark` to query data in Hive, but querying Hive data requires the org.apache.spark.sql.hive.HiveContext object (sqlContext or sqlc).

    Example:
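    A minimal sketch of the corrected query, assuming the Hive table tbm.tbm_test from the failing statement and the SparkContext `sc` that Zeppelin injects into the Spark interpreter (table and column names are copied from the error above and are only illustrative):

    import org.apache.spark.sql.hive.HiveContext

    // Unlike SQLContext, HiveContext reads the Hive metastore, so a qualified
    // name such as tbm.tbm_test resolves to a real Hive table.
    val sqlc = new HiveContext(sc)

    val dataset = sqlc.sql(
      "select knife_dish_power, penetration, knife_dish_torque, total_propulsion, " +
      "knife_dish_speed_readings, propulsion_speed1 " +
      "from tbm.tbm_test where knife_dish_power != 0 and penetration != 0")

    dataset.show()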

     

    Incidentally, here is a record of using HiveContext in spark-shell:

    The cluster environment is HDP 2.3.4.0.

    The Spark version is 1.5.2.

    spark-shell
    scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    scala> hiveContext.sql("show tables").collect().foreach(println)
    [gps_p1,false]
    scala> hiveContext.sql("select * from g").collect().foreach(println)
    [1,li]                                                                          
    [1,li]
    [1,li]
    [1,li]
    [1,li]

    4.

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{DataFrame, Row, SQLContext}
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.ml.feature.RFormula
    import org.apache.spark.ml.regression.LinearRegression
    conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@4d66e4f8
    org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
    org.apache.spark.SparkContext.<init>(SparkContext.scala:82)
    $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
    $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
    $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
    $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
    $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:57)
    $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:59)
    $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:61)
    $iwC$$iwC$$iwC$$iwC.<init>(<console>:63)
    $iwC$$iwC$$iwC.<init>(<console>:65)
    $iwC$$iwC.<init>(<console>:67)
    $iwC.<init>(<console>:69)
    <init>(<console>:71)
    .<init>(<console>:75)
    .<clinit>(<console>)
    .<init>(<console>:7)
    .<clinit>(<console>)
    $print(<console>)

    Solution

    val conf = new SparkConf().setAppName("test").set("spark.driver.allowMultipleContexts", "true")
    val sc = new SparkContext(conf)
    val spark = new SQLContext(sc)

    Add set("spark.driver.allowMultipleContexts", "true") when building the SparkConf, as shown above.
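    An alternative worth noting, based on the fact that Zeppelin's Spark interpreter already injects a SparkContext named `sc`: instead of creating a second SparkContext at all, reuse the injected one and only build the SQL/Hive context you need. A minimal sketch (the query is only illustrative):

    import org.apache.spark.sql.hive.HiveContext

    // Reuse the `sc` provided by Zeppelin rather than calling `new SparkContext`,
    // which is what triggers the "Only one SparkContext may be running" error.
    val hiveCtx = new HiveContext(sc)
    hiveCtx.sql("show tables").collect().foreach(println)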

  • Original article: https://www.cnblogs.com/zeppelin/p/6061887.html