1. Under /root/spark-1.5.1-bin-hadoop2.6/conf, copy spark-env.sh.template to spark-env.sh and add a HADOOP_CONF_DIR setting:
# Options read when launching programs locally with
# ./bin/run-example or ./bin/spark-submit
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
# - SPARK_CLASSPATH, default classpath entries to append
export HADOOP_CONF_DIR=/etc/hadoop/conf
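The copy-and-edit step can also be scripted. The following is a minimal sketch, assuming a POSIX shell; the function name set_hadoop_conf_dir is made up for illustration, and the two paths in the usage note are the ones used in this post. It skips the copy if spark-env.sh already exists and appends the export only once.

```shell
# Sketch: create spark-env.sh from its template and set HADOOP_CONF_DIR.
# set_hadoop_conf_dir is an illustrative helper, not part of Spark.
set_hadoop_conf_dir() {
    conf_dir="$1"      # Spark conf directory
    hadoop_conf="$2"   # directory holding the Hadoop *-site.xml files
    env_file="$conf_dir/spark-env.sh"

    # Copy the template only if spark-env.sh does not exist yet.
    if [ ! -f "$env_file" ]; then
        cp "$conf_dir/spark-env.sh.template" "$env_file" || return 1
    fi

    # Append the export only if no uncommented HADOOP_CONF_DIR line exists.
    if ! grep -q '^export HADOOP_CONF_DIR=' "$env_file"; then
        printf 'export HADOOP_CONF_DIR=%s\n' "$hadoop_conf" >> "$env_file"
    fi
}
```

For the layout in this post the call would be: set_hadoop_conf_dir /root/spark-1.5.1-bin-hadoop2.6/conf /etc/hadoop/conf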
2. Run the test program
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 --queue thequeue lib/spark-examples*.jar 10
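The memory flags in this command become YARN container requests with some overhead added on top. The successful run later in this post logs "Will allocate AM container, with 4505 MB memory including 409 MB overhead" for --driver-memory 4g, which is consistent with an overhead of max(384 MB, 10% of the requested memory). That rule is my reading of the Spark 1.5 defaults, so treat it as an assumption; the helper name container_mb is made up for illustration. A rough sketch of the arithmetic:

```shell
# Sketch: estimate the container size requested for a given JVM memory.
# Assumes overhead = max(384 MB, 10% of requested memory), rounded down.
container_mb() {
    mem_mb="$1"
    overhead=$(( mem_mb / 10 ))
    if [ "$overhead" -lt 384 ]; then
        overhead=384
    fi
    echo $(( mem_mb + overhead ))
}

container_mb 4096   # --driver-memory 4g   -> 4505
container_mb 2048   # --executor-memory 2g -> 2432
```

Both values are below the 8192 MB per-container maximum reported in the log. YARN may round a request up to its own allocation increment, so the container actually granted can be larger than these figures.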
At run time it turned out that the root user has no write permission on the HDFS directory /user, so the job failed:
Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6599)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6581)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6533)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4337)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4307)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4280)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:853)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:321)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:601)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
Loosening the permissions on /user fixes this (note that mode 777 makes the directory writable by every user):
[root@node1 spark-1.5.1-bin-hadoop2.6]# sudo -u hdfs hdfs dfs -chmod 777 /user
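A narrower alternative to opening up all of /user is to give the submitting user its own home directory in HDFS, since the staging files are uploaded under /user/root/.sparkStaging. This is a sketch rather than what was done here; the helper name hdfs_home_cmds is made up for illustration, and it only prints the commands so they can be reviewed before being run.

```shell
# Sketch: print the HDFS commands that create a home directory for a user.
# Nothing is executed against the cluster; the output is meant for review.
hdfs_home_cmds() {
    user="$1"
    printf 'sudo -u hdfs hdfs dfs -mkdir -p /user/%s\n' "$user"
    printf 'sudo -u hdfs hdfs dfs -chown %s /user/%s\n' "$user" "$user"
}

hdfs_home_cmds root
```

Piping the output to sh (hdfs_home_cmds root | sh) would run the two commands, assuming the hdfs superuser account exists as it does on this cluster.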
Run it again:
[root@node1 spark-1.5.1-bin-hadoop2.6]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 --queue thequeue lib/spark-examples*.jar 10
15/11/04 16:38:55 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.0.81:8032
15/11/04 16:38:55 INFO yarn.Client: Requesting a new application from cluster with 3 NodeManagers
15/11/04 16:38:55 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/11/04 16:38:55 INFO yarn.Client: Will allocate AM container, with 4505 MB memory including 409 MB overhead
15/11/04 16:38:55 INFO yarn.Client: Setting up container launch context for our AM
15/11/04 16:38:55 INFO yarn.Client: Setting up the launch environment for our AM container
15/11/04 16:38:55 INFO yarn.Client: Preparing resources for our AM container
15/11/04 16:38:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/11/04 16:38:56 INFO yarn.Client: Uploading resource file:/root/spark-1.5.1-bin-hadoop2.6/lib/spark-assembly-1.5.1-hadoop2.6.0.jar -> hdfs://node1:8020/user/root/.sparkStaging/application_1446368481906_0008/spark-assembly-1.5.1-hadoop2.6.0.jar
15/11/04 16:38:58 INFO yarn.Client: Uploading resource file:/root/spark-1.5.1-bin-hadoop2.6/lib/spark-examples-1.5.1-hadoop2.6.0.jar -> hdfs://node1:8020/user/root/.sparkStaging/application_1446368481906_0008/spark-examples-1.5.1-hadoop2.6.0.jar
15/11/04 16:38:59 INFO yarn.Client: Uploading resource file:/tmp/spark-72a1a44a-029c-4523-acd1-6fbd44f5709a/__spark_conf__2902795872463320162.zip -> hdfs://node1:8020/user/root/.sparkStaging/application_1446368481906_0008/__spark_conf__2902795872463320162.zip
15/11/04 16:38:59 INFO spark.SecurityManager: Changing view acls to: root
15/11/04 16:38:59 INFO spark.SecurityManager: Changing modify acls to: root
15/11/04 16:38:59 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/11/04 16:39:00 INFO yarn.Client: Submitting application 8 to ResourceManager
15/11/04 16:39:00 INFO impl.YarnClientImpl: Submitted application application_1446368481906_0008
15/11/04 16:39:01 INFO yarn.Client: Application report for application_1446368481906_0008 (state: ACCEPTED)
15/11/04 16:39:01 INFO yarn.Client:
    client token: N/A
    diagnostics: N/A
    ApplicationMaster host: N/A
    ApplicationMaster RPC port: -1
    queue: root.thequeue
    start time: 1446626340027
    final status: UNDEFINED
    tracking URL: http://node1:8088/proxy/application_1446368481906_0008/
    user: root
15/11/04 16:39:02 INFO yarn.Client: Application report for application_1446368481906_0008 (state: ACCEPTED)
15/11/04 16:39:03 INFO yarn.Client: Application report for application_1446368481906_0008 (state: ACCEPTED)
15/11/04 16:39:04 INFO yarn.Client: Application report for application_1446368481906_0008 (state: ACCEPTED)
15/11/04 16:39:05 INFO yarn.Client: Application report for application_1446368481906_0008 (state: ACCEPTED)
15/11/04 16:39:06 INFO yarn.Client: Application report for application_1446368481906_0008 (state: RUNNING)
15/11/04 16:39:06 INFO yarn.Client:
    client token: N/A
    diagnostics: N/A
    ApplicationMaster host: 192.168.0.83
    ApplicationMaster RPC port: 0
    queue: root.thequeue
    start time: 1446626340027
    final status: UNDEFINED
    tracking URL: http://node1:8088/proxy/application_1446368481906_0008/
    user: root
15/11/04 16:39:07 INFO yarn.Client: Application report for application_1446368481906_0008 (state: RUNNING)
15/11/04 16:39:08 INFO yarn.Client: Application report for application_1446368481906_0008 (state: RUNNING)
15/11/04 16:39:09 INFO yarn.Client: Application report for application_1446368481906_0008 (state: RUNNING)
15/11/04 16:39:10 INFO yarn.Client: Application report for application_1446368481906_0008 (state: RUNNING)
15/11/04 16:39:11 INFO yarn.Client: Application report for application_1446368481906_0008 (state: RUNNING)
15/11/04 16:39:12 INFO yarn.Client: Application report for application_1446368481906_0008 (state: FINISHED)
15/11/04 16:39:12 INFO yarn.Client:
    client token: N/A
    diagnostics: N/A
    ApplicationMaster host: 192.168.0.83
    ApplicationMaster RPC port: 0
    queue: root.thequeue
    start time: 1446626340027
    final status: SUCCEEDED
    tracking URL: http://node1:8088/proxy/application_1446368481906_0008/A
    user: root
15/11/04 16:39:12 INFO util.ShutdownHookManager: Shutdown hook called
15/11/04 16:39:12 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-72a1a44a-029c-4523-acd1-6fbd44f5709a
[root@node1 spark-1.5.1-bin-hadoop2.6]#
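In yarn-cluster mode the driver runs inside the ApplicationMaster, which is why the client console above shows only status reports and not the computed value of Pi. To look at the driver output, the application ID has to be pulled out of the submit log first. A small sketch of that step follows; the helper name extract_app_id is made up for illustration, and it assumes a grep that supports -o (GNU and BSD grep both do).

```shell
# Sketch: extract the first YARN application ID from spark-submit output.
extract_app_id() {
    grep -o 'application_[0-9]*_[0-9]*' | head -n 1
}

# Example using one line from the log above.
line='15/11/04 16:39:00 INFO impl.YarnClientImpl: Submitted application application_1446368481906_0008'
app_id=$(printf '%s\n' "$line" | extract_app_id)
echo "$app_id"

# If log aggregation is enabled on the cluster (an assumption), the driver
# output could then be fetched with:
#   yarn logs -applicationId "$app_id" | grep 'Pi is roughly'
```

Because spark-submit writes its log to stderr, capturing it from a live run needs a redirect such as 2>&1 before the pipe.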
3. Use spark-sql
Copy /etc/hive/conf/hive-site.xml into /root/spark-1.5.1-bin-hadoop2.6/conf, then start spark-sql:
[root@node1 spark-1.5.1-bin-hadoop2.6]# cp /etc/hive/conf/hive-site.xml conf/
[root@node1 spark-1.5.1-bin-hadoop2.6]# ./bin/spark-sql
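Without hive-site.xml in conf/, spark-sql will not pick up the existing Hive metastore settings, so it can help to check for the file before launching. The wrapper below is a sketch under that assumption; run_spark_sql is a made-up name, and any extra arguments are passed through to bin/spark-sql unchanged.

```shell
# Sketch: refuse to start spark-sql unless hive-site.xml has been copied in.
# run_spark_sql is an illustrative wrapper, not part of Spark.
run_spark_sql() {
    spark_home="$1"
    shift
    if [ ! -f "$spark_home/conf/hive-site.xml" ]; then
        echo "hive-site.xml not found in $spark_home/conf" >&2
        return 1
    fi
    # Pass any remaining arguments straight through to spark-sql.
    "$spark_home/bin/spark-sql" "$@"
}
```

A possible call, assuming the -e option for running a single statement is available in this spark-sql build: run_spark_sql /root/spark-1.5.1-bin-hadoop2.6 -e 'SHOW DATABASES;'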