  • Some problems running pyspark on Hadoop YARN

    hduser@master:~$ pyspark --master local[4]
    Python 2.7.12 (default, Dec  4 2017, 14:50:18) 
    [GCC 5.4.0 20160609] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    18/08/16 09:13:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__ / .__/\_,_/_/ /_/\_\   version 2.3.1
          /_/
    
    Using Python version 2.7.12 (default, Dec  4 2017 14:50:18)
    SparkSession available as 'spark'.
    >>> sc.master
    u'local[4]'
    >>> textFile=sc.textFile("file:/usr/local/spark/README.md")
    >>> textFile.count()
    103                                                                             
    >>> textFile =sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/LICENSE.txt")
    >>> textFile.count()
    1594
    >>> exit()
    hduser@master:~$ HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop pyspark --master yarn --deploy-mode client
    Python 2.7.12 (default, Dec  4 2017, 14:50:18) 
    [GCC 5.4.0 20160609] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    18/08/16 09:16:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    18/08/16 09:16:10 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__ / .__/\_,_/_/ /_/\_\   version 2.3.1
          /_/
    
    Using Python version 2.7.12 (default, Dec  4 2017 14:50:18)
    SparkSession available as 'spark'.
    >>> 18/08/16 09:16:46 ERROR YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!
    18/08/16 09:16:47 ERROR TransportClient: Failed to send RPC 7397841312553810412 to /192.168.0.102:39808: java.nio.channels.ClosedChannelException
    java.nio.channels.ClosedChannelException
    	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
    18/08/16 09:16:47 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(0,0,Map(),Set()) to AM was unsuccessful
    java.io.IOException: Failed to send RPC 7397841312553810412 to /192.168.0.102:39808: java.nio.channels.ClosedChannelException
    	at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)
    	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
    	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
    	at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
    	at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
    	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
    	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
    	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    	at java.lang.Thread.run(Thread.java:748)
    Caused by: java.nio.channels.ClosedChannelException
    	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
    18/08/16 09:16:47 ERROR Utils: Uncaught exception in thread Yarn application state monitor
    org.apache.spark.SparkException: Exception thrown in awaitResult: 
    	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:567)
    	at org.apache.spark.scheduler.cluster.YarnSchedulerBackend.stop(YarnSchedulerBackend.scala:95)
    	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:155)
    	at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:508)
    	at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1755)
    	at org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1931)
    	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1360)
    	at org.apache.spark.SparkContext.stop(SparkContext.scala:1930)
    	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:112)
    Caused by: java.io.IOException: Failed to send RPC 7397841312553810412 to /192.168.0.102:39808: java.nio.channels.ClosedChannelException
    	at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)
    	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
    	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
    	at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
    	at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
    	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
    	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
    	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    	at java.lang.Thread.run(Thread.java:748)
    Caused by: java.nio.channels.ClosedChannelException
    	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
    
    >>> sc.master
    u'yarn'
    >>> textFile = sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/LICENSE.txt")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/spark/python/pyspark/context.py", line 546, in textFile
        minPartitions = minPartitions or min(self.defaultParallelism, 2)
      File "/usr/local/spark/python/pyspark/context.py", line 400, in defaultParallelism
        return self._jsc.sc().defaultParallelism()
      File "/usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
      File "/usr/local/spark/python/pyspark/sql/utils.py", line 63, in deco
        return f(*a, **kw)
      File "/usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling o30.defaultParallelism.
    : java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
    This stopped SparkContext was created at:
    
    org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    py4j.Gateway.invoke(Gateway.java:238)
    py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    py4j.GatewayConnection.run(GatewayConnection.java:238)
    java.lang.Thread.run(Thread.java:748)
    
    The currently active SparkContext was created at:
    
    (No active SparkContext.)
             
    	at org.apache.spark.SparkContext.assertNotStopped(SparkContext.scala:99)
    	at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:2332)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    	at py4j.Gateway.invoke(Gateway.java:282)
    	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    	at py4j.commands.CallCommand.execute(CallCommand.java:79)
    	at py4j.GatewayConnection.run(GatewayConnection.java:238)
    	at java.lang.Thread.run(Thread.java:748)
    
    >>> textFile.count()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'textFile' is not defined
    >>> 
    
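
  Aside (an editorial note, not from the original post): HADOOP_CONF_DIR only needs to point at the directory holding the cluster's core-site.xml and yarn-site.xml. Instead of prefixing every launch, it can be exported in Spark's conf/spark-env.sh; the path below is the one already used in this post:

    # $SPARK_HOME/conf/spark-env.sh
    # Point Spark at the Hadoop client configuration so --master yarn
    # can locate the ResourceManager and HDFS.
    export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop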

  Running the pyspark queries locally works fine, but running pyspark on Hadoop YARN produces the errors above: the YARN application exits with state FINISHED almost as soon as the shell starts, the RPC channel to the ApplicationMaster closes, and the SparkContext is stopped, so every subsequent command fails with "Cannot call methods on a stopped SparkContext". If any expert sees this, please point me in the right direction. Many thanks~~~
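
  As a first diagnostic step (an editorial suggestion, not part of the original post), the reason the application finished is usually spelled out in the ApplicationMaster's own logs. If log aggregation is enabled on the cluster, they can be pulled with the standard YARN CLI; the application id below is a placeholder:

    # List recently finished applications to find the id of the failed run.
    yarn application -list -appStates FINISHED

    # Fetch the aggregated container logs for that application
    # (application_1534380000000_0001 is a placeholder id).
    yarn logs -applicationId application_1534380000000_0001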

    My yarn-site.xml settings are attached below:

    <?xml version="1.0"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    <configuration>

    <!-- Site specific YARN configuration properties -->
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8025</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8050</value>
      </property>
      <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
      </property>

    </configuration>
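
  For reference (an editorial sketch, not part of the original cluster's settings): this failure pattern, where YARN kills the ApplicationMaster container and Spark then sees only a ClosedChannelException, is often memory-related. Alongside the virtual-memory check already disabled above, the physical-memory check and the NodeManager/scheduler memory limits are the usual suspects. The property names are standard YARN settings, but the values are illustrative placeholders that must be sized to the actual nodes:

    <!-- Sketch only: commonly reviewed memory settings; values are placeholders. -->
    <property>
      <name>yarn.nodemanager.pmem-check-enabled</name>
      <value>false</value>
    </property>
    <property>
      <!-- Total memory the NodeManager may hand out on each node. -->
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>4096</value>
    </property>
    <property>
      <!-- Largest single container YARN will allocate; the Spark AM and
           executors (heap plus overhead) must fit under this limit. -->
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>4096</value>
    </property>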
    

      

  • Original post: https://www.cnblogs.com/genghenggao/p/9485421.html