zoukankan      html  css  js  c++  java
  • pyspark 错误记录

    2020-09-11 06:55:00 ERROR JobScheduler:91 - Error running job streaming job 1599762320000 ms.2
    py4j.Py4JException: Error while obtaining a new communication channel
            at py4j.CallbackClient.getConnectionLock(CallbackClient.java:257)
            at py4j.CallbackClient.sendCommand(CallbackClient.java:377)
            at py4j.CallbackClient.sendCommand(CallbackClient.java:356)
            at py4j.reflection.PythonProxyHandler.invoke(PythonProxyHandler.java:106)
            at com.sun.proxy.$Proxy17.call(Unknown Source)
            at org.apache.spark.streaming.api.python.TransformFunction.callPythonTransformFunction(PythonDStream.scala:92)
            at org.apache.spark.streaming.api.python.TransformFunction.apply(PythonDStream.scala:78)
            at org.apache.spark.streaming.api.python.PythonDStream$$anonfun$callForeachRDD$1.apply(PythonDStream.scala:179)
            at org.apache.spark.streaming.api.python.PythonDStream$$anonfun$callForeachRDD$1.apply(PythonDStream.scala:179)
            at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
            at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
            at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
            at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
            at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
            at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
            at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
            at scala.util.Try$.apply(Try.scala:192)
            at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
            at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:257)
            at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
            at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
            at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
            at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:256)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
    Caused by: java.net.ConnectException: Connection refused (Connection refused)
            at java.net.PlainSocketImpl.socketConnect(Native Method)
            at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
            at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
            at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
            at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
            at java.net.Socket.connect(Socket.java:589)
            at java.net.Socket.connect(Socket.java:538)
            at java.net.Socket.<init>(Socket.java:434)
            at java.net.Socket.<init>(Socket.java:244)
            at javax.net.DefaultSocketFactory.createSocket(SocketFactory.java:277)
            at py4j.CallbackConnection.start(CallbackConnection.java:226)
            at py4j.CallbackClient.getConnection(CallbackClient.java:238)
            at py4j.CallbackClient.getConnectionLock(CallbackClient.java:250)
            ... 25 more
    2020-09-11 06:55:00 ERROR PythonDStream$$anon$1:91 - Cannot connect to Python process. It's probably dead. Stopping StreamingContext.
    py4j.Py4JException: Error while obtaining a new communication channel
            at py4j.CallbackClient.getConnectionLock(CallbackClient.java:257)
            at py4j.CallbackClient.sendCommand(CallbackClient.java:377)
            at py4j.CallbackClient.sendCommand(CallbackClient.java:356)
            at py4j.reflection.PythonProxyHandler.invoke(PythonProxyHandler.java:106)
            at com.sun.proxy.$Proxy17.call(Unknown Source)
            at org.apache.spark.streaming.api.python.TransformFunction.callPythonTransformFunction(PythonDStream.scala:92)
            at org.apache.spark.streaming.api.python.TransformFunction.apply(PythonDStream.scala:78)
            at org.apache.spark.streaming.api.python.PythonDStream$$anonfun$callForeachRDD$1.apply(PythonDStream.scala:179)
            at org.apache.spark.streaming.api.python.PythonDStream$$anonfun$callForeachRDD$1.apply(PythonDStream.scala:179)
            at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
            at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
            at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
            at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
            at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
            at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
            at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
            at scala.util.Try$.apply(Try.scala:192)
            at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
            at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:257)
            at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
            at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
            at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
            at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:256)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
    Caused by: java.net.ConnectException: Connection refused (Connection refused)
            at java.net.PlainSocketImpl.socketConnect(Native Method)
            at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
            at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
            at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
            at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
            at java.net.Socket.connect(Socket.java:589)
            at java.net.Socket.connect(Socket.java:538)
            at java.net.Socket.<init>(Socket.java:434)
            at java.net.Socket.<init>(Socket.java:244)
            at javax.net.DefaultSocketFactory.createSocket(SocketFactory.java:277)
            at py4j.CallbackConnection.start(CallbackConnection.java:226)
            at py4j.CallbackClient.getConnection(CallbackClient.java:238)
            at py4j.CallbackClient.getConnectionLock(CallbackClient.java:250)
            ... 25 more
    
    

    超过2000行的日志文件中主要部分,其它都是在重复显示这些错误日志。
    其中,下面为主要的错误信息。

    ERROR JobScheduler:91 - Error running job streaming job 1599762320000 ms.2
    py4j.Py4JException: Error while obtaining a new communication channel
    Caused by: java.net.ConnectException: Connection refused (Connection refused)
    2020-09-11 06:55:00 ERROR PythonDStream$$anon$1:91 - Cannot connect to Python process. It's probably dead. Stopping StreamingContext.
    py4j.Py4JException: Error while obtaining a new communication channel
    Caused by: java.net.ConnectException: Connection refused (Connection refused)
    

    一、错误分析

    在程序pysoark程序运行几个小时之后,开始显示这个错误。

  • 相关阅读:
    《算法导论》第十章----基本数据结构
    《算法导论》第九章----中位数和顺序统计学
    《算法导论》第八章----线性时间排序(决策树+计数排序+基数排序)
    C++实现快速排序
    C++实现斐波那契第N项非递归与递归实现的时间比较
    C++实现用两个栈实现队列
    C++实现从尾到头打印链表(不改变链表结构)
    C++实现二叉树(建树,前序,中序,后序)递归和非递归实现
    Spark 大数据文本统计
    逻辑回归--参数解释+数据特征不独热编码+训练数据分布可视话
  • 原文地址:https://www.cnblogs.com/leimu/p/13650837.html
Copyright © 2011-2022 走看看