事情经过:之前该topic(M_A)已经存在,而且正常使用structured streaming消费了一段时间,后来删除了topic(M_A),重新创建了topic(M-A),程序使用新创建的topic(M-A)进行实时统计操作,使用structured streaming执行过程中抛出了一下异常:
18/08/24 10:20:42 INFO utils.AppInfoParser: Kafka version : 0.10.0-kafka-2.1.0 18/08/24 10:20:42 INFO utils.AppInfoParser: Kafka commitId : unknown 18/08/24 10:20:42 INFO internals.AbstractCoordinator: Discovered coordinator vmxx.xx.xx.xx.com.cn:9092 (id: 2147483417 rack: null) for group spark-kafka-source-165bc430-5cbc-4cfc-8327-9af01fd02fcc-616947503-driver-0. 18/08/24 10:20:42 INFO internals.ConsumerCoordinator: Revoking previously assigned partitions [] for group spark-kafka-source-165bc430-5cbc-4cfc-8327-9af01fd02fcc-616947503-driver-0 18/08/24 10:20:42 INFO internals.AbstractCoordinator: (Re-)joining group spark-kafka-source-165bc430-5cbc-4cfc-8327-9af01fd02fcc-616947503-driver-0 18/08/24 10:20:45 INFO internals.AbstractCoordinator: Successfully joined group spark-kafka-source-165bc430-5cbc-4cfc-8327-9af01fd02fcc-616947503-driver-0 with generation 1 18/08/24 10:20:45 INFO internals.ConsumerCoordinator: Setting newly assigned partitions [M-A-0] for group spark-kafka-source-165bc430-5cbc-4cfc-8327-9af01fd02fcc-616947503-driver-0 18/08/24 10:20:46 WARN kafka010.KafkaSource: Set(M_A-0) are gone. Some data may have been missed. Some data may have been lost because they are not available in Kafka any more; either the data was aged out by Kafka or the topic may have been deleted before all the data in the topic was processed. If you want your streaming query to fail on such cases, set the source option "failOnDataLoss" to "true".
错误原因,在structured streaming编程时,使用checkpoint(checkpointt中添加topicname.replace("-","").replace("_","")),此时忘记了删除checkpoint,因此导致操作。