Recently, while using Cascade R-CNN for an object detection task, I ran into the following error when training the model on my own dataset.
The full traceback is:
Traceback (most recent call last):
  File "train.py", line 195, in <module>
    train()
  File "train.py", line 175, in train
    _, global_stepnp, summary_str = sess.run([train_op, global_step, summary_op])
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: ValueError: attempt to get argmax of an empty sequence
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/script_ops.py", line 206, in __call__
    ret = func(*args)
  File "../libs/detection_oprations/anchor_target_layer_without_boxweight.py", line 49, in anchor_target_layer
    argmax_overlaps = overlaps.argmax(axis=1)
ValueError: attempt to get argmax of an empty sequence
  [[node sample_anchors_minibatch/PyFunc (defined at ../libs/networks/build_whole_network.py:433) = PyFunc[Tin=[DT_FLOAT, DT_INT32, DT_FLOAT], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_2", _device="/job:localhost/replica:0/task:0/device:CPU:0"](Cast/_1175, postprocess_RPN/Shape_2, make_anchors_forRPN/concat/_1177)]]
  [[{{node sample_RCNN_minibatch_stage2/Shape_1/_1383}} = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_3164_sample_RCNN_minibatch_stage2/Shape_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

Caused by op 'sample_anchors_minibatch/PyFunc', defined at:
  File "train.py", line 195, in <module>
    train()
  File "train.py", line 46, in train
    gtboxes_batch=gtboxes_and_label)
  File "../libs/networks/build_whole_network.py", line 433, in build_whole_detection_network
    [tf.float32, tf.float32])
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/script_ops.py", line 457, in py_func
    func=func, inp=inp, Tout=Tout, stateful=stateful, eager=False, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/script_ops.py", line 281, in _internal_py_func
    input=inp, token=token, Tout=Tout, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 129, in py_func
    "PyFunc", input=input, token=token, Tout=Tout, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): ValueError: attempt to get argmax of an empty sequence
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/script_ops.py", line 206, in __call__
    ret = func(*args)
  File "../libs/detection_oprations/anchor_target_layer_without_boxweight.py", line 49, in anchor_target_layer
    argmax_overlaps = overlaps.argmax(axis=1)
ValueError: attempt to get argmax of an empty sequence
  [[node sample_anchors_minibatch/PyFunc (defined at ../libs/networks/build_whole_network.py:433) = PyFunc[Tin=[DT_FLOAT, DT_INT32, DT_FLOAT], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_2", _device="/job:localhost/replica:0/task:0/device:CPU:0"](Cast/_1175, postprocess_RPN/Shape_2, make_anchors_forRPN/concat/_1177)]]
  [[{{node sample_RCNN_minibatch_stage2/Shape_1/_1383}} = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_3164_sample_RCNN_minibatch_stage2/Shape_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
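The tutorial I followed is this one: Cascade R-CNN training and testing (TensorFlow framework).
I had hit this error before. At the time, my workaround was to wrap the failing step in try/except and skip the samples that raised it.
This time the error came back during training and could no longer be skipped, because it aborted the program outright.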
Cause: an empty annotation file triggers this error. While checking my annotation files, I happened to find one that looked like this:
<annotation>
  <folder>********</folder>
  <filename>**********</filename>
  <path>******************</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>219</width>
    <height>167</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
</annotation>
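This XML annotation file contains no bounding-box coordinates at all, and that is the main cause of the error: with no ground-truth boxes, the overlaps array passed to overlaps.argmax(axis=1) is empty.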
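To make the link between an empty annotation and the error message concrete, here is a minimal NumPy sketch. It assumes that overlaps has shape (number of anchors, number of ground-truth boxes), which is my reading of the traceback rather than something confirmed from the repository code. The helper argmax_over_gt and the array sizes are made up for illustration.

```python
import numpy as np


def argmax_over_gt(overlaps):
    """Return the best-matching ground-truth index per anchor,
    or the error message if NumPy refuses to compute it."""
    try:
        return overlaps.argmax(axis=1)
    except ValueError as exc:
        return str(exc)


num_anchors = 4  # hypothetical anchor count

# Normal image: two ground-truth boxes, so overlaps is (4, 2).
with_gt = np.random.rand(num_anchors, 2)

# Image whose XML has no boxes: zero ground-truth columns, so overlaps is (4, 0).
no_gt = np.zeros((num_anchors, 0), dtype=np.float32)

print(argmax_over_gt(with_gt))  # one index per anchor
print(argmax_over_gt(no_gt))    # the ValueError message from NumPy
```

The second call fails because argmax along an axis of length zero has nothing to choose from, which matches the message in the traceback.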
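Fix:
Somewhere in the process of building the dataset, the coordinates in some XML files may have been lost. There are two ways to fix this. The first is to delete the XML files that have no coordinates, which is reasonable when there are only a few of them. The second is to go through those XML files and add the missing coordinates back in.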
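Finding the empty files by hand is tedious, so a small script can list them first. This is only a sketch: it assumes Pascal VOC style annotations in which each box is stored as object/bndbox under the root element, and the function names count_boxes and find_empty_annotations are my own, not part of the training code.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def count_boxes(xml_data):
    """Count <object><bndbox> entries in one annotation (str or bytes)."""
    root = ET.fromstring(xml_data)
    return len(root.findall("object/bndbox"))


def find_empty_annotations(xml_dir):
    """Return the paths of all .xml files in xml_dir that contain no boxes."""
    empty = []
    for xml_path in sorted(Path(xml_dir).glob("*.xml")):
        if count_boxes(xml_path.read_bytes()) == 0:
            empty.append(xml_path)
    return empty


# The empty annotation from this post, shortened to the relevant parts.
EMPTY_EXAMPLE = """
<annotation>
  <size><width>219</width><height>167</height><depth>3</depth></size>
  <segmented>0</segmented>
</annotation>
"""

# A hypothetical annotation with one box, for comparison.
FULL_EXAMPLE = """
<annotation>
  <object>
    <name>example</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>100</xmax><ymax>120</ymax></bndbox>
  </object>
</annotation>
"""

print(count_boxes(EMPTY_EXAMPLE), count_boxes(FULL_EXAMPLE))
```

Calling find_empty_annotations on the annotation directory gives the list of files to delete or re-label. If you delete an XML file, remember to remove the matching image as well, or the two sets will no longer line up.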
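Summary:
Your fix for this error may not be the same as mine. What I describe here is only a reference, since the error can have many different causes. Still, if you hit it, check your dataset carefully. If I come across another solution later, I will update this post.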