  • Object Detection Algorithms: YOLO-V1 Training Code Explained

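    The YOLO-V1 network consists of 24 convolutional layers followed by 2 fully connected layers. The input is 448×448×3 and the output has dimension S×S×(B×5+C), where S is the number of grid cells per side, B is the number of bounding boxes predicted by each grid cell, and C is the number of classes.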
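As a quick check of the output dimension, here is a minimal sketch that evaluates S×S×(B×5+C) for the values used later in this post (S=7, B=2, C=20). The function name `yolo_output_size` is only illustrative.

```python
def yolo_output_size(S=7, B=2, C=20):
    """Length of the flattened S x S x (B*5 + C) prediction tensor."""
    per_cell = B * 5 + C  # B boxes x (x, y, w, h, confidence) + C class scores
    return S * S * per_cell


output_size = yolo_output_size()
print(output_size)  # 7 * 7 * 30
```

With these values each cell carries 30 numbers, which is why the last dense layer in the network code has 7 * 7 * 30 units.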

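    YOLO-V1 divides an image into an S×S grid. If the center of an object falls inside a grid cell, that cell is responsible for predicting the object. Each cell predicts B bounding boxes, and each bounding box comes with a confidence value. The confidence encodes two things: how confident the model is that the box contains an object, and how accurate it believes the box to be.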

    $$\Pr(\text{Object}) \times \text{IoU}^{\text{truth}}_{\text{pred}}$$

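    If an object falls in a grid cell, the first term is 1, otherwise it is 0. The second term is the IoU between the predicted bounding box and the ground-truth box. (Confidence is defined per bounding box and reflects whether the box's grid cell contains the center of an object.) In YOLO-V1 each cell has two bounding boxes, and each bounding box has 5 predicted values: x, y, w, h and confidence. Each cell also predicts C conditional class probabilities, i.e. the probability of each class given that the cell contains an object. (x, y) is the offset of the box center relative to the bounds of the grid cell, normalized to the range (0, 1); w and h are the predicted width and height relative to the whole image, also normalized to (0, 1). The confidence is the object confidence of a particular bounding box. At test time, the class-specific confidence score of each box is computed as follows: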

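    $$\Pr(\text{Class}_i \mid \text{Object}) \times \Pr(\text{Object}) \times \text{IoU}^{\text{truth}}_{\text{pred}} = \Pr(\text{Class}_i) \times \text{IoU}^{\text{truth}}_{\text{pred}}$$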
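The product above can be sketched for a single grid cell with NumPy. The numbers below are made up purely for illustration, and only 3 classes are used to keep the arrays small.

```python
import numpy as np

# Hypothetical predictions for one grid cell (illustrative values only).
class_probs = np.array([0.7, 0.2, 0.1])   # Pr(Class_i | Object), shape (C,)
box_confidence = np.array([0.8, 0.3])     # Pr(Object) * IoU, shape (B,)

# Outer product: one class-specific score per (box, class) pair, shape (B, C).
class_scores = box_confidence[:, None] * class_probs[None, :]

# Position of the highest score: which box, and which class.
best_box, best_class = np.unravel_index(np.argmax(class_scores), class_scores.shape)
print(class_scores)
print(best_box, best_class)
```

Each cell therefore yields B×C scores, which can then be thresholded and passed to non-maximum suppression.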

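    The following explains how the predicted x, y are normalized to the range 0-1 as offsets relative to their grid cell, and how w, h are normalized to 0-1 using the image width and height. Each cell predicts B vectors of the form (x, y, w, h, confidence). Assume the image is divided into S×S cells with S=7, and that the image has width $w_i$ and height $h_i$.

    The explanation below refers to a figure that I found particularly clear: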

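    1. (x, y) is the offset of the bbox center relative to its grid cell. Take the blue cell in the figure above, whose coordinates are $(x_{col} = 1, y_{row} = 4)$, and assume its predicted output is the red bbox with center $(x_c, y_c)$. The final predicted (x, y) is normalized and expresses the offset relative to the cell:

    $$x = \frac{x_c}{w_i} S - x_{col}, \qquad y = \frac{y_c}{h_i} S - y_{row}$$

    2. (w, h) is the size of the bbox relative to the whole image. If the predicted bbox has width and height $w_b$, $h_b$, then:

    $$w = \frac{w_b}{w_i}, \qquad h = \frac{h_b}{h_i}$$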
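The two normalization rules can be sketched in a few lines of Python. The helper name `encode_box` and the pixel values are assumptions chosen for illustration; the box center was picked so that it lands in the cell with column 1 and row 4.

```python
import math


def encode_box(xc, yc, wb, hb, img_w, img_h, S=7):
    """Return (col, row, x, y, w, h) for one box given in pixels."""
    # Grid cell containing the box center (clamped so a center on the far edge stays valid).
    col = min(int(math.floor(xc / img_w * S)), S - 1)
    row = min(int(math.floor(yc / img_h * S)), S - 1)
    # Offset of the center inside that cell, in units of one cell.
    x = xc / img_w * S - col
    y = yc / img_h * S - row
    # Width and height as a fraction of the whole image.
    w = wb / img_w
    h = hb / img_h
    return col, row, x, y, w, h


col, row, x, y, w, h = encode_box(100, 300, 150, 200, 448, 448)
print(col, row, x, y, w, h)
```

All four encoded values fall in the range 0-1, as described above.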

    Parameters needed by YOLO-V1

        import numpy as np
        import tensorflow as tf  # the code in this post uses the TensorFlow 1.x API

        def __init__(self):
            self.classes = ["aeroplane", "bicycle", "bird", "boat", "bottle",
                            "bus", "car", "cat", "chair", "cow", "diningtable",
                            "dog", "horse", "motorbike", "person", "pottedplant",
                            "sheep", "sofa", "train", "tvmonitor"]
            # Grid offsets used when computing coordinates, shape (7, 7, 2):
            # x_offset[row, col, box] = col and y_offset[row, col, box] = row
            self.x_offset = np.transpose(np.reshape(np.array([np.arange(7)] * 7 * 2, dtype=np.float32), [2, 7, 7]), [1, 2, 0])
            self.y_offset = np.transpose(self.x_offset, [1, 0, 2])
            # Input image size
            self.img_size = (448, 448)
            # Threshold
            self.iou_threshold = 0.5
            self.batch_size = 45
            # Weights of the individual loss terms
            # (the paper uses 5 for coordinates and 0.5 for the no-object term)
            self.class_scale = 2.0
            self.object_scale = 1.0
            self.noobject_scale = 1.0
            self.coord_scale = 5.0
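The one-line construction of `x_offset` is dense, so here is a small NumPy sketch that rebuilds it and checks what it contains. The comparison against `np.meshgrid` is added only for illustration and is not part of the original code.

```python
import numpy as np

S, B = 7, 2
x_offset = np.transpose(
    np.reshape(np.array([np.arange(S)] * S * B, dtype=np.float32), [B, S, S]),
    [1, 2, 0])                                 # shape (S, S, B)
y_offset = np.transpose(x_offset, [1, 0, 2])   # swap the row and column axes

# Index order is [row, col, box]: x_offset stores the column index and
# y_offset stores the row index, repeated for each of the B boxes.
print(x_offset[4, 1])  # cell in row 4, column 1
print(y_offset[4, 1])

# The same grids built with meshgrid, for comparison.
cols, rows = np.meshgrid(np.arange(S), np.arange(S))
same_x = np.array_equal(x_offset[..., 0], cols)
same_y = np.array_equal(y_offset[..., 0], rows)
```

Adding these offsets to the per-cell (x, y) predictions and dividing by S converts them back to image-relative coordinates.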

    Start of the network section

        def _build_net(self):
            x = tf.placeholder(tf.float32, [None, 448, 448, 3])
            with tf.variable_scope('yolo'):
                # conv_layer arguments: input, output channels, kernel size, stride, name
                net = self.conv_layer(x, 64, 7, 2, 'conv_2')
                net = self.max_pool_layer(net, 2, 2)
                net = self.conv_layer(net, 192, 3, 1, 'conv_4')
                net = self.max_pool_layer(net, 2, 2)
                net = self.conv_layer(net, 128, 1, 1, 'conv_6')
                net = self.conv_layer(net, 256, 3, 1, 'conv_7')
                net = self.conv_layer(net, 256, 1, 1, 'conv_8')
                net = self.conv_layer(net, 512, 3, 1, 'conv_9')
                net = self.max_pool_layer(net, 2, 2)
                net = self.conv_layer(net, 256, 1, 1, 'conv_11')
                net = self.conv_layer(net, 512, 3, 1, 'conv_12')
                net = self.conv_layer(net, 256, 1, 1, 'conv_13')
                net = self.conv_layer(net, 512, 3, 1, 'conv_14')
                net = self.conv_layer(net, 256, 1, 1, 'conv_15')
                net = self.conv_layer(net, 512, 3, 1, 'conv_16')
                net = self.conv_layer(net, 256, 1, 1, 'conv_17')
                net = self.conv_layer(net, 512, 3, 1, 'conv_18')
                net = self.conv_layer(net, 512, 1, 1, 'conv_19')
                net = self.conv_layer(net, 1024, 3, 1, 'conv_20')
                net = self.max_pool_layer(net, 2, 2)
                net = self.conv_layer(net, 512, 1, 1, 'conv_22')
                net = self.conv_layer(net, 1024, 3, 1, 'conv_23')
                net = self.conv_layer(net, 512, 1, 1, 'conv_24')
                net = self.conv_layer(net, 1024, 3, 1, 'conv_25')
                net = self.conv_layer(net, 1024, 3, 1, 'conv_26')
                net = self.conv_layer(net, 1024, 3, 2, 'conv_28')
                net = self.conv_layer(net, 1024, 3, 1, 'conv_29')
                net = self.conv_layer(net, 1024, 3, 1, 'conv_30')
                # Flatten the 7 x 7 x 1024 feature map
                net = self.flatten_layer(net)
                # Note: this implementation uses three dense layers (512, 4096, 1470)
                net = self.dense_layer(net, 512, activation=self.Leaky_Relu, scope='fc_33')
                net = self.dense_layer(net, 4096, activation=self.Leaky_Relu, scope='fc_34')
                net = self.dense_layer(net, 7 * 7 * 30, scope='fc_36')
            return net
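To see why a 448×448 input ends up as a 7×7 grid, the sketch below traces only the spatial size through the layer sequence of `_build_net`. It assumes the padding behaviour of the layer helpers in this post: convolutions pad `kernel // 2` on each side and then run with VALID padding, and pooling uses SAME padding. The helper names `conv_out` and `pool_out` are illustrative.

```python
import math


def conv_out(size, kernel, stride):
    # Explicit padding of kernel // 2 per side, followed by a VALID convolution.
    pad = kernel // 2
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, stride):
    # SAME padding: the output size is ceil(size / stride).
    return math.ceil(size / stride)


# (type, kernel, stride) in the order used by _build_net; 'C' = conv, 'P' = pool.
layers = ([('C', 7, 2), ('P', 2, 2), ('C', 3, 1), ('P', 2, 2)]
          + [('C', 1, 1), ('C', 3, 1)] * 2 + [('P', 2, 2)]
          + [('C', 1, 1), ('C', 3, 1)] * 5 + [('P', 2, 2)]
          + [('C', 1, 1), ('C', 3, 1)] * 2
          + [('C', 3, 1), ('C', 3, 2), ('C', 3, 1), ('C', 3, 1)])

size = 448
for kind, kernel, stride in layers:
    size = conv_out(size, kernel, stride) if kind == 'C' else pool_out(size, stride)

num_convs = sum(1 for kind, _, _ in layers if kind == 'C')
flat_features = size * size * 1024
print(size, num_convs, flat_features)
```

The input is halved six times (one stride-2 convolution at the start, four pooling layers, one stride-2 convolution near the end), so 448 / 64 = 7.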

    Layers used by the network

        # Leaky ReLU activation
        def Leaky_Relu(self, x):
            return tf.maximum(x * 0.1, x)

        # Convolutional layer
        def conv_layer(self, x, filters, kernel_size, stride, scope):
            channel = x.get_shape().as_list()[-1]
            weight = tf.Variable(tf.truncated_normal(shape=[kernel_size, kernel_size, channel, filters], stddev=0.1),
                                 name="weights")
            bias = tf.Variable(tf.zeros([filters, ]), name="biases")
            # Pad height and width explicitly, then convolve with VALID padding
            pad_size = kernel_size // 2
            x = tf.pad(x, paddings=[[0, 0], [pad_size, pad_size], [pad_size, pad_size], [0, 0]])

            conv = tf.nn.conv2d(x, weight, strides=[1, stride, stride, 1], padding="VALID", name=scope)
            output = self.Leaky_Relu(tf.nn.bias_add(conv, bias))
            return output

        # Max pooling layer
        def max_pool_layer(self, x, pool_size, stride):
            return tf.nn.max_pool(x, [1, pool_size, pool_size, 1], strides=[1, stride, stride, 1], padding="SAME")

        # Fully connected layer
        def dense_layer(self, x, filters, activation=None, scope=None):
            channel = x.get_shape().as_list()[-1]
            weight = tf.Variable(tf.truncated_normal(shape=[channel, filters], stddev=0.1), name="weights")
            bias = tf.Variable(tf.zeros([filters, ]), name="biases")
            output = tf.nn.xw_plus_b(x, weight, bias, name=scope)
            if activation:
                output = activation(output)
            return output

        # Flatten layer
        def flatten_layer(self, x):
            # Move channels first (NHWC -> NCHW) before flattening
            x = tf.transpose(x, [0, 3, 1, 2])
            shape = x.get_shape().as_list()[1:]
            nums = np.prod(shape)
            return tf.reshape(x, [-1, nums])
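The activation is written as a single `maximum`, which works because the slope 0.1 is between 0 and 1. A NumPy sketch of the same expression (the function name `leaky_relu` is illustrative):

```python
import numpy as np


def leaky_relu(x, alpha=0.1):
    # For 0 < alpha < 1, max(alpha * x, x) equals x when x >= 0
    # and alpha * x when x < 0.
    return np.maximum(alpha * x, x)


x = np.array([-2.0, -0.5, 0.0, 3.0])
y = leaky_relu(x)
print(y)
```

Negative inputs are scaled by 0.1 instead of being zeroed, so they still pass a small gradient.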

    End of the network section


    Loss function section

    The YOLO-V1 loss function:
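For reference, the loss as given in the YOLO-V1 paper can be written as follows. Here $\mathbb{1}_{ij}^{obj}$ is 1 when box $j$ of cell $i$ is responsible for an object, $\mathbb{1}_{i}^{obj}$ is 1 when cell $i$ contains an object, and hatted symbols are predictions. The paper sets $\lambda_{coord} = 5$ and $\lambda_{noobj} = 0.5$; the code in this post uses its own scale values.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
      \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
 &+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
      \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
 &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The five lines correspond to the center coordinates, the box size, the confidence of responsible boxes, the confidence of non-responsible boxes, and the class probabilities.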

     

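    (1) The classification error is only penalized when the grid cell contains an object.

    (2) The coordinate error of a box is only penalized when that bounding box is responsible for a ground-truth box. A bounding box is responsible for a ground-truth box when its IoU with that box is the largest among all boxes in the same grid cell.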
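Rule (2) can be sketched as a mask computation in NumPy. The IoU values and the tiny 1×2 grid below are made up for illustration; the variable names mirror those used in the loss code of this post.

```python
import numpy as np

# Hypothetical IoUs for a 1 x 2 grid with B = 2 boxes per cell, shape (rows, cols, B).
iou = np.array([[[0.2, 0.6],
                 [0.4, 0.1]]])
# response is 1 where the cell contains an object center, shape (rows, cols, 1).
response = np.array([[[1.0],
                      [0.0]]])

# Largest IoU in each cell, kept as a trailing axis so it broadcasts against iou.
best = iou.max(axis=-1, keepdims=True)
# 1 only for the best box of a cell that actually contains an object.
object_mask = (iou >= best).astype(np.float32) * response
# Everything else counts as "no object".
noobject_mask = 1.0 - object_mask
print(object_mask)
print(noobject_mask)
```

In the first cell only the second box is responsible; in the second cell no box is, because the cell contains no object center.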

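    Why does the formula take the square root of w and h?


    The black box is the predicted bounding box; the red and green boxes are ground-truth boxes. Without the square root on w and h, the localization loss of the bounding box against the two ground-truth boxes would be identical. In terms of area, however, the black box is 25 times the green box, while the red box is only 81/25 times the black box. The size deviation between the black and green boxes is larger, so the two cases should not receive the same loss. Taking the square root of w and h makes the loss agree better with this intuition.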
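The argument can be checked with a few numbers. The sketch below assumes square boxes with side lengths 1 (green), 5 (black) and 9 (red); these side lengths are an assumption chosen to match the area ratios 25 and 81/25 mentioned above.

```python
import math


def size_loss(pred, truth, use_sqrt):
    """Squared error on one side length, with or without the square root."""
    if use_sqrt:
        return (math.sqrt(pred) - math.sqrt(truth)) ** 2
    return (pred - truth) ** 2


black, green, red = 5.0, 1.0, 9.0  # assumed side lengths (areas 25, 1, 81)

plain_green = size_loss(black, green, use_sqrt=False)  # (5 - 1)^2
plain_red = size_loss(black, red, use_sqrt=False)      # (5 - 9)^2
sqrt_green = size_loss(black, green, use_sqrt=True)    # (sqrt(5) - 1)^2
sqrt_red = size_loss(black, red, use_sqrt=True)        # (sqrt(5) - 3)^2
print(plain_green, plain_red, sqrt_green, sqrt_red)
```

Without the square root both errors are equal; with it, the error against the much smaller green box is clearly the larger of the two.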

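    Function for computing IoU

        def calc_iou(self, bboxes1, bboxes2):
            # loss_layer passes TensorFlow tensors holding (x_center, y_center, w, h),
            # so convert both inputs to corners (xmin, ymin, xmax, ymax) first
            b1 = tf.stack([bboxes1[..., 0] - bboxes1[..., 2] / 2.0,
                           bboxes1[..., 1] - bboxes1[..., 3] / 2.0,
                           bboxes1[..., 0] + bboxes1[..., 2] / 2.0,
                           bboxes1[..., 1] + bboxes1[..., 3] / 2.0], axis=-1)
            b2 = tf.stack([bboxes2[..., 0] - bboxes2[..., 2] / 2.0,
                           bboxes2[..., 1] - bboxes2[..., 3] / 2.0,
                           bboxes2[..., 0] + bboxes2[..., 2] / 2.0,
                           bboxes2[..., 1] + bboxes2[..., 3] / 2.0], axis=-1)

            # Intersection: its top-left corner is the elementwise max of the two
            # top-left corners, its bottom-right corner the elementwise min
            int_min = tf.maximum(b1[..., :2], b2[..., :2])
            int_max = tf.minimum(b1[..., 2:], b2[..., 2:])

            # Width and height of the intersection: negative when the boxes do not
            # overlap, so clamp at 0
            int_wh = tf.maximum(int_max - int_min, 0.0)

            # IoU = intersection / union
            int_vol = int_wh[..., 0] * int_wh[..., 1]   # intersection area
            vol1 = bboxes1[..., 2] * bboxes1[..., 3]    # area of bboxes1
            vol2 = bboxes2[..., 2] * bboxes2[..., 3]    # area of bboxes2
            union = tf.maximum(vol1 + vol2 - int_vol, 1e-10)  # avoid division by zero
            return tf.clip_by_value(int_vol / union, 0.0, 1.0)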
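Because the loss passes boxes as (x_center, y_center, w, h), it can help to test the IoU computation on a few hand-checkable boxes in that format. The following is a NumPy sketch under that assumption, not the author's function; `iou_center_format` and the example boxes are illustrative.

```python
import numpy as np


def iou_center_format(box1, box2):
    """IoU of boxes given as (..., 4) arrays of (x_center, y_center, w, h)."""
    box1 = np.asarray(box1, dtype=float)
    box2 = np.asarray(box2, dtype=float)
    # Corners of each box.
    min1, max1 = box1[..., :2] - box1[..., 2:] / 2, box1[..., :2] + box1[..., 2:] / 2
    min2, max2 = box2[..., :2] - box2[..., 2:] / 2, box2[..., :2] + box2[..., 2:] / 2
    # Width and height of the overlap, clamped at 0 for disjoint boxes.
    inter_wh = np.maximum(np.minimum(max1, max2) - np.maximum(min1, min2), 0.0)
    inter = inter_wh[..., 0] * inter_wh[..., 1]
    union = box1[..., 2] * box1[..., 3] + box2[..., 2] * box2[..., 3] - inter
    return inter / np.maximum(union, 1e-10)


a = [0.5, 0.5, 0.4, 0.4]
b = [0.6, 0.5, 0.4, 0.4]   # a shifted right by 0.1
c = [0.1, 0.1, 0.1, 0.1]   # far away from a

iou_ab = iou_center_format(a, b)   # overlap 0.3 x 0.4 = 0.12, union 0.20
iou_ac = iou_center_format(a, c)   # no overlap
iou_aa = iou_center_format(a, a)   # identical boxes
print(iou_ab, iou_ac, iou_aa)
```

The function broadcasts over leading axes, so the same code works on arrays of shape (batch, 7, 7, 2, 4).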

        def loss_layer(self, predicts, labels, scope='loss_layer'):
            # labels has shape (batch_size, 7, 7, 25): index 0 is the object flag c,
            # indices 1:5 are the box (x, y, w, h) in pixels, the last 20 are the classes
            with tf.variable_scope(scope):
                # Predictions
                # The network output has shape (batch_size, 1470)
                # 20 class scores per cell
                predict_classes = tf.reshape(
                    predicts[:, :7 * 7 * 20],
                    [self.batch_size, 7, 7, 20])
                # 2 confidences per cell
                predict_confidence = tf.reshape(
                    predicts[:, 7 * 7 * 20:7 * 7 * 20 + 7 * 7 * 2],
                    [self.batch_size, 7, 7, 2])
                # 2 bounding boxes x 4 values per cell
                predict_boxes = tf.reshape(
                    predicts[:, 7 * 7 * 20 + 7 * 7 * 2:],
                    [self.batch_size, 7, 7, 2, 4])

                # Ground truth
                # shape (45, 7, 7, 1)
                # response is 0 or 1: 1 if the cell contains an object, 0 otherwise.
                # "Contains an object" means it contains the object's center point, not
                # just part of the object, so only the cell holding the center is 1
                response = tf.reshape(
                    labels[..., 0],
                    [self.batch_size, 7, 7, 1])
                # shape (45, 7, 7, 1, 4)
                boxes = tf.reshape(
                    labels[..., 1:5],
                    [self.batch_size, 7, 7, 1, 4])
                # shape (45, 7, 7, 2, 4); after dividing by the image size the four
                # box values lie in the range 0~1
                boxes = tf.tile(
                    boxes, [1, 1, 1, 2, 1]) / self.img_size[0]
                # shape (45, 7, 7, 20)
                classes = labels[..., 5:]

                # self.x_offset has shape (7, 7, 2); add a batch axis before tiling
                x_offset = tf.reshape(
                    tf.constant(self.x_offset, dtype=tf.float32), [1, 7, 7, 2])
                # shape (45, 7, 7, 2)
                x_offset = tf.tile(x_offset, [self.batch_size, 1, 1, 1])
                # shape (45, 7, 7, 2)
                y_offset = tf.transpose(x_offset, (0, 2, 1, 3))

                # shape (45, 7, 7, 2, 4) -> (x, y, w, h) relative to the whole image
                predict_boxes_tran = tf.stack(
                    [(predict_boxes[..., 0] + x_offset) / 7,
                     (predict_boxes[..., 1] + y_offset) / 7,
                     tf.square(predict_boxes[..., 2]),
                     tf.square(predict_boxes[..., 3])], axis=-1)

                # IoU between predicted and ground-truth boxes, shape (45, 7, 7, 2)
                iou_predict_truth = self.calc_iou(predict_boxes_tran, boxes)

                # shape (45, 7, 7, 1)
                # During training, if a cell really contains an object, only the box
                # with the largest IoU is responsible for predicting it; the other
                # boxes are treated as containing no object
                object_mask = tf.reduce_max(iou_predict_truth, axis=3, keepdims=True)
                # object_mask shape (45, 7, 7, 2)
                object_mask = tf.cast(
                    (iou_predict_truth >= object_mask), tf.float32) * response

                # No-object mask, shape (45, 7, 7, 2):
                # 1 for boxes that are not responsible for an object, 0 otherwise
                noobject_probs = tf.ones_like(
                    object_mask, dtype=tf.float32) - object_mask

                # shape (45, 7, 7, 2, 4): bring the ground-truth boxes into the format
                # of the network output; xy are relative to the top-left corner of the
                # cell, wh are square-rooted, all in the range 0~1
                boxes_tran = tf.stack(
                    [boxes[..., 0] * 7 - x_offset,
                     boxes[..., 1] * 7 - y_offset,
                     tf.sqrt(boxes[..., 2]),
                     tf.sqrt(boxes[..., 3])], axis=-1)

                # class_loss, class_delta has shape (45, 7, 7, 20)
                class_delta = response * (predict_classes - classes)
                class_loss = tf.reduce_mean(
                    tf.reduce_sum(tf.square(class_delta), axis=[1, 2, 3]),
                    name='class_loss') * self.class_scale

                # object_loss: confidence = iou * p(object), where p(object) is 1 or 0
                object_delta = object_mask * (predict_confidence - iou_predict_truth)
                object_loss = tf.reduce_mean(
                    tf.reduce_sum(tf.square(object_delta), axis=[1, 2, 3]),
                    name='object_loss') * self.object_scale

                # noobject_loss: p(object) is 0, so the target confidence is 0
                noobject_delta = noobject_probs * predict_confidence
                noobject_loss = tf.reduce_mean(
                    tf.reduce_sum(tf.square(noobject_delta), axis=[1, 2, 3]),
                    name='noobject_loss') * self.noobject_scale

                # coord_loss
                coord_mask = tf.expand_dims(object_mask, 4)
                boxes_delta = coord_mask * (predict_boxes - boxes_tran)
                coord_loss = tf.reduce_mean(
                    tf.reduce_sum(tf.square(boxes_delta), axis=[1, 2, 3, 4]),
                    name='coord_loss') * self.coord_scale

                return class_loss + object_loss + noobject_loss + coord_loss
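The first step of the loss, cutting the flat 1470-vector into classes, confidences and boxes, can be checked in isolation with NumPy. The batch size of 3 and the `arange` fill are assumptions made only so that the slice boundaries are easy to verify.

```python
import numpy as np

S, B, C = 7, 2, 20
batch = 3  # hypothetical batch size
predicts = np.arange(batch * S * S * (B * 5 + C), dtype=np.float32).reshape(batch, -1)

n_class = S * S * C  # class scores come first
n_conf = S * S * B   # then the box confidences
n_box = S * S * B * 4  # then the box coordinates

classes = predicts[:, :n_class].reshape(batch, S, S, C)
confidence = predicts[:, n_class:n_class + n_conf].reshape(batch, S, S, B)
boxes = predicts[:, n_class + n_conf:].reshape(batch, S, S, B, 4)

print(predicts.shape, classes.shape, confidence.shape, boxes.shape)
```

Note that this layout stores all class scores first, then all confidences, then all boxes, rather than interleaving the 30 values cell by cell.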

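    End of the loss function section


    Shortcomings of YOLO-V1
    1. Each grid cell has only 2 bounding boxes, so results are poor for objects with unusual aspect ratios (that is, ratios not covered by the training data).

    2. The image is divided into only 7×7 grid cells, so results are poor when two objects are very close to each other.

    3. Each grid cell ends up with only one class, which easily leads to missed detections (objects that are not recognized), e.g. when two objects share the same center point.

    4. Results are poor for relatively small objects in the image.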




     

  • Original article: https://www.cnblogs.com/cucwwb/p/12791857.html