Deep Learning (3): Real-Time Object Detection with the Tiny YOLO Algorithm (TensorFlow Implementation)

    I. Background

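    YOLO, short for You Only Look Once, was introduced by Joseph Redmon et al. in a paper first posted to arXiv in June 2015. The goal of this experiment was to implement YOLO. Drawing on some existing material, I ended up implementing tiny YOLO, a lightweight, simplified version of YOLO whose advantages are a simple implementation and fast detection.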

    [1] Paper: https://arxiv.org/abs/1506.02640

    [2] YOLO official website: https://pjreddie.com/darknet/yolo/

    II. How the Algorithm Works

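    Compared with the R-CNN family, YOLO's main innovation is to treat object detection as a regression problem, whereas R-CNN-style methods treat it as a classification problem solved with region proposals plus a CNN. As the figure below illustrates, YOLO makes a single pass over the input image and directly predicts the locations of all objects in it together with their class probabilities.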
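    Treating detection as regression means the network ends in a fixed-size vector of numbers. In the YOLO paper the image is divided into an S×S grid; each cell predicts B boxes (x, y, w, h plus a confidence) and C class probabilities, so the output is an S×S×(B·5+C) tensor. The helper below is a hypothetical illustration of that arithmetic, not code from the paper:

```python
def yolo_output_size(grid_size: int, boxes_per_cell: int, num_classes: int) -> int:
    """Length of a YOLOv1-style output vector.

    Each of the grid_size x grid_size cells predicts boxes_per_cell boxes
    (x, y, w, h, confidence: 5 numbers each) plus num_classes class probabilities.
    """
    per_cell = boxes_per_cell * 5 + num_classes
    return grid_size * grid_size * per_cell


if __name__ == "__main__":
    # Settings used in the YOLO paper for PASCAL VOC: S = 7, B = 2, C = 20
    print(yolo_output_size(7, 2, 20))  # 1470, i.e. a 7 x 7 x 30 tensor
```

    With S = 7, B = 2 and C = 20 this gives 1470 values, the size of the last fully connected layer (fc_19) in the code of Section III.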

    The YOLO network consists of 24 convolutional layers followed by 2 fully connected layers. Its structure is shown below:

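    The authors evaluated YOLO on several datasets. YOLO reaches a mean average precision (mAP) of around 60% (63.4 mAP on PASCAL VOC 2007), which is quite respectable, while running at 45 frames per second (FPS); the faster variant, Fast YOLO, reaches 155 FPS. Below are results on the COCO dataset, taken from the official website: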

    The table shows that although Tiny YOLO's mAP is only 23.7%, it runs at 244 FPS.
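    Frame rates are easier to compare as a per-frame time budget, which is simply 1000 / FPS milliseconds. A quick sketch using the numbers quoted above (`ms_per_frame` is a hypothetical helper):

```python
def ms_per_frame(fps: float) -> float:
    """Convert a frame rate into the time available per frame, in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    # Frame rates quoted above for YOLO, Fast YOLO and Tiny YOLO
    for name, fps in [("YOLO", 45), ("Fast YOLO", 155), ("Tiny YOLO", 244)]:
        print(f"{name}: {fps} FPS -> {ms_per_frame(fps):.1f} ms per frame")
```

    This prints roughly 22.2 ms, 6.5 ms and 4.1 ms per frame. These are the published figures; the script in Section III prints its own elapsed time, which depends on your hardware.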

    Here are some detection results from the paper, which look quite good:

    Finally, here are a few good explanations of YOLO available online:

    [3] YOLO_原理详述 (YOLO principles explained in detail)

    [4] [目标检测]YOLO原理 ([Object Detection] How YOLO works)

    This article focuses on implementing the simplified tiny YOLO model, mainly based on the following code:

    [5] https://github.com/gliese581gg/YOLO_tensorflow

    III. Implementation

    1. Files used

    First, here are all the files used and where to put them. My project contains:

        -- test (folder of test images)
           |------ 000001.jpg (test image)
        -- weights (folder for the weights)
           |------ YOLO_tiny.ckpt (weight file)
        -- main.py (script to run)

    First, the test folder: just put the jpg images you want to test in it.

    Next, the weights folder, which holds the ckpt file pre-trained by the author of [5]. It can be downloaded from Google Drive.

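    However, reaching Google Drive from mainland China requires getting around the firewall, and the download is very slow, so I uploaded the file to my Baidu Cloud, where you can download it if needed: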

    Link: https://pan.baidu.com/s/1bug9ZX5P4OghfRxvE389fQ

    Extraction code: 8tqz

    Finally, there is main.py, which is explained in detail below.
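    Before running, it can help to confirm that the files are where main.py expects them. A minimal sketch, assuming the layout above and that the checkpoint is the single YOLO_tiny.ckpt file shown there; `check_layout` is a hypothetical helper, not part of [5]:

```python
import glob
import os


def check_layout(root: str = ".") -> list:
    """Return a list of problems with the expected project layout (empty if OK)."""
    problems = []
    # weights/YOLO_tiny.ckpt is the path hard-coded in YOLO_TF.weights_file
    ckpt = os.path.join(root, "weights", "YOLO_tiny.ckpt")
    if not os.path.isfile(ckpt):
        problems.append("missing checkpoint: " + ckpt)
    # test/ should contain at least one .jpg image to run detection on
    test_dir = os.path.join(root, "test")
    if not glob.glob(os.path.join(test_dir, "*.jpg")):
        problems.append("no .jpg images in " + test_dir)
    if not os.path.isfile(os.path.join(root, "main.py")):
        problems.append("missing main.py")
    return problems


if __name__ == "__main__":
    for problem in check_layout():
        print(problem)
```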

    2. Implementation

    My main.py is adapted from YOLO_tiny_tf.py in [5], with some improvements of my own. First, here is the structure of the tiny YOLO model:

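    As can be seen, tiny YOLO is essentially a slimmed-down VGG-style network: nine 3×3 convolutional layers (the first six each followed by 2×2 max pooling) and three fully connected layers, whose 1470-dimensional output encodes the detections. Following this structure, main.py can be written as follows: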

        import time

        import cv2
        import numpy as np
        # The network uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session).
        # Importing it through tf.compat.v1 keeps that API available on newer TensorFlow releases.
        import tensorflow.compat.v1 as tf

        tf.disable_v2_behavior()


        class YOLO_TF:
            fromfile = 'test/person.jpg'
            tofile_img = 'test/output.jpg'
            tofile_txt = 'test/output.txt'
            imshow = True
            filewrite_img = False
            filewrite_txt = False
            disp_console = True
            weights_file = 'weights/YOLO_tiny.ckpt'
            alpha = 0.1          # negative slope of the leaky ReLU
            threshold = 0.2      # minimum class-specific confidence for a box to be kept
            iou_threshold = 0.5  # a box overlapping a better box by more than this IoU is dropped
            num_class = 20
            num_box = 2
            grid_size = 7
            classes = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
                       "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

            w_img = 640
            h_img = 480

            def __init__(self, fromfile=None, tofile_img=None, tofile_txt=None):
                self.fromfile = fromfile

                # Write the annotated image / text results only if an output path was given
                self.tofile_img = tofile_img
                self.filewrite_img = tofile_img is not None

                self.tofile_txt = tofile_txt
                self.filewrite_txt = tofile_txt is not None

                self.imshow = True
                self.disp_console = True

                self.build_networks()
                if self.fromfile is not None:
                    self.detect_from_file(self.fromfile)

            def build_networks(self):
                if self.disp_console:
                    print("Building YOLO_tiny graph...")
                # Input: a batch of 448 x 448 RGB images scaled to [-1, 1]
                self.x = tf.placeholder('float32', [None, 448, 448, 3])
                # Variables are unnamed, so the checkpoint is matched by creation order:
                # do not reorder these layers
                self.conv_1 = self.conv_layer(1, self.x, 16, 3, 1)
                self.pool_2 = self.pooling_layer(2, self.conv_1, 2, 2)
                self.conv_3 = self.conv_layer(3, self.pool_2, 32, 3, 1)
                self.pool_4 = self.pooling_layer(4, self.conv_3, 2, 2)
                self.conv_5 = self.conv_layer(5, self.pool_4, 64, 3, 1)
                self.pool_6 = self.pooling_layer(6, self.conv_5, 2, 2)
                self.conv_7 = self.conv_layer(7, self.pool_6, 128, 3, 1)
                self.pool_8 = self.pooling_layer(8, self.conv_7, 2, 2)
                self.conv_9 = self.conv_layer(9, self.pool_8, 256, 3, 1)
                self.pool_10 = self.pooling_layer(10, self.conv_9, 2, 2)
                self.conv_11 = self.conv_layer(11, self.pool_10, 512, 3, 1)
                self.pool_12 = self.pooling_layer(12, self.conv_11, 2, 2)
                self.conv_13 = self.conv_layer(13, self.pool_12, 1024, 3, 1)
                self.conv_14 = self.conv_layer(14, self.conv_13, 1024, 3, 1)
                self.conv_15 = self.conv_layer(15, self.conv_14, 1024, 3, 1)
                self.fc_16 = self.fc_layer(16, self.conv_15, 256, flat=True, linear=False)
                self.fc_17 = self.fc_layer(17, self.fc_16, 4096, flat=False, linear=False)
                # dropout_18 is skipped: dropout is only used during training
                # Output: 7 * 7 * (20 class probs + 2 box confidences + 2 * 4 box coords) = 1470 values
                self.fc_19 = self.fc_layer(19, self.fc_17, 1470, flat=False, linear=True)
                self.sess = tf.Session()
                self.sess.run(tf.global_variables_initializer())
                # Overwrite the random initial values with the pre-trained weights
                self.saver = tf.train.Saver()
                self.saver.restore(self.sess, self.weights_file)
                if self.disp_console:
                    print("Loading complete!")

            def conv_layer(self, idx, inputs, filters, size, stride):
                channels = inputs.get_shape()[3]
                weight = tf.Variable(tf.truncated_normal([size, size, int(channels), filters], stddev=0.1))
                biases = tf.Variable(tf.constant(0.1, shape=[filters]))

                # Pad explicitly, then convolve with VALID padding:
                # for stride 1 the spatial size is preserved
                pad_size = size // 2
                pad_mat = np.array([[0, 0], [pad_size, pad_size], [pad_size, pad_size], [0, 0]])
                inputs_pad = tf.pad(inputs, pad_mat)

                conv = tf.nn.conv2d(inputs_pad, weight, strides=[1, stride, stride, 1], padding='VALID',
                                    name=str(idx) + '_conv')
                conv_biased = tf.add(conv, biases, name=str(idx) + '_conv_biased')
                if self.disp_console:
                    print('    Layer %d : Type = Conv, Size = %d * %d, Stride = %d, Filters = %d, Input channels = %d' % (
                        idx, size, size, stride, filters, int(channels)))
                # Leaky ReLU: max(alpha * x, x)
                return tf.maximum(self.alpha * conv_biased, conv_biased, name=str(idx) + '_leaky_relu')

            def pooling_layer(self, idx, inputs, size, stride):
                if self.disp_console:
                    print('    Layer %d : Type = Pool, Size = %d * %d, Stride = %d' % (idx, size, size, stride))
                return tf.nn.max_pool(inputs, ksize=[1, size, size, 1], strides=[1, stride, stride, 1], padding='SAME',
                                      name=str(idx) + '_pool')

            def fc_layer(self, idx, inputs, hiddens, flat=False, linear=False):
                input_shape = inputs.get_shape().as_list()
                if flat:
                    dim = input_shape[1] * input_shape[2] * input_shape[3]
                    # Flatten in (channel, row, column) order, the layout the pre-trained weights expect
                    inputs_transposed = tf.transpose(inputs, (0, 3, 1, 2))
                    inputs_processed = tf.reshape(inputs_transposed, [-1, dim])
                else:
                    dim = input_shape[1]
                    inputs_processed = inputs
                weight = tf.Variable(tf.truncated_normal([dim, hiddens], stddev=0.1))
                biases = tf.Variable(tf.constant(0.1, shape=[hiddens]))
                if self.disp_console:
                    print('    Layer %d : Type = Full, Hidden = %d, Input dimension = %d, Flat = %d, Activation = %d' % (
                        idx, hiddens, int(dim), int(flat), 1 - int(linear)))
                if linear:
                    return tf.add(tf.matmul(inputs_processed, weight), biases, name=str(idx) + '_fc')
                ip = tf.add(tf.matmul(inputs_processed, weight), biases)
                return tf.maximum(self.alpha * ip, ip, name=str(idx) + '_fc')

            def detect_from_cvmat(self, img):
                s = time.time()
                self.h_img, self.w_img, _ = img.shape
                # Preprocess: resize to 448 x 448, BGR -> RGB, scale pixels from [0, 255] to [-1, 1]
                img_resized = cv2.resize(img, (448, 448))
                img_RGB = cv2.cvtColor(img_resized, cv2.COLOR_BGR2RGB)
                img_resized_np = np.asarray(img_RGB)
                inputs = np.zeros((1, 448, 448, 3), dtype='float32')
                inputs[0] = (img_resized_np / 255.0) * 2.0 - 1.0
                in_dict = {self.x: inputs}
                net_output = self.sess.run(self.fc_19, feed_dict=in_dict)
                self.result = self.interpret_output(net_output[0])
                self.show_results(img, self.result)
                strtime = str(time.time() - s)
                if self.disp_console:
                    print('Elapsed time : ' + strtime + ' secs')

            def detect_from_file(self, filename):
                if self.disp_console:
                    print('Detect from ' + filename)
                img = cv2.imread(filename)
                if img is None:  # cv2.imread returns None instead of raising on a bad path
                    raise IOError('Could not read image: ' + filename)
                self.detect_from_cvmat(img)

            def interpret_output(self, output):
                probs = np.zeros((7, 7, 2, 20))
                # Split the 1470-dim vector: class probabilities, box confidences, box coordinates
                class_probs = np.reshape(output[0:980], (7, 7, 20))
                scales = np.reshape(output[980:1078], (7, 7, 2))
                boxes = np.reshape(output[1078:], (7, 7, 2, 4))
                # offset[row, col, box] = col; used to turn cell-relative centers into grid coordinates
                offset = np.transpose(np.reshape(np.array([np.arange(7)] * 14), (2, 7, 7)), (1, 2, 0))

                boxes[:, :, :, 0] += offset
                boxes[:, :, :, 1] += np.transpose(offset, (1, 0, 2))
                boxes[:, :, :, 0:2] = boxes[:, :, :, 0:2] / 7.0
                # The network predicts sqrt(w) and sqrt(h); square them back
                boxes[:, :, :, 2] = np.multiply(boxes[:, :, :, 2], boxes[:, :, :, 2])
                boxes[:, :, :, 3] = np.multiply(boxes[:, :, :, 3], boxes[:, :, :, 3])

                # Scale from [0, 1] to pixel units of the original image
                boxes[:, :, :, 0] *= self.w_img
                boxes[:, :, :, 1] *= self.h_img
                boxes[:, :, :, 2] *= self.w_img
                boxes[:, :, :, 3] *= self.h_img

                # Class-specific confidence = P(class | object) * box confidence
                for i in range(2):
                    for j in range(20):
                        probs[:, :, i, j] = np.multiply(class_probs[:, :, j], scales[:, :, i])

                # Keep every (cell, box, class) whose score passes the threshold
                filter_mat_probs = np.array(probs >= self.threshold, dtype='bool')
                filter_mat_boxes = np.nonzero(filter_mat_probs)
                boxes_filtered = boxes[filter_mat_boxes[0], filter_mat_boxes[1], filter_mat_boxes[2]]
                probs_filtered = probs[filter_mat_probs]
                # The 4th index from np.nonzero is the class of each kept score
                # (np.argmax over the boolean mask would return the first passing class instead)
                classes_num_filtered = filter_mat_boxes[3]

                # Sort by score, highest first
                argsort = np.array(np.argsort(probs_filtered))[::-1]
                boxes_filtered = boxes_filtered[argsort]
                probs_filtered = probs_filtered[argsort]
                classes_num_filtered = classes_num_filtered[argsort]

                # Greedy non-maximum suppression (class-agnostic)
                for i in range(len(boxes_filtered)):
                    if probs_filtered[i] == 0:
                        continue
                    for j in range(i + 1, len(boxes_filtered)):
                        if self.iou(boxes_filtered[i], boxes_filtered[j]) > self.iou_threshold:
                            probs_filtered[j] = 0.0

                filter_iou = np.array(probs_filtered > 0.0, dtype='bool')
                boxes_filtered = boxes_filtered[filter_iou]
                probs_filtered = probs_filtered[filter_iou]
                classes_num_filtered = classes_num_filtered[filter_iou]

                # Each result: [class name, x_center, y_center, width, height, score]
                result = []
                for i in range(len(boxes_filtered)):
                    result.append([self.classes[classes_num_filtered[i]], boxes_filtered[i][0], boxes_filtered[i][1],
                                   boxes_filtered[i][2], boxes_filtered[i][3], probs_filtered[i]])

                return result

            def show_results(self, img, results):
                img_cp = img.copy()
                if self.filewrite_txt:
                    ftxt = open(self.tofile_txt, 'w')
                for i in range(len(results)):
                    x = int(results[i][1])
                    y = int(results[i][2])
                    w = int(results[i][3]) // 2  # half width
                    h = int(results[i][4]) // 2  # half height
                    if self.disp_console:
                        print('    class : ' + results[i][0] + ' , [x,y,w,h]=[' + str(x) + ',' + str(y) + ',' + str(
                            int(results[i][3])) + ',' + str(int(results[i][4])) + '], Confidence = ' + str(results[i][5]))
                    if self.filewrite_img or self.imshow:
                        # Bounding box, then a filled label bar with the class name and score
                        cv2.rectangle(img_cp, (x - w, y - h), (x + w, y + h), (0, 255, 0), 2)
                        cv2.rectangle(img_cp, (x - w, y - h - 20), (x + w, y - h), (125, 125, 125), -1)
                        cv2.putText(img_cp, results[i][0] + ' : %.2f' % results[i][5], (x - w + 5, y - h - 7),
                                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1)
                    if self.filewrite_txt:
                        # One detection per line; w and h here are half the box width/height
                        ftxt.write(results[i][0] + ',' + str(x) + ',' + str(y) + ',' + str(w) + ',' + str(h) + ',' + str(
                            results[i][5]) + '\n')
                if self.filewrite_img:
                    if self.disp_console:
                        print('    image file written : ' + self.tofile_img)
                    cv2.imwrite(self.tofile_img, img_cp)
                if self.imshow:
                    cv2.imshow('YOLO_tiny detection', img_cp)
                    cv2.waitKey(1)
                if self.filewrite_txt:
                    if self.disp_console:
                        print('    txt file written : ' + self.tofile_txt)
                    ftxt.close()

            def iou(self, box1, box2):
                # Boxes are (x_center, y_center, w, h): tb is the overlap along x, lr the overlap along y
                tb = min(box1[0] + 0.5 * box1[2], box2[0] + 0.5 * box2[2]) - max(box1[0] - 0.5 * box1[2],
                                                                             box2[0] - 0.5 * box2[2])
                lr = min(box1[1] + 0.5 * box1[3], box2[1] + 0.5 * box2[3]) - max(box1[1] - 0.5 * box1[3],
                                                                             box2[1] - 0.5 * box2[3])
                if tb < 0 or lr < 0:
                    intersection = 0
                else:
                    intersection = tb * lr
                return intersection / (box1[2] * box1[3] + box2[2] * box2[3] - intersection)


        if __name__ == '__main__':
            fromfile = 'test/000001.jpg'    # image to run detection on
            tofile_img = 'test/output.jpg'  # annotated output image
            tofile_txt = 'test/output.txt'  # detections as text

            yolo = YOLO_TF(fromfile=fromfile, tofile_img=tofile_img, tofile_txt=tofile_txt)
            cv2.waitKey(0)  # keep the result window open until a key is pressed
            cv2.destroyAllWindows()
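    To see why the last layer has 1470 units and fc_16 receives a 7×7×1024 map, the feature-map sizes in build_networks can be traced by hand. A pure-Python sketch, assuming the layer list hard-coded above (3×3 convolutions padded to keep the spatial size, 2×2 max pooling with stride 2 and SAME padding); `LAYERS` and `trace_shapes` are hypothetical names:

```python
import math

# Layer list mirroring build_networks(): ("conv", filters), ("pool", stride), ("fc", units).
# All convolutions are 3x3 with stride 1; all pools are 2x2 with stride 2.
LAYERS = [
    ("conv", 16), ("pool", 2), ("conv", 32), ("pool", 2),
    ("conv", 64), ("pool", 2), ("conv", 128), ("pool", 2),
    ("conv", 256), ("pool", 2), ("conv", 512), ("pool", 2),
    ("conv", 1024), ("conv", 1024), ("conv", 1024),
    ("fc", 256), ("fc", 4096), ("fc", 1470),
]


def trace_shapes(input_size=448, channels=3):
    """Return the output shape of each layer in LAYERS for a square input image."""
    size, depth = input_size, channels
    shapes = []
    for kind, param in LAYERS:
        if kind == "conv":
            # padded by 1 on each side with stride 1: spatial size unchanged
            depth = param
            shapes.append((size, size, depth))
        elif kind == "pool":
            # max pooling with 'SAME' padding produces ceil(size / stride) outputs
            size = math.ceil(size / param)
            shapes.append((size, size, depth))
        else:
            # fully connected layer: a flat vector of `param` units
            shapes.append((param,))
    return shapes


if __name__ == "__main__":
    for (kind, _), shape in zip(LAYERS, trace_shapes()):
        print(kind, shape)
    # fc_16 therefore receives 7 * 7 * 1024 = 50176 flattened inputs
```

    Six stride-2 poolings shrink 448 to 448 / 2^6 = 7, which is exactly the 7×7 grid that interpret_output assumes.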

    IV. Results

    Simply run the code above to perform detection. According to the class list in the code:

        classes = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
                   "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

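    tiny YOLO can only recognize these 20 common object classes (the PASCAL VOC classes). To test a different image, just change the fromfile variable in the if __name__ == '__main__': block and run the script again.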
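    Each detection's class and score come from combining two parts of the network output: the class probabilities of a grid cell and the confidence of one of its boxes. The pure-Python sketch below decodes a single box the way interpret_output does (cell offset added and divided by the grid size, width and height squared because the network predicts their square roots, then scaled to the image size). For brevity it keeps only the best class, whereas interpret_output keeps every class whose score passes the threshold. `decode_box` and the input values are hypothetical:

```python
CLASSES = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow",
           "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa",
           "train", "tvmonitor"]


def decode_box(row, col, raw_box, box_conf, class_probs, img_w, img_h, grid_size=7):
    """Turn one raw YOLOv1 box prediction into (class name, x, y, w, h, score).

    raw_box = (x, y, sqrt_w, sqrt_h): x and y are offsets inside cell (row, col);
    sqrt_w and sqrt_h are square roots of the width/height relative to the image.
    """
    x = (raw_box[0] + col) / grid_size * img_w  # center x in pixels
    y = (raw_box[1] + row) / grid_size * img_h  # center y in pixels
    w = raw_box[2] ** 2 * img_w                 # undo the square root
    h = raw_box[3] ** 2 * img_h
    # class-specific confidence = P(class | object) * box confidence
    scores = [p * box_conf for p in class_probs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return CLASSES[best], x, y, w, h, scores[best]


if __name__ == "__main__":
    probs = [0.0] * len(CLASSES)
    probs[CLASSES.index("person")] = 0.9
    # a box centered in cell (row 3, col 2) of a 640 x 480 image
    print(decode_box(3, 2, (0.5, 0.5, 0.5, 0.7), 0.8, probs, img_w=640, img_h=480))
```

    A score of 0.9 × 0.8 = 0.72 would pass the default threshold of 0.2; lowering threshold keeps more (and less reliable) boxes.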

    Below are the detection results. The person is detected, but the bicycle is missed, and a cat is misclassified as a dog:

    So far, although the accuracy is not high, the main objects are still recognized.

    V. Analysis

    1. For now, tiny YOLO has to be run with weights trained by someone else; how to train it remains to be studied.

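    2. Tiny YOLO's accuracy is not very high, but it is very fast. In addition, it may perform poorly on objects that overlap one another: each grid cell predicts only one set of class probabilities, and non-maximum suppression discards any box whose IoU with a higher-scoring box exceeds iou_threshold, even if it belongs to a different object.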
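    The suppression step at the end of interpret_output is greedy and class-agnostic: boxes are visited from highest to lowest score, and a box is dropped if it overlaps an already-kept box by more than iou_threshold (0.5). A pure-Python sketch of the same rule with center-format (x, y, w, h) boxes; the function names and box values are hypothetical:

```python
def iou(a, b):
    """Intersection over union of two (x_center, y_center, w, h) boxes."""
    overlap_x = min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2)
    overlap_y = min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2)
    if overlap_x <= 0 or overlap_y <= 0:
        return 0.0
    inter = overlap_x * overlap_y
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the kept boxes, highest score first (class-agnostic)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # keep a box only if it does not overlap any already-kept box too much
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    # two people standing close together: their boxes have IoU = 14000 / 26000, about 0.54
    people = [(100, 100, 100, 200), (130, 100, 100, 200)]
    print(greedy_nms(people, [0.9, 0.8]))  # [0]: the second person is suppressed
```

    Raising iou_threshold would let both boxes survive in this case, at the cost of more duplicate detections of the same object.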

    Original article: https://blog.csdn.net/z704630835/article/details/83614909

Source: https://www.cnblogs.com/Ph-one/p/13969978.html