
Autonomous Driving


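Autonomous driving - Car detection

This article uses the powerful YOLO model for object detection. The ideas used here come from two papers: Redmon et al., 2016 and Redmon and Farhadi, 2016.

Import dependencies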

import argparse
import os
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
import scipy.io
import scipy.misc
import numpy as np
import pandas as pd
import PIL
import tensorflow as tf
from keras import backend as K
from keras.layers import Input, Lambda, Conv2D
from keras.models import load_model, Model
from yolo_utils import read_classes, read_anchors, generate_colors, preprocess_image, draw_boxes, scale_boxes
from yad2k.models.keras_yolo import yolo_head, yolo_boxes_to_corners, preprocess_true_boxes, yolo_loss, yolo_body

%matplotlib inline

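1 - Problem Statement

A critical component of a self-driving car is a car detection system. To collect data, a camera is mounted on the front hood of the car and takes pictures of the road ahead every few seconds while you drive.

Pictures taken from a car-mounted camera while driving around Silicon Valley.

This dataset was provided by drive.ai.

All the collected images have been gathered into a folder and labelled by drawing a bounding box around every car found. Here is an example of what the bounding boxes look like:

Definition of a bounding box

There are 80 classes we want the object detector to recognize. The class label $c$ can be represented in two ways: as an integer from 1 to 80, or as an 80-dimensional one-hot vector in which one component is 1 and all the others are 0. This article uses whichever representation is more convenient in context.

In this article, you will learn how "You Only Look Once" (YOLO) performs object detection and then apply it to car detection. Because training a YOLO model is very computationally expensive, pre-trained weights will be loaded instead.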

2 - YOLO

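"You Only Look Once" (YOLO) is a popular algorithm because it achieves high accuracy while also being able to run in real time. The algorithm "only looks once" at the image in the sense that it needs only one forward pass through the network to make its predictions. After non-max suppression, it outputs the recognized objects together with their bounding boxes.

2.1 - Model Details

Inputs and outputs

The input is a batch of images of shape $(m, 608, 608, 3)$.
The output is a list of bounding boxes along with the recognized classes. Each bounding box is represented by 6 numbers $(p_c, b_x, b_y, b_h, b_w, c)$. If you expand $c$ into an 80-dimensional vector, each bounding box is represented by 85 numbers.

Anchor Boxes

Anchor boxes are chosen by exploring the training data to pick reasonable height/width ratios that represent the different classes. This article uses 5 anchor boxes, stored in the file ./model_data/yolo_anchors.txt.
The anchor-box dimension is the second-to-last dimension of the encoding: $(m, n_H, n_W, \text{anchors}, \text{classes})$.
The YOLO architecture is: $\text{IMAGE}\,(m, 608, 608, 3) \rightarrow \text{DEEP CNN} \rightarrow \text{ENCODING}\,(m, 19, 19, 5, 85)$.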

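Encoding

Here is a more detailed look at what this encoding represents.

Encoding architecture for YOLO

If the center (midpoint) of an object falls into a grid cell, that grid cell is responsible for detecting that object.

Since 5 anchor boxes are used, each of the $19 \times 19$ cells encodes information about 5 boxes. Anchor boxes are defined only by their width and height.

For simplicity, the last two dimensions of the $(19, 19, 5, 85)$ encoding are flattened, so the output of the deep CNN has shape $(19, 19, 425)$.

Flattening the last two dimensions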
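As a quick check of the flattening step, the NumPy sketch below uses a random array as a hypothetical stand-in for one image's CNN output (not real model data). It shows that reshaping $(19, 19, 5, 85)$ to $(19, 19, 425)$ loses no information: assuming NumPy's default row-major ordering, the 85 numbers of anchor $k$ occupy positions $85k$ to $85k + 84$ of each cell's 425-vector.

```python
import numpy as np

# Random stand-in for one image's encoding: 19x19 grid, 5 anchors, 85 numbers per box.
encoding = np.random.rand(19, 19, 5, 85)

# Flatten the last two dimensions: 5 * 85 = 425 numbers per grid cell.
flat = encoding.reshape(19, 19, 5 * 85)
print(flat.shape)  # (19, 19, 425)

# The reshape is lossless and can be undone exactly.
restored = flat.reshape(19, 19, 5, 85)

# Anchor 1 of cell (3, 7) sits at positions 85..169 of that cell's vector.
same_anchor = np.array_equal(flat[3, 7, 85:170], encoding[3, 7, 1])
```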

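Class score

Now, for each box (of each cell), compute the element-wise product below to obtain the probability that the box contains a given class.

The class score is $\text{score}_{c,i} = p_c \times c_i$: the probability $p_c$ that there is an object, multiplied by the probability $c_i$ that the object belongs to class $i$.

Finding the class detected by each box

In the figure above, for box 1 of cell 1, the probability that an object exists is $p_1 = 0.60$, so there is a 60% chance that box 1 contains an object.
The probability that the object is class 3 (a car) is $c_3 = 0.73$.
The score for box 1 and class 3 is $\text{score}_{1,3} = 0.60 \times 0.73 \approx 0.44$.
After computing the scores of all 80 classes for box 1, the maximum turns out to be for class 3 (a car), so box 1 is assigned the score 0.44 and class 3.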
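The arithmetic above can be reproduced in a few lines of NumPy. Only $p_1 = 0.60$ and $c_3 = 0.73$ come from the example; the remaining class probability (0.27 on class 1) is a made-up value so that the probabilities sum to 1. NumPy indexes from 0, so class 3 lives at index 2.

```python
import numpy as np

p_c = 0.60                    # probability that box 1 contains an object
class_probs = np.zeros(80)    # c_1 ... c_80, stored at indices 0 ... 79
class_probs[2] = 0.73         # c_3: probability that the object is a car
class_probs[0] = 0.27         # made-up value for another class

scores = p_c * class_probs           # score_{c,i} = p_c * c_i for every class
best_index = int(np.argmax(scores))  # 0-based index of the highest-scoring class
best_score = float(scores[best_index])

print("class", best_index + 1, "score", round(best_score, 2))  # class 3 score 0.44
```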

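Visualizing classes

Here is one way to visualize what YOLO is predicting on an image:

For each of the $19 \times 19$ grid cells, find the maximum of the probability scores (taking the maximum across both the 80 classes and the 5 anchor boxes).
Color each grid cell according to the object that grid cell considers most likely.

Each of the $19 \times 19$ grid cells colored according to the class with the largest predicted probability in that cell

Note that this visualization is not a core part of the YOLO algorithm itself for making predictions; it is just a convenient way to visualize an intermediate result.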
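A minimal NumPy sketch of this per-cell visualization step, using random arrays as hypothetical stand-ins for the real box_confidence and box_class_probs. The maximum over both the 5 anchors and the 80 classes is taken by merging those two axes into one axis of length 400; the class index is then recovered with a modulo.

```python
import numpy as np

rng = np.random.default_rng(0)
box_confidence = rng.random((19, 19, 5, 1))     # random stand-in for p_c per anchor box
box_class_probs = rng.random((19, 19, 5, 80))   # random stand-in for c_1..c_80 per anchor box

scores = box_confidence * box_class_probs       # (19, 19, 5, 80)
per_cell = scores.reshape(19, 19, 5 * 80)       # merge the anchor and class axes
cell_class = per_cell.argmax(axis=-1) % 80      # class of the best (anchor, class) pair
cell_score = per_cell.max(axis=-1)              # score of that pair

print(cell_class.shape, cell_score.shape)       # (19, 19) (19, 19)
```

The resulting $19 \times 19$ array of class indices could then be drawn with a color map (for example matplotlib's imshow) to get a picture of the kind described above.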

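Visualizing bounding boxes

Another way to visualize YOLO's output is to plot the bounding boxes it outputs, as shown below:

Each cell gives you 5 boxes. In total, the model predicts $19 \times 19 \times 5 = 1805$ boxes from a single look at the image (one forward pass through the network). Different colors denote different classes.

Non-max suppression

In the figure above, only the boxes to which the model assigned a high probability are plotted, but there are still too many. You want to reduce the algorithm's output to a much smaller number of detected objects.

To do so, you will use non-max suppression. Specifically, you will carry out these steps:

Get rid of boxes with a low score, meaning the box is not very confident about detecting a class (either because the probability that any object is present is low, or because the probability of this particular class is low).
Select only one box when several boxes overlap with each other and detect the same object.

2.2 - Filtering with a threshold on class scores

First, apply a filter by thresholding: get rid of any box whose class score is less than a chosen threshold.

The model gives you a total of $19 \times 19 \times 5 \times 85$ numbers, with each box described by 85 numbers. It is convenient to rearrange the $(19, 19, 5, 85)$ (or $(19, 19, 425)$) dimensional tensor into the following variables:

box_confidence: tensor of shape $(19, 19, 5, 1)$ containing $p_c$ (the confidence that there is some object) for each of the 5 boxes predicted in each of the $19 \times 19$ cells.
boxes: tensor of shape $(19, 19, 5, 4)$ containing the midpoint and dimensions $(b_x, b_y, b_h, b_w)$ of each of the 5 boxes in each cell.
box_class_probs: tensor of shape $(19, 19, 5, 80)$ containing the probabilities $(c_1, c_2, \dots, c_{80})$ of the 80 classes for each of the 5 boxes in each cell.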
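The sketch below shows how these three variables could be sliced out of a flattened $(19, 19, 425)$ output with NumPy, using random data. It assumes the per-box layout listed above, $[p_c, b_x, b_y, b_h, b_w, c_1, \dots, c_{80}]$; the provided yolo_head helper performs this split itself and may use its own channel order and apply activation functions, so this only illustrates the shapes involved.

```python
import numpy as np

# Random stand-in for the flattened CNN output of one image.
encoding = np.random.rand(19, 19, 425)
per_box = encoding.reshape(19, 19, 5, 85)   # undo the flattening: 5 anchors x 85 numbers

# Assumed per-box layout: [p_c, b_x, b_y, b_h, b_w, c_1, ..., c_80].
box_confidence = per_box[..., 0:1]   # (19, 19, 5, 1); the 0:1 slice keeps the last axis
boxes = per_box[..., 1:5]            # (19, 19, 5, 4)
box_class_probs = per_box[..., 5:]   # (19, 19, 5, 80)

print(box_confidence.shape, boxes.shape, box_class_probs.shape)
```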

Exercise: Implement yolo_filter_boxes().

Compute the box scores with the element-wise product $p \times c$ described in Figure 4. For example:

a = np.random.randn(19, 19, 5, 1)
b = np.random.randn(19, 19, 5, 80)
c = a * b # shape of c will be (19, 19, 5, 80)

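This is an example of broadcasting (multiplying arrays of different shapes).

For each box, find:

the index of the class with the maximum box score
the corresponding box score

Reference documentation: Keras backend argmax and max.

Hints:

For the axis parameter of argmax and max, if you want to select the last axis, one way is to pass axis=-1. This is similar to Python array indexing, where the last position can be selected with -1.
Applying max normally collapses (removes) the axis over which the maximum is taken. keepdims=False is the default and allows that dimension to be removed. Here the last dimension does not need to be kept after taking the maximum.

Create a mask by using a threshold. As a reminder of how element-wise comparison works: ([0.9, 0.3, 0.4, 0.5, 0.1] < 0.4) returns [False, True, False, False, True]. For filtering, the mask should be True for the boxes you want to keep, i.e. those whose score is at least the threshold (box_class_scores >= threshold).
Use TensorFlow to apply the mask to box_class_scores, boxes and box_classes to filter out the boxes you don't want. You should be left with just the subset of boxes you want to keep.

Reference documentation:

tf.boolean_mask

Hint:

For tf.boolean_mask, you can keep the default axis=None.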

def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold = .6):
    """
    Filters YOLO boxes by thresholding on object and class confidence.
    
    Arguments:
        box_confidence -- tensor of shape (19, 19, 5, 1)
        boxes -- tensor of shape (19, 19, 5, 4)
        box_class_probs -- tensor of shape (19, 19, 5, 80)
        threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box
    
    Returns:
        scores -- tensor of shape (None,), containing the class probability score for selected boxes
        boxes -- tensor of shape (None, 4), containing (b_x, b_y, b_h, b_w) coordinates of selected boxes
        classes -- tensor of shape (None,), containing the index of the class detected by the selected boxes
    
    Note: "None" is here because you don't know the exact number of selected boxes, as it depends on the threshold. 
    For example, the actual output size of scores would be (10,) if there are 10 boxes.
    """
    
    # Step 1: Compute box scores
    ### START CODE HERE ### (≈ 1 line)
    box_scores = box_confidence * box_class_probs
    ### END CODE HERE ###
    
    # Step 2: Find the box_classes using the max box_scores, keep track of the corresponding score
    ### START CODE HERE ### (≈ 2 lines)
    box_classes = K.argmax(box_scores, axis=-1)
    box_class_scores = K.max(box_scores, axis=-1)
    ### END CODE HERE ###
    
    # Step 3: Create a filtering mask based on "box_class_scores" by using "threshold". The mask should have the
    # same dimension as box_class_scores, and be True for the boxes you want to keep (with probability >= threshold)
    ### START CODE HERE ### (≈ 1 line)
    filtering_mask = box_class_scores >= threshold
    ### END CODE HERE ###
    
    # Step 4: Apply the mask to box_class_scores, boxes and box_classes
    ### START CODE HERE ### (≈ 3 lines)
    scores = tf.boolean_mask(box_class_scores, filtering_mask)
    boxes = tf.boolean_mask(boxes, filtering_mask)
    classes = tf.boolean_mask(box_classes, filtering_mask)
    ### END CODE HERE ###
    
    return scores, boxes, classes

Test case:

with tf.Session() as test_a:
    box_confidence = tf.random_normal([19, 19, 5, 1], mean=1, stddev=4, seed = 1)
    boxes = tf.random_normal([19, 19, 5, 4], mean=1, stddev=4, seed = 1)
    box_class_probs = tf.random_normal([19, 19, 5, 80], mean=1, stddev=4, seed = 1)
    scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold = 0.5)
    print("scores[2] = " + str(scores[2].eval()))
    print("boxes[2] = " + str(boxes[2].eval()))
    print("classes[2] = " + str(classes[2].eval()))
    print("scores.shape = " + str(scores.shape))
    print("boxes.shape = " + str(boxes.shape))
    print("classes.shape = " + str(classes.shape))

Expected output:

scores[2] = 10.7506
boxes[2] = [ 8.42653275  3.27136683 -0.5313437  -4.94137383]
classes[2] = 7
scores.shape = (?,)
boxes.shape = (?, 4)
classes.shape = (?,)

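Note that the test case uses random numbers to exercise yolo_filter_boxes. In real data, box_class_probs would contain only values between 0 and 1 representing probabilities, and the boxes in boxes would have non-negative heights and widths.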
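For experimenting without a TensorFlow 1.x session, the same filtering logic can be sketched in NumPy. The name yolo_filter_boxes_np is hypothetical and not part of the notebook; indexing with a boolean mask over the first three axes plays the role of tf.boolean_mask with its default axis, flattening the kept boxes into a single list.

```python
import numpy as np

def yolo_filter_boxes_np(box_confidence, boxes, box_class_probs, threshold=0.6):
    """NumPy sketch of yolo_filter_boxes (hypothetical helper, not from the notebook)."""
    box_scores = box_confidence * box_class_probs     # broadcast (..., 1) * (..., 80)
    box_classes = np.argmax(box_scores, axis=-1)       # best class per box
    box_class_scores = np.max(box_scores, axis=-1)     # score of that class
    filtering_mask = box_class_scores >= threshold     # True = keep this box
    # Indexing with a 3-D boolean mask flattens the selected boxes into one axis.
    return (box_class_scores[filtering_mask],
            boxes[filtering_mask],
            box_classes[filtering_mask])

# Tiny example: one grid cell, two anchor boxes, three classes.
conf = np.array([[[[0.9], [0.1]]]])                        # shape (1, 1, 2, 1)
probs = np.array([[[[0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]]])   # shape (1, 1, 2, 3)
coords = np.arange(8, dtype=float).reshape(1, 1, 2, 4)     # shape (1, 1, 2, 4)
scores, kept_boxes, classes = yolo_filter_boxes_np(conf, coords, probs, threshold=0.5)
# Only the first anchor survives: its best score is 0.9 * 0.8 = 0.72, for class index 1.
```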

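2.3 - Non-max suppression

Even after filtering by thresholding on the class scores, you still end up with many overlapping boxes. A second filter for selecting the right boxes is called non-max suppression (NMS).

In this example, the model has predicted 3 cars, but all 3 predictions are actually the same car. Running non-max suppression (NMS) selects only the most accurate (highest-probability) of the 3 boxes.

Non-max suppression uses a very important function called Intersection over Union (IoU).

Definition of Intersection over Union (IoU)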
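Written as a formula, with $|\cdot|$ denoting area, IoU divides the intersection area by the union area, and the union is obtained by inclusion-exclusion (the same relation used in the iou() exercise):

```latex
\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}
                   = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}
```

For example, the boxes $A = (2, 1, 4, 3)$ and $B = (1, 2, 3, 4)$ each have area 4 and overlap in the unit square $(2, 2, 3, 3)$, so $\mathrm{IoU} = 1 / (4 + 4 - 1) = 1/7 \approx 0.143$.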

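Exercise: Implement iou().

In this article, $(0, 0)$ denotes the upper-left corner of an image, $(1, 0)$ the upper-right corner, and $(1, 1)$ the lower-right corner. In other words, the origin $(0, 0)$ is the top-left corner of the image: increasing $x$ moves to the right, and increasing $y$ moves down.
A box is defined by two corners, the upper-left $(x_1, y_1)$ and the lower-right $(x_2, y_2)$, instead of by its midpoint, height and width. This makes the intersection a bit easier to compute.
To calculate the area of a rectangle, multiply its height $(y_2 - y_1)$ by its width $(x_2 - x_1)$. (Since $(x_1, y_1)$ is the upper-left corner and $(x_2, y_2)$ is the lower-right corner, these differences should be non-negative.)
To find the intersection $(xi_1, yi_1, xi_2, yi_2)$ of two boxes:
- The upper-left corner of the intersection $(xi_1, yi_1)$ is found by comparing the upper-left corners $(x_1, y_1)$ of the two boxes and taking the x-coordinate that is farther right and the y-coordinate that is farther down.
- The lower-right corner of the intersection $(xi_2, yi_2)$ is found by comparing the lower-right corners $(x_2, y_2)$ of the two boxes and taking the x-coordinate that is farther left and the y-coordinate that is farther up.
- The two boxes may have no intersection. You can detect this if the computed intersection coordinates end up being the upper-right and/or lower-left corners of an intersection box. Equivalently, if the computed height $(y_2 - y_1)$ or width $(x_2 - x_1)$ of the intersection is negative, there is no intersection (the intersection area is zero).
- The two boxes may touch only at a vertex or along an edge; in that case the intersection area is still zero. This happens when the computed height or width (or both) of the intersection is exactly zero.

Additional hints:

xi1 = maximum of the x1 coordinates of the two boxes
yi1 = maximum of the y1 coordinates of the two boxes
xi2 = minimum of the x2 coordinates of the two boxes
yi2 = minimum of the y2 coordinates of the two boxes
inter_area: you can use max(height, 0) and max(width, 0)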

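def iou(box1, box2):
    """
    Implement the intersection over union (IoU) between box1 and box2

    Arguments:
        box1 -- first box, list object with coordinates (box1_x1, box1_y1, box1_x2, box1_y2)
        box2 -- second box, list object with coordinates (box2_x1, box2_y1, box2_x2, box2_y2)

    Returns:
        iou -- float, intersection area divided by union area (0.0 when the boxes do not overlap)
    """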

    # Assign variable names to coordinates for clarity
    (box1_x1, box1_y1, box1_x2, box1_y2) = box1
    (box2_x1, box2_y1, box2_x2, box2_y2) = box2
    
    # Calculate the (yi1, xi1, yi2, xi2) coordinates of the intersection of box1 and box2. Calculate its Area.
    ### START CODE HERE ### (≈ 7 lines)
    xi1 = max(box1_x1, box2_x1)
    yi1 = max(box1_y1, box2_y1)
    xi2 = min(box1_x2, box2_x2)
    yi2 = min(box1_y2, box2_y2)
    inter_width = xi2 - xi1
    inter_height = yi2 - yi1
    inter_area = inter_width * inter_height if(inter_width > 0 and inter_height > 0) else 0
    ### END CODE HERE ###    

    # Calculate the Union area by using Formula: Union(A,B) = A + B - Inter(A,B)
    ### START CODE HERE ### (≈ 3 lines)
    box1_area = (box1_x2 - box1_x1) * (box1_y2 - box1_y1)
    box2_area = (box2_x2 - box2_x1) * (box2_y2 - box2_y1)
    union_area = box1_area + box2_area - inter_area
    ### END CODE HERE ###
    
    # compute the IoU
    ### START CODE HERE ### (≈ 1 line)
    iou = inter_area / union_area
    ### END CODE HERE ###
    
    return iou

Test cases:

## Test case 1: boxes intersect
box1 = (2, 1, 4, 3)
box2 = (1, 2, 3, 4) 
print("iou for intersecting boxes = " + str(iou(box1, box2)))

## Test case 2: boxes do not intersect
box1 = (1,2,3,4)
box2 = (5,6,7,8)
print("iou for non-intersecting boxes = " + str(iou(box1,box2)))

## Test case 3: boxes intersect at vertices only
box1 = (1,1,2,2)
box2 = (2,2,3,3)
print("iou for boxes that only touch at vertices = " + str(iou(box1,box2)))

## Test case 4: boxes intersect at edge only
box1 = (1,1,3,3)
box2 = (2,3,3,4)
print("iou for boxes that only touch at edges = " + str(iou(box1,box2)))

Expected output:

iou for intersecting boxes = 0.14285714285714285
iou for non-intersecting boxes = 0.0
iou for boxes that only touch at vertices = 0.0
iou for boxes that only touch at edges = 0.0

YOLO non-max suppression

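The key steps in implementing non-max suppression are:

Select the box with the highest score.
Compute its overlap with all other boxes, and remove the boxes that overlap it significantly ($\mathrm{IoU} \geq$ iou_threshold).
Go back to step 1 and iterate until there are no more boxes with a lower score than the currently selected box.

This removes all boxes that have a large overlap with the selected boxes, so only the highest-scoring boxes remain.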
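The three steps above can be sketched in plain NumPy as a greedy loop. This is a hypothetical re-implementation for illustration (iou_one_to_many and greedy_nms are made-up names), not the code behind tf.image.non_max_suppression. It assumes corner-format boxes $(x_1, y_1, x_2, y_2)$ with positive area and, like the TensorFlow function used below, ignores class labels.

```python
import numpy as np

def iou_one_to_many(box, others):
    """IoU between one box (x1, y1, x2, y2) and each row of an (N, 4) array."""
    xi1 = np.maximum(box[0], others[:, 0])
    yi1 = np.maximum(box[1], others[:, 1])
    xi2 = np.minimum(box[2], others[:, 2])
    yi2 = np.minimum(box[3], others[:, 3])
    inter = np.maximum(xi2 - xi1, 0) * np.maximum(yi2 - yi1, 0)
    box_area = (box[2] - box[0]) * (box[3] - box[1])
    other_areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (box_area + other_areas - inter)

def greedy_nms(boxes, scores, max_boxes=10, iou_threshold=0.5):
    """Hypothetical greedy NMS: return indices of kept boxes, highest score first."""
    order = np.argsort(scores)[::-1]            # candidates, best score first
    keep = []
    while order.size > 0 and len(keep) < max_boxes:
        best = order[0]                         # step 1: highest-scoring remaining box
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[overlaps < iou_threshold]  # step 2: drop boxes with IoU >= threshold
    return keep                                 # step 3 is the loop itself

# Two heavily overlapping detections of one object plus one separate object.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(greedy_nms(boxes, scores))  # [0, 2]
```

Box 1 is suppressed because its IoU with box 0 is $81 / 119 \approx 0.68$, above the 0.5 threshold, while box 2 does not overlap box 0 at all.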

Exercise: Implement yolo_non_max_suppression() using TensorFlow. TensorFlow has two built-in functions that are used to implement non-max suppression (so you don't actually need your own iou() implementation).

References:

tf.image.non_max_suppression()

tf.image.non_max_suppression(
    boxes,
    scores,
    max_output_size,
    iou_threshold=0.5,
    name=None
)

Note the TensorFlow version: the version used here has no score_threshold parameter (it appears in the documentation of newer versions).

keras.backend.gather()

tf.keras.backend.gather(
    reference, indices
)

def yolo_non_max_suppression(scores, boxes, classes, max_boxes = 10, iou_threshold = 0.5):
    """
    Applies Non-max suppression (NMS) to set of boxes
    
    Arguments:
        scores -- tensor of shape (None,), output of yolo_filter_boxes()
        boxes -- tensor of shape (None, 4), output of yolo_filter_boxes() that have been scaled to the image size (see later)
        classes -- tensor of shape (None,), output of yolo_filter_boxes()
        max_boxes -- integer, maximum number of predicted boxes you'd like
        iou_threshold -- real value, "intersection over union" threshold used for NMS filtering
    
    Returns:
        scores -- tensor of shape (None,), predicted score for each box
        boxes -- tensor of shape (None, 4), predicted box coordinates
        classes -- tensor of shape (None,), predicted class for each box
    
    Note: The "None" dimension of the output tensors is at most max_boxes.
    """
    
    max_boxes_tensor = K.variable(max_boxes, dtype='int32')     # tensor to be used in tf.image.non_max_suppression()
    K.get_session().run(tf.variables_initializer([max_boxes_tensor])) # initialize variable max_boxes_tensor
    
    # Use tf.image.non_max_suppression() to get the list of indices corresponding to boxes you keep
    ### START CODE HERE ### (≈ 1 line)
    nms_indices = tf.image.non_max_suppression(boxes=boxes, scores=scores, max_output_size=max_boxes_tensor, iou_threshold=iou_threshold)
    ### END CODE HERE ###
    
    # Use K.gather() to select only nms_indices from scores, boxes and classes
    ### START CODE HERE ### (≈ 3 lines)
    scores = K.gather(reference=scores, indices=nms_indices)
    boxes = K.gather(reference=boxes, indices=nms_indices)
    classes = K.gather(reference=classes, indices=nms_indices)
    ### END CODE HERE ###
    
    return scores, boxes, classes

Test case:

with tf.Session() as test_b:
    scores = tf.random_normal([54,], mean=1, stddev=4, seed = 1)
    boxes = tf.random_normal([54, 4], mean=1, stddev=4, seed = 1)
    classes = tf.random_normal([54,], mean=1, stddev=4, seed = 1)
    scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes)
    print("scores[2] = " + str(scores[2].eval()))
    print("boxes[2] = " + str(boxes[2].eval()))
    print("classes[2] = " + str(classes[2].eval()))
    print("scores.shape = " + str(scores.eval().shape))
    print("boxes.shape = " + str(boxes.eval().shape))
    print("classes.shape = " + str(classes.eval().shape))

Expected output:

scores[2] = 6.9384
boxes[2] = [-5.299932    3.13798141  4.45036697  0.95942086]
classes[2] = -2.24527
scores.shape = (10,)
boxes.shape = (10, 4)
classes.shape = (10,)

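2.4 - Wrapping up the filtering

Now implement a function that takes the output of the deep CNN (the $19 \times 19 \times 5 \times 85$ dimensional encoding) and filters the boxes using the functions you have just implemented.

Exercise: Implement yolo_eval(), which takes the output of the YOLO encoding and filters the boxes using a score threshold and NMS. There are a few ways of representing boxes, such as by their corners or by their midpoint and height/width. YOLO converts between several such formats at different times, using the following functions: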

boxes = yolo_boxes_to_corners(box_xy, box_wh)

This converts the YOLO box coordinates $(x, y, w, h)$ to box corner coordinates $(x_1, y_1, x_2, y_2)$ to fit the input of yolo_filter_boxes.
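The center-to-corner conversion is simple arithmetic: each corner lies half a width and half a height away from the midpoint. Below is a NumPy sketch with boxes_to_corners as a hypothetical name. The provided yolo_boxes_to_corners operates on Keras tensors and may order the four output coordinates differently (for instance $(y_1, x_1, y_2, x_2)$, the order documented for tf.image.non_max_suppression), so the ordering here is an assumption.

```python
import numpy as np

def boxes_to_corners(box_xy, box_wh):
    """Hypothetical helper: midpoints (x, y) and sizes (w, h) -> corners (x1, y1, x2, y2)."""
    top_left = box_xy - box_wh / 2.0       # (x1, y1)
    bottom_right = box_xy + box_wh / 2.0   # (x2, y2)
    return np.concatenate([top_left, bottom_right], axis=-1)

# A box centered at (0.5, 0.5) that is 0.2 wide and 0.4 tall.
corners = boxes_to_corners(np.array([0.5, 0.5]), np.array([0.2, 0.4]))
print(corners)  # approximately [0.4, 0.3, 0.6, 0.7]
```

Because the function only uses element-wise operations and concatenates along the last axis, it works unchanged on arrays of shape $(19, 19, 5, 2)$.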

boxes = scale_boxes(boxes, image_shape)

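The YOLO network was trained on $608 \times 608$ images. If you test it on images of a different size (for example, the car detection dataset has $720 \times 1280$ images), this step rescales the boxes so that they can be drawn on top of the original $720 \times 1280$ image.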
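Assuming the corner coordinates are expressed as fractions of the image size (values in $[0, 1]$), rescaling amounts to multiplying x-coordinates by the image width and y-coordinates by the image height. The helper scale_boxes_np below is a hypothetical NumPy sketch that assumes the $(x_1, y_1, x_2, y_2)$ order used in this article; the provided scale_boxes works on tensors and its coordinate order may differ.

```python
import numpy as np

def scale_boxes_np(boxes, image_shape):
    """Hypothetical helper: scale normalized (x1, y1, x2, y2) boxes to pixels.

    image_shape is (height, width), matching image_shape = (720., 1280.) in this article.
    """
    height, width = image_shape
    return boxes * np.array([width, height, width, height])

# A box spanning the middle half horizontally and the bottom half vertically.
pixels = scale_boxes_np(np.array([[0.25, 0.5, 0.75, 1.0]]), (720.0, 1280.0))
# pixels holds corners (320, 360) and (960, 720) in pixel units.
```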

def yolo_eval(yolo_outputs, image_shape = (720., 1280.), max_boxes=10, score_threshold=.6, iou_threshold=.5):
    """
    Converts the output of YOLO encoding (a lot of boxes) to your predicted boxes along with their scores, box coordinates and classes.
    
    Arguments:
        yolo_outputs -- output of the encoding model (for image_shape of (608, 608, 3)), contains 4 tensors:
                        box_confidence: tensor of shape (None, 19, 19, 5, 1)
                        box_xy: tensor of shape (None, 19, 19, 5, 2)
                        box_wh: tensor of shape (None, 19, 19, 5, 2)
                        box_class_probs: tensor of shape (None, 19, 19, 5, 80)
        image_shape -- tensor of shape (2,) containing the shape of the original image that boxes are scaled back to, e.g. (720., 1280.) for the car detection images (has to be float32 dtype)
        max_boxes -- integer, maximum number of predicted boxes you'd like
        score_threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box
        iou_threshold -- real value, "intersection over union" threshold used for NMS filtering
    
    Returns:
        scores -- tensor of shape (None, ), predicted score for each box
        boxes -- tensor of shape (None, 4), predicted box coordinates
        classes -- tensor of shape (None,), predicted class for each box
    """
    
    ### START CODE HERE ### 
    
    # Retrieve outputs of the YOLO model (≈1 line)
    box_confidence, box_xy, box_wh, box_class_probs = yolo_outputs

    # Convert boxes to be ready for filtering functions (convert boxes box_xy and box_wh to corner coordinates)
    boxes = yolo_boxes_to_corners(box_xy, box_wh)

    # Use one of the functions you've implemented to perform Score-filtering with a threshold of score_threshold (≈1 line)
    scores, boxes, classes = yolo_filter_boxes(box_confidence=box_confidence, boxes=boxes, box_class_probs=box_class_probs, threshold=score_threshold)
    
    # Scale boxes back to original image shape.
    boxes = scale_boxes(boxes, image_shape)

    # Use one of the functions you've implemented to perform Non-max suppression with 
    # maximum number of boxes set to max_boxes and a threshold of iou_threshold (≈1 line)
    scores, boxes, classes = yolo_non_max_suppression(scores=scores, boxes=boxes, classes=classes, max_boxes=max_boxes, iou_threshold=iou_threshold)
    
    ### END CODE HERE ###
    
    return scores, boxes, classes

Test case:

with tf.Session() as test_b:
    yolo_outputs = (tf.random_normal([19, 19, 5, 1], mean=1, stddev=4, seed = 1),
                    tf.random_normal([19, 19, 5, 2], mean=1, stddev=4, seed = 1),
                    tf.random_normal([19, 19, 5, 2], mean=1, stddev=4, seed = 1),
                    tf.random_normal([19, 19, 5, 80], mean=1, stddev=4, seed = 1))
    scores, boxes, classes = yolo_eval(yolo_outputs)
    print("scores[2] = " + str(scores[2].eval()))
    print("boxes[2] = " + str(boxes[2].eval()))
    print("classes[2] = " + str(classes[2].eval()))
    print("scores.shape = " + str(scores.eval().shape))
    print("boxes.shape = " + str(boxes.eval().shape))
    print("classes.shape = " + str(classes.eval().shape))

Expected output:

scores[2] = 138.791
boxes[2] = [ 1292.32971191  -278.52166748  3876.98925781  -835.56494141]
classes[2] = 54
scores.shape = (10,)
boxes.shape = (10, 4)
classes.shape = (10,)

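YOLO summary

The input image has shape $(608, 608, 3)$.
The input image goes through a CNN, resulting in an output of shape $(19, 19, 5, 85)$.
After flattening the last two dimensions, the output has shape $(19, 19, 425)$:
- Each cell in the $19 \times 19$ grid over the input image gives 425 numbers.
- $425 = 5 \times 85$ because each cell contains predictions for 5 boxes, one per anchor box.
- $85 = 5 + 80$, where 5 is the number of values in $(p_c, b_x, b_y, b_h, b_w)$ and 80 is the number of classes to detect.
You then select only a few boxes based on:
- Score thresholding: throw away boxes whose detected class has a score below the threshold.
- Non-max suppression: compute the Intersection over Union (IoU) and avoid selecting overlapping boxes.
This gives YOLO's final output.

3 - Test the pre-trained YOLO model on images

In this part, you will use a pre-trained model and test it on the car detection dataset. First, you need a session to execute the computation graph and evaluate the tensors.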

sess = K.get_session()

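3.1 - Defining classes, anchors and image shape

We are trying to detect 80 classes and are using 5 anchor boxes.
The information about the 80 classes and the 5 anchor boxes is stored in two files, coco_classes.txt and yolo_anchors.txt.
Read the class names and anchors from these text files.
The car detection dataset has $720 \times 1280$ images, which are preprocessed into $608 \times 608$ images.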

class_names = read_classes("./model_data/coco_classes.txt")
anchors = read_anchors("./model_data/yolo_anchors.txt")
image_shape = (720., 1280.)
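read_classes and read_anchors come from the provided yolo_utils module. A plausible minimal equivalent is sketched below, assuming coco_classes.txt holds one class name per line and yolo_anchors.txt holds a single line of comma-separated width/height pairs. Both function names and the sample values are hypothetical, and the helpers parse strings rather than files to keep the example self-contained.

```python
import numpy as np

def parse_classes(text):
    """Hypothetical helper: one class name per line -> list of names (blank lines skipped)."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def parse_anchors(text):
    """Hypothetical helper: 'w1, h1, w2, h2, ...' -> (N, 2) array, one (width, height) row per anchor."""
    values = [float(v) for v in text.strip().split(",")]
    return np.array(values).reshape(-1, 2)

names = parse_classes("person\nbicycle\ncar\n")           # sample content
anchors = parse_anchors("0.5, 1.0, 2.0, 3.0, 4.5, 2.5")   # sample content, not the real anchors
print(names, anchors.shape)  # ['person', 'bicycle', 'car'] (3, 2)
```

With the real files, the file contents would simply be read (for example with open(...).read()) and passed to these functions.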

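3.2 - Loading a pre-trained model

Training a YOLO model takes a very long time and requires a fairly large dataset of labelled bounding boxes covering a large range of target classes.
Instead, you will load an existing pre-trained Keras YOLO model stored in yolo.h5.
These weights come from the official YOLO website and were converted using a function written by Allan Zelener. Technically, these are the parameters of the YOLOv2 model.

Load the model from the file: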

yolo_model = load_model("./model_data/yolo.h5")
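The per-layer parameter counts printed by yolo_model.summary() can be checked by hand. The sketch below relies on assumptions inferred from the counts themselves rather than stated in the summary: the convolutions have no bias term (each is followed by batch normalization), kernels are 3×3 or 1×1, and BatchNormalization stores four values per channel (gamma, beta, moving mean and moving variance).

```python
def conv_params(kernel_size, in_channels, filters, use_bias=False):
    """Weights of a square Conv2D kernel, plus one bias per filter if enabled."""
    weights = kernel_size * kernel_size * in_channels * filters
    return weights + (filters if use_bias else 0)

def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance for every channel."""
    return 4 * channels

print(conv_params(3, 3, 32))       # conv2d_1: 864
print(batchnorm_params(32))        # batch_normalization_1: 128
print(conv_params(1, 128, 64))     # conv2d_4 (1x1): 8192
print(conv_params(3, 1280, 1024))  # conv2d_22 on the concatenated 1280-channel input: 11796480
```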

Print a summary of every layer in the model.

yolo_model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
input_1 (InputLayer)             (None, 608, 608, 3)   0                                            
____________________________________________________________________________________________________
conv2d_1 (Conv2D)                (None, 608, 608, 32)  864         input_1[0][0]                    
____________________________________________________________________________________________________
batch_normalization_1 (BatchNorm (None, 608, 608, 32)  128         conv2d_1[0][0]                   
____________________________________________________________________________________________________
leaky_re_lu_1 (LeakyReLU)        (None, 608, 608, 32)  0           batch_normalization_1[0][0]      
____________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)   (None, 304, 304, 32)  0           leaky_re_lu_1[0][0]              
____________________________________________________________________________________________________
conv2d_2 (Conv2D)                (None, 304, 304, 64)  18432       max_pooling2d_1[0][0]            
____________________________________________________________________________________________________
batch_normalization_2 (BatchNorm (None, 304, 304, 64)  256         conv2d_2[0][0]                   
____________________________________________________________________________________________________
leaky_re_lu_2 (LeakyReLU)        (None, 304, 304, 64)  0           batch_normalization_2[0][0]      
____________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)   (None, 152, 152, 64)  0           leaky_re_lu_2[0][0]              
____________________________________________________________________________________________________
conv2d_3 (Conv2D)                (None, 152, 152, 128) 73728       max_pooling2d_2[0][0]            
____________________________________________________________________________________________________
batch_normalization_3 (BatchNorm (None, 152, 152, 128) 512         conv2d_3[0][0]                   
____________________________________________________________________________________________________
leaky_re_lu_3 (LeakyReLU)        (None, 152, 152, 128) 0           batch_normalization_3[0][0]      
____________________________________________________________________________________________________
conv2d_4 (Conv2D)                (None, 152, 152, 64)  8192        leaky_re_lu_3[0][0]              
____________________________________________________________________________________________________
batch_normalization_4 (BatchNorm (None, 152, 152, 64)  256         conv2d_4[0][0]                   
____________________________________________________________________________________________________
leaky_re_lu_4 (LeakyReLU)        (None, 152, 152, 64)  0           batch_normalization_4[0][0]      
____________________________________________________________________________________________________
conv2d_5 (Conv2D)                (None, 152, 152, 128) 73728       leaky_re_lu_4[0][0]              
____________________________________________________________________________________________________
batch_normalization_5 (BatchNorm (None, 152, 152, 128) 512         conv2d_5[0][0]                   
____________________________________________________________________________________________________
leaky_re_lu_5 (LeakyReLU)        (None, 152, 152, 128) 0           batch_normalization_5[0][0]      
____________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D)   (None, 76, 76, 128)   0           leaky_re_lu_5[0][0]              
____________________________________________________________________________________________________
conv2d_6 (Conv2D)                (None, 76, 76, 256)   294912      max_pooling2d_3[0][0]            
____________________________________________________________________________________________________
batch_normalization_6 (BatchNorm (None, 76, 76, 256)   1024        conv2d_6[0][0]                   
____________________________________________________________________________________________________
leaky_re_lu_6 (LeakyReLU)        (None, 76, 76, 256)   0           batch_normalization_6[0][0]      
____________________________________________________________________________________________________
conv2d_7 (Conv2D)                (None, 76, 76, 128)   32768       leaky_re_lu_6[0][0]              
____________________________________________________________________________________________________
batch_normalization_7 (BatchNorm (None, 76, 76, 128)   512         conv2d_7[0][0]                   
____________________________________________________________________________________________________
leaky_re_lu_7 (LeakyReLU)        (None, 76, 76, 128)   0           batch_normalization_7[0][0]      
____________________________________________________________________________________________________
conv2d_8 (Conv2D)                (None, 76, 76, 256)   294912      leaky_re_lu_7[0][0]              
____________________________________________________________________________________________________
batch_normalization_8 (BatchNorm (None, 76, 76, 256)   1024        conv2d_8[0][0]                   
____________________________________________________________________________________________________
leaky_re_lu_8 (LeakyReLU)        (None, 76, 76, 256)   0           batch_normalization_8[0][0]      
____________________________________________________________________________________________________
max_pooling2d_4 (MaxPooling2D)   (None, 38, 38, 256)   0           leaky_re_lu_8[0][0]              
____________________________________________________________________________________________________
conv2d_9 (Conv2D)                (None, 38, 38, 512)   1179648     max_pooling2d_4[0][0]            
____________________________________________________________________________________________________
batch_normalization_9 (BatchNorm (None, 38, 38, 512)   2048        conv2d_9[0][0]                   
____________________________________________________________________________________________________
leaky_re_lu_9 (LeakyReLU)        (None, 38, 38, 512)   0           batch_normalization_9[0][0]      
____________________________________________________________________________________________________
conv2d_10 (Conv2D)               (None, 38, 38, 256)   131072      leaky_re_lu_9[0][0]              
____________________________________________________________________________________________________
batch_normalization_10 (BatchNor (None, 38, 38, 256)   1024        conv2d_10[0][0]                  
____________________________________________________________________________________________________
leaky_re_lu_10 (LeakyReLU)       (None, 38, 38, 256)   0           batch_normalization_10[0][0]     
____________________________________________________________________________________________________
conv2d_11 (Conv2D)               (None, 38, 38, 512)   1179648     leaky_re_lu_10[0][0]             
____________________________________________________________________________________________________
batch_normalization_11 (BatchNor (None, 38, 38, 512)   2048        conv2d_11[0][0]                  
____________________________________________________________________________________________________
leaky_re_lu_11 (LeakyReLU)       (None, 38, 38, 512)   0           batch_normalization_11[0][0]     
____________________________________________________________________________________________________
conv2d_12 (Conv2D)               (None, 38, 38, 256)   131072      leaky_re_lu_11[0][0]             
____________________________________________________________________________________________________
batch_normalization_12 (BatchNor (None, 38, 38, 256)   1024        conv2d_12[0][0]                  
____________________________________________________________________________________________________
leaky_re_lu_12 (LeakyReLU)       (None, 38, 38, 256)   0           batch_normalization_12[0][0]     
____________________________________________________________________________________________________
conv2d_13 (Conv2D)               (None, 38, 38, 512)   1179648     leaky_re_lu_12[0][0]             
____________________________________________________________________________________________________
batch_normalization_13 (BatchNor (None, 38, 38, 512)   2048        conv2d_13[0][0]                  
____________________________________________________________________________________________________
leaky_re_lu_13 (LeakyReLU)       (None, 38, 38, 512)   0           batch_normalization_13[0][0]     
____________________________________________________________________________________________________
max_pooling2d_5 (MaxPooling2D)   (None, 19, 19, 512)   0           leaky_re_lu_13[0][0]             
____________________________________________________________________________________________________
conv2d_14 (Conv2D)               (None, 19, 19, 1024)  4718592     max_pooling2d_5[0][0]            
____________________________________________________________________________________________________
batch_normalization_14 (BatchNor (None, 19, 19, 1024)  4096        conv2d_14[0][0]                  
____________________________________________________________________________________________________
leaky_re_lu_14 (LeakyReLU)       (None, 19, 19, 1024)  0           batch_normalization_14[0][0]     
____________________________________________________________________________________________________
conv2d_15 (Conv2D)               (None, 19, 19, 512)   524288      leaky_re_lu_14[0][0]             
____________________________________________________________________________________________________
batch_normalization_15 (BatchNor (None, 19, 19, 512)   2048        conv2d_15[0][0]                  
____________________________________________________________________________________________________
leaky_re_lu_15 (LeakyReLU)       (None, 19, 19, 512)   0           batch_normalization_15[0][0]     
____________________________________________________________________________________________________
conv2d_16 (Conv2D)               (None, 19, 19, 1024)  4718592     leaky_re_lu_15[0][0]             
____________________________________________________________________________________________________
batch_normalization_16 (BatchNor (None, 19, 19, 1024)  4096        conv2d_16[0][0]                  
____________________________________________________________________________________________________
leaky_re_lu_16 (LeakyReLU)       (None, 19, 19, 1024)  0           batch_normalization_16[0][0]     
____________________________________________________________________________________________________
conv2d_17 (Conv2D)               (None, 19, 19, 512)   524288      leaky_re_lu_16[0][0]             
____________________________________________________________________________________________________
batch_normalization_17 (BatchNor (None, 19, 19, 512)   2048        conv2d_17[0][0]                  
____________________________________________________________________________________________________
leaky_re_lu_17 (LeakyReLU)       (None, 19, 19, 512)   0           batch_normalization_17[0][0]     
____________________________________________________________________________________________________
conv2d_18 (Conv2D)               (None, 19, 19, 1024)  4718592     leaky_re_lu_17[0][0]             
____________________________________________________________________________________________________
batch_normalization_18 (BatchNor (None, 19, 19, 1024)  4096        conv2d_18[0][0]                  
____________________________________________________________________________________________________
leaky_re_lu_18 (LeakyReLU)       (None, 19, 19, 1024)  0           batch_normalization_18[0][0]     
____________________________________________________________________________________________________
conv2d_19 (Conv2D)               (None, 19, 19, 1024)  9437184     leaky_re_lu_18[0][0]             
____________________________________________________________________________________________________
batch_normalization_19 (BatchNor (None, 19, 19, 1024)  4096        conv2d_19[0][0]                  
____________________________________________________________________________________________________
conv2d_21 (Conv2D)               (None, 38, 38, 64)    32768       leaky_re_lu_13[0][0]             
____________________________________________________________________________________________________
leaky_re_lu_19 (LeakyReLU)       (None, 19, 19, 1024)  0           batch_normalization_19[0][0]     
____________________________________________________________________________________________________
batch_normalization_21 (BatchNor (None, 38, 38, 64)    256         conv2d_21[0][0]                  
____________________________________________________________________________________________________
conv2d_20 (Conv2D)               (None, 19, 19, 1024)  9437184     leaky_re_lu_19[0][0]             
____________________________________________________________________________________________________
leaky_re_lu_21 (LeakyReLU)       (None, 38, 38, 64)    0           batch_normalization_21[0][0]     
____________________________________________________________________________________________________
batch_normalization_20 (BatchNor (None, 19, 19, 1024)  4096        conv2d_20[0][0]                  
____________________________________________________________________________________________________
space_to_depth_x2 (Lambda)       (None, 19, 19, 256)   0           leaky_re_lu_21[0][0]             
____________________________________________________________________________________________________
leaky_re_lu_20 (LeakyReLU)       (None, 19, 19, 1024)  0           batch_normalization_20[0][0]     
____________________________________________________________________________________________________
concatenate_1 (Concatenate)      (None, 19, 19, 1280)  0           space_to_depth_x2[0][0]          
                                                                   leaky_re_lu_20[0][0]             
____________________________________________________________________________________________________
conv2d_22 (Conv2D)               (None, 19, 19, 1024)  11796480    concatenate_1[0][0]              
____________________________________________________________________________________________________
batch_normalization_22 (BatchNor (None, 19, 19, 1024)  4096        conv2d_22[0][0]                  
____________________________________________________________________________________________________
leaky_re_lu_22 (LeakyReLU)       (None, 19, 19, 1024)  0           batch_normalization_22[0][0]     
____________________________________________________________________________________________________
conv2d_23 (Conv2D)               (None, 19, 19, 425)   435625      leaky_re_lu_22[0][0]             
====================================================================================================
Total params: 50,983,561
Trainable params: 50,962,889
Non-trainable params: 20,672
____________________________________________________________________________________________________
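The parameter counts in the summary can be checked by hand. The sketch below assumes the kernel sizes: 3×3 for conv2d_22, and 1×1 for conv2d_21 and conv2d_23. These are inferred from the counts and are not printed in the summary. It also assumes the standard YOLOv2 channel widths for the BatchNorm layers that are not shown in this excerpt.

Convolutions that are followed by BatchNorm carry no bias. Each BatchNorm layer stores four vectors per channel: gamma, beta, moving_mean and moving_variance. Only the last two are non-trainable.

```python
def conv2d_params(kernel, c_in, c_out, use_bias):
    """Parameters of a kernel x kernel Conv2D layer (weights, plus one bias per filter if use_bias)."""
    return kernel * kernel * c_in * c_out + (c_out if use_bias else 0)


def batchnorm_params(channels):
    """gamma, beta (trainable) plus moving_mean, moving_variance (non-trainable)."""
    return 4 * channels


# conv2d_22: assumed 3x3 kernel over the 1280-channel concatenate_1 output, no bias
print(conv2d_params(3, 1280, 1024, use_bias=False))  # 11796480
# conv2d_21: assumed 1x1 kernel over a 512-channel leaky_re_lu_13 output, no bias
print(conv2d_params(1, 512, 64, use_bias=False))     # 32768
# conv2d_23: final 1x1 detection layer with bias; 425 = 5 anchors * 85 values
print(conv2d_params(1, 1024, 425, use_bias=True))    # 435625
print(batchnorm_params(1024))                        # 4096

# Assumed BatchNorm widths of the 22 conv blocks (standard YOLOv2 layout)
bn_channels = [32, 64, 128, 64, 128, 256, 128, 256, 512, 256, 512, 256, 512,
               1024, 512, 1024, 512, 1024, 1024, 1024, 64, 1024]
# moving_mean and moving_variance (2 per channel) are the non-trainable params
print(2 * sum(bn_channels))                          # 20672
```

Under these assumptions the last line reproduces the "Non-trainable params: 20,672" line of the summary.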

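This model converts a preprocessed batch of input images of shape \((m, 608, 608, 3)\) into a tensor of shape \((m, 19, 19, 425)\), the last layer in the summary above. This tensor corresponds to the \((m, 19, 19, 5, 85)\) encoding with its last two dimensions flattened.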
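The flattening can be illustrated with NumPy. Reshaping the last axis recovers the per-anchor layout: channels 85k to 85k+84 of a cell belong to anchor k.

YAD2K's yolo_head appears to perform an equivalent reshape internally. The random array below only stands in for the real model output.

```python
import numpy as np

m, grid, num_anchors, num_classes = 2, 19, 5, 80
# Stand-in for yolo_model's raw output: 425 = 5 * (5 + 80) channels per cell
raw = np.random.rand(m, grid, grid, num_anchors * (5 + num_classes)).astype(np.float32)

# Row-major reshape splits the 425 channels into 5 anchors x 85 values
encoded = raw.reshape(m, grid, grid, num_anchors, 5 + num_classes)
print(encoded.shape)  # (2, 19, 19, 5, 85)

# Anchor 1 of cell (row 3, col 4) is channels 85..169 of that cell
assert np.array_equal(encoded[0, 3, 4, 1], raw[0, 3, 4, 85:170])
```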

3.3 - Convert the output of the model to usable bounding box tensors

The output of yolo_model, the \((m, 19, 19, 5, 85)\) encoding, has to go through non-trivial processing and conversion before it can be used.

This conversion is done by yolo_head, which is defined in ./yad2k/models/keras_yolo.py.

yolo_outputs = yolo_head(yolo_model.output, anchors, len(class_names))

This adds yolo_outputs to the computation graph. This set of 4 tensors is ready to be used as the input of the yolo_eval function.
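The "non-trivial processing" can be made concrete with a NumPy sketch of the decoding that YOLOv2-style heads such as yolo_head perform:

- a sigmoid on the centre offsets and on the objectness confidence;
- an exponential on the width and height, scaled by the anchors;
- a softmax over the class scores.

The sketch rests on these assumptions:

- The raw channel order is (t_x, t_y, t_w, t_h, t_o, class logits), as in YAD2K.
- The anchors are given in grid-cell units.
- Centres and sizes are normalised to [0, 1] relative to the image.

The function name decode_yolo_output is hypothetical. It returns four arrays, mirroring the 4 tensors mentioned above.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=axis, keepdims=True)


def decode_yolo_output(feats, anchors):
    """feats: (m, H, W, A, 5 + C) raw values; anchors: (A, 2) as (w, h) in grid-cell units."""
    m, H, W, A, _ = feats.shape
    # Column and row index of every cell, shaped to broadcast against feats[..., 0:1]
    cols = np.arange(W).reshape(1, 1, W, 1, 1)
    rows = np.arange(H).reshape(1, H, 1, 1, 1)
    # Box centre: sigmoid offset inside the cell, plus the cell index, divided by grid size
    box_x = (sigmoid(feats[..., 0:1]) + cols) / W
    box_y = (sigmoid(feats[..., 1:2]) + rows) / H
    box_xy = np.concatenate([box_x, box_y], axis=-1)
    # Box size: exp of the raw value times the anchor size, divided by grid size
    box_wh = np.exp(feats[..., 2:4]) * anchors.reshape(1, 1, 1, A, 2) / np.array([W, H])
    box_confidence = sigmoid(feats[..., 4:5])      # probability that an object is present
    box_class_probs = softmax(feats[..., 5:])      # distribution over the C classes
    return box_confidence, box_xy, box_wh, box_class_probs
```

With all-zero inputs, every confidence is 0.5 and every class probability is 1/80. The centre of anchor 0 in the cell at row 2, column 3 lands at \(((0.5 + 3)/19, (0.5 + 2)/19)\).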

3.4 - Filtering boxes

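yolo_outputs contains all the boxes predicted by yolo_model. These boxes now have to be filtered so that only the best ones are kept. This is done by calling the yolo_eval function implemented earlier.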

scores, boxes, classes = yolo_eval(yolo_outputs, image_shape)
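The score-thresholding step of this filtering can be sketched in NumPy. Each box's confidence is multiplied element-wise by its class probabilities. The best class of each box is kept, and boxes whose best score falls below a threshold are dropped.

The shapes (one image, 19×19 cells, 5 anchors, 80 classes) and the threshold of 0.6 are assumptions. The function name filter_boxes is hypothetical; it illustrates the idea and is not the exact body of yolo_eval.

```python
import numpy as np


def filter_boxes(box_confidence, boxes, box_class_probs, threshold=0.6):
    """box_confidence: (19, 19, 5, 1); boxes: (19, 19, 5, 4); box_class_probs: (19, 19, 5, 80)."""
    box_scores = box_confidence * box_class_probs     # (19, 19, 5, 80) per-class scores
    box_classes = np.argmax(box_scores, axis=-1)      # (19, 19, 5) best class per box
    box_class_scores = np.max(box_scores, axis=-1)    # (19, 19, 5) score of that class
    mask = box_class_scores >= threshold              # keep only confident boxes
    # Boolean indexing flattens the kept boxes into (N,), (N, 4) and (N,) arrays
    return box_class_scores[mask], boxes[mask], box_classes[mask]
```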

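3.5 - Run the graph on an image

The computation graph has been built and can be summarized as follows:

yolo_model.input is fed to yolo_model, which computes the output yolo_model.output.
yolo_model.output is processed by yolo_head, which produces yolo_outputs.
yolo_outputs goes through the filtering function yolo_eval, which outputs the predictions scores, boxes and classes.

Exercise: implement predict(), which runs the graph to test YOLO on an image. This requires running a TensorFlow session to compute scores, boxes and classes. The image is loaded and preprocessed as follows: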

image, image_data = preprocess_image("./images/" + image_file, model_image_size=(608, 608))

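It returns:

image: a Python (PIL) representation of the image, used for drawing the boxes. You won't need to use it directly.
image_data: a NumPy array representing the image. This is what gets fed to the CNN.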
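Here is a minimal sketch of what preprocess_image likely does with the image it opens:

- resize it to 608×608 with bicubic interpolation;
- scale the pixels to [0, 1];
- add a batch dimension.

To run without the images folder, the sketch takes an already-loaded PIL image instead of a path. The name preprocess_pil_image is hypothetical. The bicubic filter and the division by 255 are assumptions about yolo_utils, not confirmed by this article.

```python
import numpy as np
from PIL import Image


def preprocess_pil_image(image, model_image_size=(608, 608)):
    """Hypothetical sketch: PIL image -> float32 array of shape (1, height, width, 3) in [0, 1]."""
    # model_image_size is (height, width); PIL's resize expects (width, height)
    resized = image.resize(tuple(reversed(model_image_size)), Image.BICUBIC)
    image_data = np.array(resized, dtype=np.float32) / 255.0  # scale pixels to [0, 1]
    return np.expand_dims(image_data, 0)                      # add the batch dimension m = 1


# Usage: a solid red 1280x720 frame becomes a (1, 608, 608, 3) batch
frame = Image.new("RGB", (1280, 720), (255, 0, 0))
batch = preprocess_pil_image(frame)
print(batch.shape, batch.dtype)
```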

Hint: when a model uses BatchNorm (as YOLO does), an additional placeholder must be passed in the feed_dict: {K.learning_phase(): 0}.
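The learning phase matters because BatchNorm behaves differently in the two modes:

- In training mode (learning phase 1) it normalises with the statistics of the current batch.
- In inference mode (learning phase 0) it uses the moving averages stored during training. These are the non-trainable parameters in the model summary.

The toy NumPy sketch below works on (batch, channels) data. There, a single-sample batch in training mode collapses every feature to beta. In convolutional layers the statistics are also taken over height and width, so the effect is less extreme, but the outputs still differ from inference mode. The function name and the epsilon of 1e-3 (Keras' default) are assumptions.

```python
import numpy as np


def batch_norm(x, gamma, beta, moving_mean, moving_var, training, eps=1e-3):
    """Toy BatchNorm over axis 0 of a (batch, channels) array."""
    if training:
        mean, var = x.mean(axis=0), x.var(axis=0)  # statistics of the current batch
    else:
        mean, var = moving_mean, moving_var        # statistics accumulated during training
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


x = np.array([[2.0, 4.0]])                 # a batch containing a single sample
gamma, beta = np.ones(2), np.zeros(2)
moving_mean, moving_var = np.ones(2), np.ones(2)
print(batch_norm(x, gamma, beta, moving_mean, moving_var, training=True))   # every value equals beta
print(batch_norm(x, gamma, beta, moving_mean, moving_var, training=False))  # uses the stored statistics
```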

Earlier, K.get_session() was used to obtain the TensorFlow Session object, which is stored in the variable sess.
To evaluate a list of tensors, call sess.run() like this:

sess.run(fetches=[tensor1,tensor2,tensor3],
         feed_dict={yolo_model.input: the_input_variable,
                    K.learning_phase():0
         }
)

Note that scores, boxes and classes are not passed to predict. They are global variables, so they can be used directly inside the function.

def predict(sess, image_file):
    """
    Runs the graph stored in "sess" to predict boxes for "image_file". Prints and plots the predictions.
    
    Arguments:
        sess -- your tensorflow/Keras session containing the YOLO graph
        image_file -- name of an image stored in the "images" folder.
    
    Returns:
        out_scores -- NumPy array of shape (None, ), scores of the predicted boxes
        out_boxes -- NumPy array of shape (None, 4), coordinates of the predicted boxes
        out_classes -- NumPy array of shape (None, ), class index of the predicted boxes
    
    Note: "None" actually represents the number of predicted boxes, it varies between 0 and max_boxes. 
    """

    # Preprocess your image
    image, image_data = preprocess_image("images/" + image_file, model_image_size = (608, 608))

    # Run the session with the correct tensors and choose the correct placeholders in the feed_dict.
    # You'll need to use feed_dict={yolo_model.input: ... , K.learning_phase(): 0})
    ### START CODE HERE ### (≈ 1 line)
    out_scores, out_boxes, out_classes = sess.run([scores, boxes, classes], feed_dict={yolo_model.input:image_data, K.learning_phase():0})
    ### END CODE HERE ###

    # Print predictions info
    print('Found {} boxes for {}'.format(len(out_boxes), image_file))
    # Generate colors for drawing bounding boxes.
    colors = generate_colors(class_names)
    # Draw bounding boxes on the image file
    draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors)
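    # Save the image with the predicted bounding boxes (create the "out" folder if needed)
    os.makedirs("out", exist_ok=True)
    image.save(os.path.join("out", image_file), quality=90)
    # Display the results in the notebook; scipy.misc.imread was removed in SciPy 1.2, so use matplotlib
    output_image = plt.imread(os.path.join("out", image_file))
    imshow(output_image)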
    
    return out_scores, out_boxes, out_classes

Run predict on the image test.jpg:

out_scores, out_boxes, out_classes = predict(sess, "test.jpg")

Output:

Found 7 boxes for test.jpg
car 0.60 (925, 285) (1045, 374)
car 0.66 (706, 279) (786, 350)
bus 0.67 (5, 266) (220, 407)
car 0.70 (947, 324) (1280, 705)
car 0.74 (159, 303) (346, 440)
car 0.80 (761, 282) (942, 412)
car 0.89 (367, 300) (745, 648)
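Each printed line has the form label, score, (left, top), (right, bottom), in pixels of the original image. The sketch below takes a box given as normalised (top, left, bottom, right) corners, scales it to a 720×1280 image, and prints it in that form.

The rounding and clamping mirror what the draw_boxes helper appears to do. Treat them, the normalised input format, the image size and the name format_detection as assumptions.

```python
import numpy as np


def format_detection(label, score, box, image_shape):
    """box = (top, left, bottom, right) in [0, 1]; image_shape = (height, width) in pixels."""
    height, width = int(image_shape[0]), int(image_shape[1])
    # Scale normalised corners to pixel coordinates
    top, left, bottom, right = np.array(box) * np.array([height, width, height, width])
    # Round to the nearest pixel and clamp to the image borders
    top = max(0, int(np.floor(top + 0.5)))
    left = max(0, int(np.floor(left + 0.5)))
    bottom = min(height, int(np.floor(bottom + 0.5)))
    right = min(width, int(np.floor(right + 0.5)))
    return '{} {:.2f} ({}, {}) ({}, {})'.format(label, score, left, top, right, bottom)


print(format_detection('car', 0.891, (0.5, 0.5, 1.0, 1.0), (720., 1280.)))
# car 0.89 (640, 360) (1280, 720)
```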


The model you have just run can detect 80 different classes, all listed in coco_classes.txt.
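out_classes holds integer indices, which are turned into names through the class list read from that file. Below is a minimal stand-in for read_classes, assuming one class name per line.

The name load_class_names is hypothetical. The three sample names follow the usual COCO order, which is also assumed here.

```python
import os
import tempfile


def load_class_names(classes_path):
    """Hypothetical stand-in for yolo_utils.read_classes: one class name per line."""
    with open(classes_path) as f:
        return [line.strip() for line in f if line.strip()]


# Usage with a tiny temporary file instead of the real coco_classes.txt
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "coco_classes.txt")
    with open(path, "w") as f:
        f.write("person\nbicycle\ncar\n")
    class_names = load_class_names(path)

print(class_names)     # ['person', 'bicycle', 'car']
print(class_names[2])  # car: an out_classes entry equal to 2 would be labelled "car"
```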

Predictions of the YOLO model on pictures taken from a car-mounted camera while driving around Silicon Valley.

Dataset provided by drive.ai.

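Summary

YOLO is a state-of-the-art object detection model that is both fast and accurate.
It runs an input image through a CNN that outputs a volume of dimension \(19 \times 19 \times 5 \times 85\).
The encoding can be viewed as a grid in which each of the \(19 \times 19\) cells holds information about 5 anchor boxes.
All boxes are filtered with non-max suppression. Specifically:
- score thresholding on the probability of detecting a class keeps only accurate (high-probability) boxes;
- Intersection over Union (IoU) thresholding eliminates overlapping boxes.
Training a YOLO model from randomly initialized weights is non-trivial and requires a large dataset and a lot of computation, so this article uses previously trained model parameters. A YOLO model can also be fine-tuned on your own dataset.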
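The IoU step can be illustrated with a plain greedy non-max suppression in NumPy. The algorithm keeps the highest-scoring box and discards any box that overlaps an already kept box by more than the IoU threshold.

This is written for illustration and is not necessarily how yolo_eval implements it. Boxes are assumed to be (y1, x1, y2, x2) corners. The defaults iou_threshold=0.5 and max_boxes=10 are assumptions.

```python
import numpy as np


def iou(box1, box2):
    """Intersection over Union of two boxes given as (y1, x1, y2, x2) corners."""
    yi1, xi1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    yi2, xi2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(yi2 - yi1, 0) * max(xi2 - xi1, 0)  # zero if the boxes do not overlap
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter / (area1 + area2 - inter)


def non_max_suppression(scores, boxes, iou_threshold=0.5, max_boxes=10):
    """Greedy NMS: returns the indices of the kept boxes, highest score first."""
    keep = []
    for idx in np.argsort(-scores):  # indices from highest to lowest score
        if len(keep) == max_boxes:
            break
        # Keep the box only if it does not overlap any kept box too much
        if all(iou(boxes[idx], boxes[k]) <= iou_threshold for k in keep):
            keep.append(int(idx))
    return keep


scores = np.array([0.9, 0.8, 0.7])
boxes = np.array([[0, 0, 2, 2], [0, 0, 2, 1.9], [3, 3, 4, 4]])
print(non_max_suppression(scores, boxes))  # [0, 2]: box 1 overlaps box 0 with IoU 0.95
```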

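References

The ideas presented in this article come mainly from the two YOLO papers. The implementation draws heavily on, and reuses many components from, Allan Zelener's GitHub repository. The pre-trained weights come from the official YOLO website. This article is the author's own translation of the course material.

Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi - You Only Look Once: Unified, Real-Time Object Detection (2015)
Joseph Redmon, Ali Farhadi - YOLO9000: Better, Faster, Stronger (2016)
Allan Zelener - YAD2K: Yet Another Darknet 2 Keras
Official YOLO website: https://pjreddie.com/darknet/yolo/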


Original article: https://www.cnblogs.com/geekfx/p/14230826.html