
    Single Shot Multibox Detection (SSD) in Practice (Part 2)

     2. Training

    This part explains, step by step, how to train the SSD model for object detection.

    2.1. Data Reading and Initialization

    We read the Pikachu dataset created in the previous section.

    batch_size = 32
    train_iter, _ = d2l.load_data_pikachu(batch_size)

    There is only one category in the Pikachu dataset. After defining the module, we need to initialize the model parameters and define the optimization algorithm.

    ctx, net = d2l.try_gpu(), TinySSD(num_classes=1)
    net.initialize(init=init.Xavier(), ctx=ctx)
    trainer = gluon.Trainer(net.collect_params(), 'sgd',
                            {'learning_rate': 0.2, 'wd': 5e-4})

    2.2. Defining Loss and Evaluation Functions

    Object detection involves two types of losses. The first is the anchor box category loss; for this we can simply reuse the cross-entropy loss function used in image classification. The second is the positive anchor box offset loss. Offset prediction is a regression problem, but here we do not use the squared loss introduced earlier. Instead, we use the L1 norm loss, the absolute value of the difference between the predicted value and the ground-truth value. The mask variable bbox_masks removes negative anchor boxes and padding anchor boxes from the loss calculation. Finally, we add the anchor box category loss and the offset loss to obtain the final loss function of the model.

    cls_loss = gluon.loss.SoftmaxCrossEntropyLoss()
    bbox_loss = gluon.loss.L1Loss()

    def calc_loss(cls_preds, cls_labels, bbox_preds, bbox_labels, bbox_masks):
        cls = cls_loss(cls_preds, cls_labels)
        bbox = bbox_loss(bbox_preds * bbox_masks, bbox_labels * bbox_masks)
        return cls + bbox

    We can use accuracy to evaluate the classification results. Because we use the L1 norm loss, we will use the mean absolute error to evaluate the bounding box prediction results.

    def cls_eval(cls_preds, cls_labels):
        # Because the category prediction results are placed in the final
        # dimension, argmax must specify this dimension
        return float((cls_preds.argmax(axis=-1) == cls_labels).sum())

    def bbox_eval(bbox_preds, bbox_labels, bbox_masks):
        return float((np.abs((bbox_labels - bbox_preds) * bbox_masks)).sum())

    2.3. Training the Model

    During model training, we must generate multiscale anchor boxes (anchors) in the forward computation of the model and predict the category (cls_preds) and offset (bbox_preds) of each anchor box. Then, we label the category (cls_labels) and offset (bbox_labels) of each anchor box based on the label information Y. Finally, we calculate the loss function using the predicted and labeled category and offset values. To simplify the code, we do not evaluate the training dataset here.

    num_epochs, timer = 20, d2l.Timer()
    animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs],
                            legend=['class error', 'bbox mae'])
    for epoch in range(num_epochs):
        # accuracy_sum, mae_sum, num_examples, num_labels
        metric = d2l.Accumulator(4)
        train_iter.reset()  # Read data from the start.
        for batch in train_iter:
            timer.start()
            X = batch.data[0].as_in_ctx(ctx)
            Y = batch.label[0].as_in_ctx(ctx)
            with autograd.record():
                # Generate multiscale anchor boxes and predict the category and
                # offset of each
                anchors, cls_preds, bbox_preds = net(X)
                # Label the category and offset of each anchor box
                bbox_labels, bbox_masks, cls_labels = npx.multibox_target(
                    anchors, Y, cls_preds.transpose(0, 2, 1))
                # Calculate the loss function using the predicted and labeled
                # category and offset values
                l = calc_loss(cls_preds, cls_labels, bbox_preds, bbox_labels,
                              bbox_masks)
            l.backward()
            trainer.step(batch_size)
            metric.add(cls_eval(cls_preds, cls_labels), cls_labels.size,
                       bbox_eval(bbox_preds, bbox_labels, bbox_masks),
                       bbox_labels.size)
        cls_err, bbox_mae = 1 - metric[0] / metric[1], metric[2] / metric[3]
        animator.add(epoch + 1, (cls_err, bbox_mae))
    print('class err %.2e, bbox mae %.2e' % (cls_err, bbox_mae))
    print('%.1f examples/sec on %s' % (train_iter.num_image / timer.stop(), ctx))

    class err 2.35e-03, bbox mae 2.68e-03

    4315.5 examples/sec on gpu(0)

    3. Prediction
    In the prediction stage, we want to detect all objects of interest in the image. Below, we read the test image and resize it. Then, we convert it to the four-dimensional format required by the convolutional layers.

    img = image.imread('../img/pikachu.jpg')
    feature = image.imresize(img, 256, 256).astype('float32')
    X = np.expand_dims(feature.transpose(2, 0, 1), axis=0)

    Using the MultiBoxDetection function, we predict the bounding boxes based on the anchor boxes and their predicted offsets. Then, we use non-maximum suppression to remove similar bounding boxes.

    def predict(X):
        anchors, cls_preds, bbox_preds = net(X.as_in_ctx(ctx))
        cls_probs = npx.softmax(cls_preds).transpose(0, 2, 1)
        output = npx.multibox_detection(cls_probs, bbox_preds, anchors)
        idx = [i for i, row in enumerate(output[0]) if row[0] != -1]
        return output[0, idx]

    output = predict(X)

    Finally, we take all the bounding boxes with a confidence of at least 0.3 and display them as the final output.

    def display(img, output, threshold):
        d2l.set_figsize((5, 5))
        fig = d2l.plt.imshow(img.asnumpy())
        for row in output:
            score = float(row[1])
            if score < threshold:
                continue
            h, w = img.shape[0:2]
            bbox = [row[2:6] * np.array((w, h, w, h), ctx=row.ctx)]
            d2l.show_bboxes(fig.axes, bbox, '%.2f' % score, 'w')

    display(img, output, threshold=0.3)

    4. Loss Function

    Due to space limitations, we have omitted some implementation details of the SSD model in this experiment. Can you further improve the model in the following areas?

    For the predicted offsets, replace the L1 norm loss with the smooth L1 norm loss. This loss function uses a square function around zero for greater smoothness; the size of the squared region is controlled by the hyperparameter σ:

        f(x) = (σx)² / 2,        if |x| < 1/σ²
        f(x) = |x| − 1/(2σ²),    otherwise

    When σ is large, this loss is similar to the L1 norm loss. When σ is small, the loss function is smoother.

    sigmas = [10, 1, 0.5]
    lines = ['-', '--', '-.']
    x = np.arange(-2, 2, 0.1)
    d2l.set_figsize()
    for l, s in zip(lines, sigmas):
        y = npx.smooth_l1(x, scalar=s)
        d2l.plt.plot(x.asnumpy(), y.asnumpy(), l, label='sigma=%.1f' % s)
    d2l.plt.legend();
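
    The plot above only visualizes the smooth L1 function. As a hedged sketch of the suggested change, the L1 offset term in calc_loss could be replaced by npx.smooth_l1 (the same function used in the plot); calc_loss_smooth and its sigma argument are illustrative names, not part of the original code:

    # Hypothetical variant of calc_loss: the offset term uses the smooth L1
    # function instead of gluon.loss.L1Loss; sigma is a tunable hyperparameter.
    def calc_loss_smooth(cls_preds, cls_labels, bbox_preds, bbox_labels,
                         bbox_masks, sigma=1.0):
        cls = cls_loss(cls_preds, cls_labels)
        # Mask out negative and padding anchor boxes, then average the
        # elementwise smooth L1 values over each example
        bbox = npx.smooth_l1((bbox_preds - bbox_labels) * bbox_masks,
                             scalar=sigma).mean(axis=1)
        return cls + bbox

    Passing this function to the training loop in place of calc_loss would leave the rest of the code unchanged.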

    For category prediction, the experiment above uses the cross-entropy loss. Another option is the focal loss: for the predicted probability x of the true category and a hyperparameter γ ≥ 0, the loss is −(1 − x)^γ · log(x). Increasing γ reduces the loss contribution of examples that are already classified with high confidence:

    def focal_loss(gamma, x):
        return -(1 - x) ** gamma * np.log(x)

    x = np.arange(0.01, 1, 0.01)
    for l, gamma in zip(lines, [0, 1, 5]):
        y = d2l.plt.plot(x.asnumpy(), focal_loss(gamma, x).asnumpy(), l,
                         label='gamma=%.1f' % gamma)
    d2l.plt.legend();

    Training and Prediction

    When an object is relatively large compared to the image, the model normally adopts a larger input image size.

    This generally produces a large number of negative anchor boxes when labeling anchor box categories. We can sample the negative anchor boxes to better balance the data categories. To do this, we can set the MultiBoxTarget function’s negative_mining_ratio parameter.
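
    The sketch below is only illustrative: it assumes that npx.multibox_target forwards the negative_mining_ratio keyword to the underlying MultiBoxTarget operator, and the ratio value is arbitrary.

    # Inside the training loop (under autograd.record()), label anchor boxes
    # while keeping at most three negative anchor boxes per positive one.
    bbox_labels, bbox_masks, cls_labels = npx.multibox_target(
        anchors, Y, cls_preds.transpose(0, 2, 1),
        negative_mining_ratio=3)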

    Assign hyper-parameters with different weights to the anchor box category loss and positive anchor box offset loss in the loss function.
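
    A minimal sketch of such a weighting, with illustrative hyperparameter names cls_weight and bbox_weight that would need to be tuned:

    # Hypothetical weighted version of calc_loss; the two weights balance the
    # category loss against the positive anchor box offset loss.
    def calc_loss_weighted(cls_preds, cls_labels, bbox_preds, bbox_labels,
                           bbox_masks, cls_weight=1.0, bbox_weight=1.0):
        cls = cls_loss(cls_preds, cls_labels)
        bbox = bbox_loss(bbox_preds * bbox_masks, bbox_labels * bbox_masks)
        return cls_weight * cls + bbox_weight * bbox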

    Refer to the SSD paper. What methods can be used to evaluate the precision of object detection models?
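
    The SSD paper reports mean average precision (mAP), which matches predicted boxes to ground-truth boxes by their intersection over union (IoU) and then averages per-class precision over recall levels. The helper below is a plain-Python sketch of the IoU building block only (box_iou is our own illustrative name, not a d2l function):

    # IoU between two boxes given in (x1, y1, x2, y2) corner format.
    def box_iou(box_a, box_b):
        # Intersection rectangle (clipped at zero when the boxes do not overlap)
        x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter)

    A detection typically counts as a true positive when its IoU with an unmatched ground-truth box is at least 0.5; precision and recall over the ranked detections then give the average precision for each class.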

    5. Summary

    • SSD is a multiscale object detection model. This model generates different numbers of anchor boxes of different sizes based on the base network block and each multiscale feature block and predicts the categories and offsets of the anchor boxes to detect objects of different sizes.
    • During SSD model training, the loss function is calculated using the predicted and labeled category and offset values.

     
