  • Faster RCNN training code walkthrough (3)

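    Analysis of the forward functions of four layers:

    RoIDataLayer: reads the data, shuffles it randomly, and so on.

    AnchorTargetLayer: assigns every anchor a classification label and a box regression target (this is the layer analyzed here).

    ProposalLayer: applies the predicted offsets to the anchors laid out over the whole image, clips boxes that cross the image boundary, removes boxes below the size threshold, sorts the rest by score, applies NMS, and returns the top-ranked proposals.

    ProposalTargetLayer: assigns proposals to gt objects, producing each proposal's classification label and box regression target.

    Continuing from the previous post, let's look at the AnchorTargetLayer in faster rcnn: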

    class AnchorTargetLayer(caffe.Layer):
        """
        Assign anchors to ground-truth targets. Produces anchor classification
        labels and bounding-box regression targets.
        """
    
        def setup(self, bottom, top):
            layer_params = yaml.load(self.param_str_)
            anchor_scales = layer_params.get('scales', (8, 16, 32))
            self._anchors = generate_anchors(scales=np.array(anchor_scales))
            self._num_anchors = self._anchors.shape[0]
            self._feat_stride = layer_params['feat_stride']
    
            if DEBUG:
                print 'anchors:'
                print self._anchors
                print 'anchor shapes:'
                print np.hstack((
                    self._anchors[:, 2::4] - self._anchors[:, 0::4],
                    self._anchors[:, 3::4] - self._anchors[:, 1::4],
                ))
                self._counts = cfg.EPS
                self._sums = np.zeros((1, 4))
                self._squared_sums = np.zeros((1, 4))
                self._fg_sum = 0
                self._bg_sum = 0
                self._count = 0
    
            # allow boxes to sit over the edge by a small amount
            self._allowed_border = layer_params.get('allowed_border', 0)
    
            height, width = bottom[0].data.shape[-2:]
            if DEBUG:
                print 'AnchorTargetLayer: height', height, 'width', width
    
            A = self._num_anchors
            # labels
            top[0].reshape(1, 1, A * height, width)
            # bbox_targets
            top[1].reshape(1, A * 4, height, width)
            # bbox_inside_weights
            top[2].reshape(1, A * 4, height, width)
            # bbox_outside_weights
            top[3].reshape(1, A * 4, height, width)
    

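    First, the purpose of this layer: for the anchors at every location of the feature map, it produces the targets used to train the RPN (a binary fg/bg label and a box regression target per anchor).

    (1) Input blobs: bottom[0] holds the feature map (only its height and width are used here), bottom[1] holds the gt box coordinates, and bottom[2] holds im_info.

    (2) Output blobs: top[0] holds the anchor labels (1 for fg, 0 for bg, -1 for don't care); top[1] holds the regression targets of the anchors, i.e. the four quantities tx, ty, tw, th from the paper (so faster rcnn performs bbox regression twice in total, first in the RPN and then in fast rcnn); top[2] and top[3] hold bbox_inside_weights and bbox_outside_weights respectively.

    Now the layer's setup function: it parses the layer's parameter string (param_str_) to initialize its own parameters and sets the shapes of the layer's output blobs.

    The thing to note in this function is generate_anchors(), which produces the nine anchors belonging to the top-left point of the feature map (their sizes are in input-image coordinates). These 9 anchors are later used to produce the anchors over the whole image. generate_anchors() was analyzed in an earlier post, so it is not repeated here.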
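
    To make the tiling idea concrete, here is a minimal NumPy sketch of how a small set of base anchors can be shifted to every feature-map cell. The two base anchors are made-up values for illustration (not the real output of generate_anchors()), and tile_anchors is a hypothetical helper name; the broadcasting follows the same pattern as the forward function of this layer.

```python
import numpy as np


def tile_anchors(base_anchors, height, width, feat_stride):
    """Shift A base anchors to all height*width cells; returns (K*A, 4)."""
    shift_x = np.arange(0, width) * feat_stride
    shift_y = np.arange(0, height) * feat_stride
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    # one (dx, dy, dx, dy) row per feature-map cell, in row-major order
    shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                        shift_x.ravel(), shift_y.ravel())).transpose()
    A = base_anchors.shape[0]
    K = shifts.shape[0]
    # (1, A, 4) + (K, 1, 4) broadcasts to (K, A, 4)
    all_anchors = base_anchors.reshape((1, A, 4)) + shifts.reshape((K, 1, 4))
    return all_anchors.reshape((K * A, 4))


# Hypothetical base anchors (x1, y1, x2, y2), NOT the real generate_anchors() output.
base = np.array([[-8, -8, 23, 23],
                 [-24, -8, 39, 23]])
anchors = tile_anchors(base, height=2, width=3, feat_stride=16)
print(anchors.shape)   # K * A rows, 4 columns
print(anchors[:4])     # the anchors of the first two cells
```

    With a 2×3 feature map and 2 base anchors this gives K = 6 cells and 12 anchors in total; the anchors of one cell are stored next to each other, which is why the final reshape to (K*A, 4) keeps them grouped by location.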

    Next, the layer's forward function:

        def forward(self, bottom, top):
            # Algorithm:
            #
            # for each (H, W) location i
            #   generate 9 anchor boxes centered on cell i
            #   apply predicted bbox deltas at cell i to each of the 9 anchors
            # filter out-of-image anchors
            # measure GT overlap
    
            assert bottom[0].data.shape[0] == 1, \
                'Only single item batches are supported'
    
            # map of shape (..., H, W)
            height, width = bottom[0].data.shape[-2:]   ## bottom[0]: feature map, bottom[1]: gt boxes, bottom[2]: im_info
            # GT boxes (x1, y1, x2, y2, label)
            gt_boxes = bottom[1].data
            # im_info
            im_info = bottom[2].data[0, :]
    
            if DEBUG:
                print ''
                print 'im_size: ({}, {})'.format(im_info[0], im_info[1])
                print 'scale: {}'.format(im_info[2])
                print 'height, width: ({}, {})'.format(height, width)
                print 'rpn: gt_boxes.shape', gt_boxes.shape
                print 'rpn: gt_boxes', gt_boxes
    
            # 1. Generate proposals from bbox deltas and shifted anchors
            shift_x = np.arange(0, width) * self._feat_stride    ## offsets mapped back to the input image
            shift_y = np.arange(0, height) * self._feat_stride
            shift_x, shift_y = np.meshgrid(shift_x, shift_y)
            shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                                shift_x.ravel(), shift_y.ravel())).transpose()
            # add A anchors (1, A, 4) to
            # cell K shifts (K, 1, 4) to get
            # shift anchors (K, A, 4)
            # reshape to (K*A, 4) shifted anchors
            A = self._num_anchors
            K = shifts.shape[0]
            all_anchors = (self._anchors.reshape((1, A, 4)) +
                           shifts.reshape((1, K, 4)).transpose((1, 0, 2)))   ## shift the top-left anchors so they cover the whole image
            all_anchors = all_anchors.reshape((K * A, 4))
            total_anchors = int(K * A)
    
            # only keep anchors inside the image
            inds_inside = np.where(
                (all_anchors[:, 0] >= -self._allowed_border) &
                (all_anchors[:, 1] >= -self._allowed_border) &
                (all_anchors[:, 2] < im_info[1] + self._allowed_border) &  # width
                (all_anchors[:, 3] < im_info[0] + self._allowed_border)    # height
            )[0]
    
            if DEBUG:
                print 'total_anchors', total_anchors
                print 'inds_inside', len(inds_inside)
    
            # keep only inside anchors
            anchors = all_anchors[inds_inside, :]
            if DEBUG:
                print 'anchors.shape', anchors.shape
    ########################################################################################################################
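    ## shift_x and shift_y are the offsets along the x and y axes. They are applied to the top-left anchors produced by
    ## generate_anchors(), shifting them to obtain the anchors over the whole image. all_anchors stores all of these anchors and
    ## total_anchors stores their count K×A, where K is the number of feature-map locations (height × width) and A is the number
    ## of anchors per location. The anchors are then filtered: any anchor that extends beyond the image boundary is discarded. Continuing: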
    ##########################################################################################################################
            # label: 1 is positive, 0 is negative, -1 is dont care
            labels = np.empty((len(inds_inside), ), dtype=np.float32)
            labels.fill(-1)
    
            # overlaps between the anchors and the gt boxes
            # overlaps (ex, gt)
            overlaps = bbox_overlaps(
                np.ascontiguousarray(anchors, dtype=np.float),
                np.ascontiguousarray(gt_boxes, dtype=np.float))   ## n*k matrix of overlap ratios
            argmax_overlaps = overlaps.argmax(axis=1)
            max_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps]
            gt_argmax_overlaps = overlaps.argmax(axis=0)
            gt_max_overlaps = overlaps[gt_argmax_overlaps,
                                       np.arange(overlaps.shape[1])]
            gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]
    
            if not cfg.TRAIN.RPN_CLOBBER_POSITIVES:
                # assign bg labels first so that positive labels can clobber them
                labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
    
            # fg label: for each gt, anchor with highest overlap
            labels[gt_argmax_overlaps] = 1
    
            # fg label: above threshold IOU
            labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1
    
            if cfg.TRAIN.RPN_CLOBBER_POSITIVES:
                # assign bg labels last so that negative labels can clobber positives
                labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
    #################################################################################################################
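    ## This part computes the overlaps between the anchors and the gt boxes and applies the rules for positive samples: a. for each gt,
    ## the anchor with the highest overlap is fg; b. any anchor whose highest overlap with a gt is at least
    ## cfg.TRAIN.RPN_POSITIVE_OVERLAP (0.7) is fg. Anchors whose highest overlap is below cfg.TRAIN.RPN_NEGATIVE_OVERLAP are bg.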
    #################################################################################################################
            # subsample positive labels if we have too many
            num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE)
            fg_inds = np.where(labels == 1)[0]
            if len(fg_inds) > num_fg:
                disable_inds = npr.choice(
                    fg_inds, size=(len(fg_inds) - num_fg), replace=False)
                labels[disable_inds] = -1
    
            # subsample negative labels if we have too many
            num_bg = cfg.TRAIN.RPN_BATCHSIZE - np.sum(labels == 1)
            bg_inds = np.where(labels == 0)[0]
            if len(bg_inds) > num_bg:
                disable_inds = npr.choice(
                    bg_inds, size=(len(bg_inds) - num_bg), replace=False)
                labels[disable_inds] = -1
                #print "was %s inds, disabling %s, now %s inds" % (
                    #len(bg_inds), len(disable_inds), np.sum(labels == 0))
    
            bbox_targets = np.zeros((len(inds_inside), 4), dtype=np.float32)
            bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])
    
            bbox_inside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
            bbox_inside_weights[labels == 1, :] = np.array(cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS)
    
            bbox_outside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
            if cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0:
                # uniform weighting of examples (given non-uniform sampling)
                num_examples = np.sum(labels >= 0)
                positive_weights = np.ones((1, 4)) * 1.0 / num_examples
                negative_weights = np.ones((1, 4)) * 1.0 / num_examples
            else:
                assert ((cfg.TRAIN.RPN_POSITIVE_WEIGHT > 0) &
                        (cfg.TRAIN.RPN_POSITIVE_WEIGHT < 1))
                positive_weights = (cfg.TRAIN.RPN_POSITIVE_WEIGHT /
                                    np.sum(labels == 1))
                negative_weights = ((1.0 - cfg.TRAIN.RPN_POSITIVE_WEIGHT) /
                                    np.sum(labels == 0))
            bbox_outside_weights[labels == 1, :] = positive_weights
            bbox_outside_weights[labels == 0, :] = negative_weights
    
            if DEBUG:
                self._sums += bbox_targets[labels == 1, :].sum(axis=0)
                self._squared_sums += (bbox_targets[labels == 1, :] ** 2).sum(axis=0)
                self._counts += np.sum(labels == 1)
                means = self._sums / self._counts
                stds = np.sqrt(self._squared_sums / self._counts - means ** 2)
                print 'means:'
                print means
                print 'stdevs:'
                print stds
    
            # map up to original set of anchors
    ## _unmap() maps the results back to the full set of anchors generated over the image, so that every anchor has label, bbox_targets, bbox_inside_weights and bbox_outside_weights values
            labels = _unmap(labels, total_anchors, inds_inside, fill=-1)
            bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, fill=0)
            bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside, fill=0)
            bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside, fill=0)
    
            if DEBUG:
                print 'rpn: max max_overlap', np.max(max_overlaps)
                print 'rpn: num_positive', np.sum(labels == 1)
                print 'rpn: num_negative', np.sum(labels == 0)
                self._fg_sum += np.sum(labels == 1)
                self._bg_sum += np.sum(labels == 0)
                self._count += 1
                print 'rpn: num_positive avg', self._fg_sum / self._count
                print 'rpn: num_negative avg', self._bg_sum / self._count
    
            # labels
            labels = labels.reshape((1, height, width, A)).transpose(0, 3, 1, 2)
            labels = labels.reshape((1, 1, A * height, width))
            top[0].reshape(*labels.shape)
            top[0].data[...] = labels
    
            # bbox_targets
            bbox_targets = bbox_targets \
                .reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2)
            top[1].reshape(*bbox_targets.shape)
            top[1].data[...] = bbox_targets
    
            # bbox_inside_weights
            bbox_inside_weights = bbox_inside_weights \
                .reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2)
            assert bbox_inside_weights.shape[2] == height
            assert bbox_inside_weights.shape[3] == width
            top[2].reshape(*bbox_inside_weights.shape)
            top[2].data[...] = bbox_inside_weights
    
            # bbox_outside_weights
            bbox_outside_weights = bbox_outside_weights \
                .reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2)
            assert bbox_outside_weights.shape[2] == height
            assert bbox_outside_weights.shape[3] == width
            top[3].reshape(*bbox_outside_weights.shape)
            top[3].data[...] = bbox_outside_weights
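
    The labelling and un-mapping steps of forward can be tried on toy data. The sketch below is an illustration under assumptions, not the repository code: iou_matrix is a plain NumPy stand-in for the compiled bbox_overlaps(), the thresholds 0.7 and 0.3 are assumed values for cfg.TRAIN.RPN_POSITIVE_OVERLAP and cfg.TRAIN.RPN_NEGATIVE_OVERLAP, the gt boxes carry no label column, and unmap is written to behave the way the calls to _unmap() suggest (its definition is not shown in this post).

```python
import numpy as np


def iou_matrix(boxes, gts):
    """Pairwise IoU of (N, 4) boxes and (K, 4) gts, using the +1 pixel convention."""
    x1 = np.maximum(boxes[:, None, 0], gts[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], gts[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], gts[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], gts[None, :, 3])
    inter = np.clip(x2 - x1 + 1, 0, None) * np.clip(y2 - y1 + 1, 0, None)
    area_b = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    area_g = (gts[:, 2] - gts[:, 0] + 1) * (gts[:, 3] - gts[:, 1] + 1)
    return inter / (area_b[:, None] + area_g[None, :] - inter)


def label_anchors(anchors, gts, pos_thresh=0.7, neg_thresh=0.3):
    """Return labels: 1 = fg, 0 = bg, -1 = ignored (bg is assigned first)."""
    overlaps = iou_matrix(anchors, gts)
    max_overlaps = overlaps.max(axis=1)
    labels = np.full(len(anchors), -1, dtype=np.float32)
    labels[max_overlaps < neg_thresh] = 0
    # every gt box gets its best-matching anchor(s) as foreground
    gt_max = overlaps.max(axis=0)
    labels[np.where(overlaps == gt_max)[0]] = 1
    labels[max_overlaps >= pos_thresh] = 1
    return labels


def unmap(data, count, inds, fill=0):
    """Scatter per-inside-anchor data back into an array covering all anchors."""
    ret = np.full((count,) + data.shape[1:], fill, dtype=np.float32)
    ret[inds] = data
    return ret


anchors = np.array([[0, 0, 9, 9],        # identical to the gt box
                    [0, 0, 9, 7],        # IoU 0.8 with the gt box
                    [0, 0, 9, 4],        # IoU 0.5 with the gt box
                    [50, 50, 59, 59]],   # no overlap
                   dtype=np.float64)
gts = np.array([[0, 0, 9, 9]], dtype=np.float64)
labels = label_anchors(anchors, gts)
print(labels)

# pretend these four anchors were indices 1, 2, 4, 5 out of six anchors in total
full_labels = unmap(labels, 6, np.array([1, 2, 4, 5]), fill=-1)
print(full_labels)
```

    In this toy case the first two anchors become fg, the third falls between the two thresholds and is ignored, and the last is bg. After un-mapping, the two anchors that were filtered out for lying outside the image receive the fill value -1, so they never contribute to the loss.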

    The forward function above also generates bbox_targets, bbox_inside_weights and bbox_outside_weights. For bbox_targets it calls the _compute_targets() function:

    def _compute_targets(ex_rois, gt_rois):
        """Compute bounding-box regression targets for an image."""
    
        assert ex_rois.shape[0] == gt_rois.shape[0]
        assert ex_rois.shape[1] == 4
        assert gt_rois.shape[1] == 5
    
        return bbox_transform(ex_rois, gt_rois[:, :4]).astype(np.float32, copy=False)

    That function in turn calls bbox_transform:

    def bbox_transform(ex_rois, gt_rois):
        ex_widths = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
        ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
        ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_widths
        ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_heights
    
        gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
        gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
        gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_widths
        gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_heights
    
        targets_dx = (gt_ctr_x - ex_ctr_x) / ex_widths
        targets_dy = (gt_ctr_y - ex_ctr_y) / ex_heights
        targets_dw = np.log(gt_widths / ex_widths)
        targets_dh = np.log(gt_heights / ex_heights)
    
        targets = np.vstack(
            (targets_dx, targets_dy, targets_dw, targets_dh)).transpose()
        return targets

    This yields the four offsets required by the paper: tx, ty, tw, th.
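
    As a quick sanity check of these formulas, the sketch below encodes one made-up anchor/gt pair and then decodes it again. encode repeats the arithmetic of bbox_transform() for a single box; decode is a hypothetical inverse written here purely for illustration (the repository's own inverse function is not shown in this post).

```python
import math


def encode(ex, gt):
    """(tx, ty, tw, th) for one box pair; boxes are (x1, y1, x2, y2)."""
    ex_w, ex_h = ex[2] - ex[0] + 1.0, ex[3] - ex[1] + 1.0
    ex_cx, ex_cy = ex[0] + 0.5 * ex_w, ex[1] + 0.5 * ex_h
    gt_w, gt_h = gt[2] - gt[0] + 1.0, gt[3] - gt[1] + 1.0
    gt_cx, gt_cy = gt[0] + 0.5 * gt_w, gt[1] + 0.5 * gt_h
    return ((gt_cx - ex_cx) / ex_w, (gt_cy - ex_cy) / ex_h,
            math.log(gt_w / ex_w), math.log(gt_h / ex_h))


def decode(ex, t):
    """Hypothetical inverse: recover centre and size of the target box."""
    ex_w, ex_h = ex[2] - ex[0] + 1.0, ex[3] - ex[1] + 1.0
    ex_cx, ex_cy = ex[0] + 0.5 * ex_w, ex[1] + 0.5 * ex_h
    cx, cy = t[0] * ex_w + ex_cx, t[1] * ex_h + ex_cy
    w, h = math.exp(t[2]) * ex_w, math.exp(t[3]) * ex_h
    return cx, cy, w, h


anchor = (0.0, 0.0, 15.0, 15.0)   # a 16 x 16 box (made-up values)
gt_box = (4.0, 8.0, 35.0, 23.0)   # a 32 x 16 box (made-up values)
t = encode(anchor, gt_box)
print(t)                   # tx = 0.75, ty = 0.5, tw = ln 2, th = 0
print(decode(anchor, t))   # centre (20, 16), size about 32 x 16
```

    The centre offsets are expressed in units of the anchor's width and height, and the size terms are log ratios, so an anchor that already has the right height gets th = 0.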

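    As for bbox_inside_weights and bbox_outside_weights: bbox_inside_weights is initialized as an n×4 array of zeros, and the rows of the positive samples are then set to cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS (a weight of 1 for each coordinate). bbox_outside_weights is initialized the same way; when cfg.TRAIN.RPN_POSITIVE_WEIGHT is negative, the rows of both positive and negative samples are set to 1/num_examples, where num_examples is the number of anchors whose label is 0 or 1 after subsampling.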
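
    A toy illustration of the two weight arrays follows. It assumes the uniform-weighting branch (cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0) and inside weights of (1, 1, 1, 1); both are assumptions made for this sketch, and make_weights is a hypothetical helper name.

```python
import numpy as np


def make_weights(labels, inside=(1.0, 1.0, 1.0, 1.0)):
    """Build (N, 4) inside/outside weights from labels in {1, 0, -1}."""
    n = len(labels)
    inside_w = np.zeros((n, 4), dtype=np.float32)
    inside_w[labels == 1, :] = np.array(inside)   # only fg anchors get a regression weight
    outside_w = np.zeros((n, 4), dtype=np.float32)
    num_examples = np.sum(labels >= 0)            # sampled fg + bg anchors
    outside_w[labels >= 0, :] = 1.0 / num_examples
    return inside_w, outside_w


# one fg anchor, three bg anchors, one ignored anchor
labels = np.array([1, 0, 0, -1, 0], dtype=np.float32)
inside_w, outside_w = make_weights(labels)
print(inside_w[:, 0])
print(outside_w[:, 0])
```

    Only the fg row has a non-zero inside weight, so only fg anchors contribute to the box regression loss, while the outside weights act as a 1/num_examples normalization over the sampled anchors. Ignored anchors (label -1) get zero in both arrays.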

  • Original article: https://www.cnblogs.com/hotsnow/p/9918073.html