  • [Study notes] SSD code notes + EfficientNet backbone practice

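    The SSD code is now fully working. I then used EfficientNet, which currently offers very strong accuracy and speed, as the backbone of my own TinySSD network. I have not tuned any hyperparameters yet, so the network does not converge well; I will tune it later and put it to practical use.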

    torch.clamp

    torch.clamp(input, min, max, out=None) → Tensor

    It works like clip: every element is limited to the range [min, max].

    Example:

    >>> a = torch.randn(4)
    >>> a
    tensor([-1.7120,  0.1734, -0.0478, -0.0922])
    >>> torch.clamp(a, min=-0.5, max=0.5)
    tensor([-0.5000,  0.1734, -0.0478, -0.0922])
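
    A minimal pure-Python sketch of the same semantics, for intuition only. `clamp_scalar` is a hypothetical helper, not part of PyTorch. It also shows the one-sided form (lower bound only), which is how `area_of` in the IoU code uses clamping.

```python
def clamp_scalar(x, lo=None, hi=None):
    """Limit x to [lo, hi]; a bound of None leaves that side unbounded."""
    if lo is not None and x < lo:
        return lo
    if hi is not None and x > hi:
        return hi
    return x

values = [-1.7120, 0.1734, -0.0478, -0.0922]
two_sided = [clamp_scalar(v, -0.5, 0.5) for v in values]
lower_only = [clamp_scalar(v, lo=0.0) for v in values]
print(two_sided)   # [-0.5, 0.1734, -0.0478, -0.0922]
print(lower_only)  # [0.0, 0.1734, 0.0, 0.0]
```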
    

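    Computing IoU

    IoU is the intersection divided by the union, so the areas come first. The area of a box is the product of the width and height obtained from its two corner points.

    The intersection is the region spanned by the top-left corner farther from the origin (the element-wise maximum of the two top-left corners) and the bottom-right corner closer to the origin (the element-wise minimum of the two bottom-right corners).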

    import torch

    def area_of(left_top, right_bottom):  # (num_boxes, 2), (num_boxes, 2)
        hw = torch.clamp(right_bottom - left_top, min=0.0)  # (num_boxes, 2)
        # Clamp at 0: for boxes that do not overlap the difference is negative, and clamping turns the area into 0.
        return hw[..., 0] * hw[..., 1]

    def iou_of(boxes0, boxes1, eps=1e-5):  # (N, 4) and (1, 4) or (N, 4)
        # boxes0 and boxes1 may differ in size; the smaller one is broadcast to the larger one before max and min are taken.
        overlap_left_top = torch.max(boxes0[..., :2], boxes1[..., :2])  # the farther top-left corner
        overlap_right_bottom = torch.min(boxes0[..., 2:], boxes1[..., 2:])  # the nearer bottom-right corner
        area0 = area_of(boxes0[..., :2], boxes0[..., 2:])  # area from top-left and bottom-right corners
        area1 = area_of(boxes1[..., :2], boxes1[..., 2:])  # area of the second set of boxes, e.g. predicted boxes
        overlap_area = area_of(overlap_left_top, overlap_right_bottom)  # intersection area
        return overlap_area / (area0 + area1 - overlap_area + eps)
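
    As a worked check of the arithmetic, here is a scalar sketch in plain Python (no tensors, no broadcasting). `iou_scalar` is a hypothetical helper that mirrors `iou_of` for two corner-form boxes `(x1, y1, x2, y2)`.

```python
def area(left_top, right_bottom):
    # Clamp width and height at 0 so non-overlapping boxes give area 0.
    w = max(right_bottom[0] - left_top[0], 0.0)
    h = max(right_bottom[1] - left_top[1], 0.0)
    return w * h

def iou_scalar(box0, box1, eps=1e-5):
    # Intersection: larger top-left corner, smaller bottom-right corner.
    left_top = (max(box0[0], box1[0]), max(box0[1], box1[1]))
    right_bottom = (min(box0[2], box1[2]), min(box0[3], box1[3]))
    overlap = area(left_top, right_bottom)
    area0 = area(box0[:2], box0[2:])
    area1 = area(box1[:2], box1[2:])
    return overlap / (area0 + area1 - overlap + eps)

overlapping = iou_scalar((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
disjoint = iou_scalar((0.0, 0.0, 1.0, 1.0), (2.0, 2.0, 3.0, 3.0))
print(overlapping)  # intersection 1, union 4 + 4 - 1 = 7, so close to 1/7
print(disjoint)     # 0.0
```

    Without the clamp, the disjoint pair would produce a width and height of -1 each and therefore a spurious positive "intersection" of 1.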
        
    

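    Generating prior boxes

    The way the coordinates are generated here taught me something: `product` builds the mapping over all grid positions. Every point on the feature map is shifted by 0.5 to become a cell center and is then divided by a ratio (`scale` in the code, i.e. image_size / stride). In general this ratio differs from the feature map size, depending on the network design; in my case they happen to be equal. The ratio is the number of sliding windows per row in the original image. Anchors of other scales and aspect ratios are added afterwards.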

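    # Some comments come from the original code; some of the English comments are my own additions.
    from itertools import product
    from math import sqrt

    import torch
    from torch import nn

    class PriorBox(nn.Module):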
        def __init__(self):
            super(PriorBox, self).__init__()
            self.image_size = 512
            self.feature_maps = [16,8,4,2,1]
            self.min_sizes = [30,60,111,162,213]
            self.max_sizes = [60,111,162,213,512]
            self.strides = [32,64,128,256,512]
            self.aspect_ratios = [[2], [2, 3], [2, 3], [2], [2]]
            self.clip = True
    
        def forward(self):
            """Generate SSD Prior Boxes.
                It returns the center, height and width of the priors. The values are relative to the image size
                Returns:
                    priors (num_priors, 4): The prior boxes represented as [[center_x, center_y, w, h]]. All the values
                        are relative to the image size.
            """
            priors = []
            for k, f in enumerate(self.feature_maps): # every size of feature map
                scale = self.image_size / self.strides[k] # how many boxes (not anchor) in a row in raw img
                # 512 / 32 = 16
                for i, j in product(range(f), repeat=2): # xy generator in feature map
                    # unit center x,y
                    cx = (j + 0.5) / scale # see as blocks and xy in center of it 
                    cy = (i + 0.5) / scale # 15,15 -> 15.5,15.5 -> 15.5/16,15.5/16 which means the xy in center of feature map
    
                    # small sized square box
                    size = self.min_sizes[k] # min size
                    h = w = size / self.image_size # small size
                    priors.append([cx, cy, w, h]) # the small size one
    
                    # big sized square box
                    size = sqrt(self.min_sizes[k] * self.max_sizes[k]) # geometric mean of min size and max size
                    h = w = size / self.image_size
                    priors.append([cx, cy, w, h])
    
                    # change h/w ratio of the small sized box
                    # considering the w/ratio , w*ratio , h/ratio and h * ratio
                    size = self.min_sizes[k]
                    h = w = size / self.image_size
                    for ratio in self.aspect_ratios[k]:
                        ratio = sqrt(ratio)
                        priors.append([cx, cy, w * ratio, h / ratio])
                        priors.append([cx, cy, w / ratio, h * ratio])
    
            priors = torch.Tensor(priors)
            if self.clip:
                priors.clamp_(max=1, min=0)
            return priors
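
    To make the bookkeeping concrete, the sketch below recomputes the grid centers and the number of priors in plain Python from the same configuration values as `PriorBox`. The variable names are my own; only the numbers come from the class above.

```python
from itertools import product

image_size = 512
feature_maps = [16, 8, 4, 2, 1]
strides = [32, 64, 128, 256, 512]
aspect_ratios = [[2], [2, 3], [2, 3], [2], [2]]

counts = []
centers_first_map = []
for k, f in enumerate(feature_maps):
    scale = image_size / strides[k]  # cells per row, e.g. 512 / 32 = 16
    # Two square boxes, plus (w*r, h/r) and (w/r, h*r) for each aspect ratio.
    boxes_per_cell = 2 + 2 * len(aspect_ratios[k])
    counts.append(f * f * boxes_per_cell)
    if k == 0:
        # product(range(f), repeat=2) enumerates every (row, column) pair.
        for i, j in product(range(f), repeat=2):
            centers_first_map.append(((j + 0.5) / scale, (i + 0.5) / scale))

total_priors = sum(counts)
print(counts)                 # [1024, 384, 96, 16, 4]
print(total_priors)           # 1524
print(centers_first_map[0])   # (0.03125, 0.03125)
print(centers_first_map[-1])  # (0.96875, 0.96875)
```

    The total of 1524 is the `num_priors` dimension that the matching, loss and model code all have to agree on.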
    

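    Assigning prior boxes

    This makes good use of broadcasting: compute the IoU of every prior with every target, find the best-overlapping prior for each target and the best-overlapping target for each prior, and then filter the matches by a threshold.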

    def assign_priors(gt_boxes, gt_labels, corner_form_priors,
                      iou_threshold):
        """Assign ground truth boxes and targets to priors.
    
        Args:
            gt_boxes (num_targets, 4): ground truth boxes.
            gt_labels (num_targets): labels of targets.
            corner_form_priors (num_priors, 4): corner form priors.
            iou_threshold (float): priors whose best IoU is below this value become background.
        Returns:
            boxes (num_priors, 4): real values for priors.
            labels (num_priors): labels for priors.
        """
        # size: num_priors x num_targets
        ious = iou_of(gt_boxes.unsqueeze(0), corner_form_priors.unsqueeze(1))
        # size: num_priors
        best_target_per_prior, best_target_per_prior_index = ious.max(1) # for each prior: its highest IoU and the index of that target
        # size: num_targets
        best_prior_per_target, best_prior_per_target_index = ious.max(0) # for each target: its highest IoU over all priors and the index of that prior
    
        for target_index, prior_index in enumerate(best_prior_per_target_index):
            best_target_per_prior_index[prior_index] = target_index # make each target's best prior point back at that target, e.g. (0,0,1,2,3)
        # 2.0 is used to make sure every target has a prior assigned
        best_target_per_prior.index_fill_(0, best_prior_per_target_index, 2) # dim = 0, value = 2: a target's best prior is treated as having IoU 2
        # size: num_priors
        labels = gt_labels[best_target_per_prior_index] # (num_priors,), first assign by highest IoU
        labels[best_target_per_prior < iou_threshold] = 0  # the background id: below the threshold counts as background; an IoU can be the largest for a prior and still be small, so it is filtered out as well
        boxes = gt_boxes[best_target_per_prior_index] # the matched ground-truth box for each prior
        return boxes, labels
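
    The matching logic is easier to follow on a tiny hand-written IoU matrix. The sketch below replays the same four steps with Python lists instead of tensors; the IoU values and labels are made up for illustration.

```python
# Rows are priors, columns are targets (made-up IoU values).
ious = [
    [0.60, 0.10],
    [0.30, 0.20],
    [0.10, 0.40],
    [0.05, 0.00],
]
gt_labels = [1, 2]
iou_threshold = 0.5
num_priors, num_targets = len(ious), len(ious[0])

# Step 1: for every prior, the best target and its IoU (ious.max(1)).
best_target_idx = [max(range(num_targets), key=lambda t: ious[p][t])
                   for p in range(num_priors)]
best_target_iou = [ious[p][best_target_idx[p]] for p in range(num_priors)]

# Step 2: for every target, the best prior (ious.max(0)).
best_prior_idx = [max(range(num_priors), key=lambda p: ious[p][t])
                  for t in range(num_targets)]

# Step 3: force each target's best prior to match that target and
# overwrite its IoU with 2 so it always survives the threshold.
for t, p in enumerate(best_prior_idx):
    best_target_idx[p] = t
    best_target_iou[p] = 2.0

# Step 4: label every prior, then send low-IoU priors to background (0).
labels = [gt_labels[t] for t in best_target_idx]
labels = [lab if iou >= iou_threshold else 0
          for lab, iou in zip(labels, best_target_iou)]
print(labels)  # [1, 0, 2, 0]
```

    Prior 2 overlaps its target by only 0.40, which is below the threshold, yet it keeps label 2 because it is that target's best prior. This is what guarantees every ground-truth box at least one positive prior.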
    

    hard_negative_mining

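    A mask decides which losses are counted and which are ignored. There are far too many negative samples, so this is one way to deal with the imbalance.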

    import math

    def hard_negative_mining(loss, labels, neg_pos_ratio):
        """
        It is used to suppress the presence of a large number of negative predictions.
        It works on image level not batch level.
        For any example/image, it keeps all the positive predictions and
         cuts the number of negative predictions to make sure the ratio
         between the negative examples and positive examples is no more
         than the given ratio for an image.
    
        Args:
            loss (N, num_priors): the loss for each example.
            labels (N, num_priors): the labels.
            neg_pos_ratio:  the ratio between the negative examples and positive examples.
        """
        pos_mask = labels > 0
        num_pos = pos_mask.long().sum(dim=1, keepdim=True)
        num_neg = num_pos * neg_pos_ratio
    
        loss[pos_mask] = -math.inf
        _, indexes = loss.sort(dim=1, descending=True)
        _, orders = indexes.sort(dim=1)
        neg_mask = orders < num_neg
        return pos_mask | neg_mask
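
    The double `sort` is the subtle part: sorting the indexes a second time turns them into ranks, so `orders < num_neg` selects the `num_neg` negatives with the largest loss. A single-image sketch in plain Python, with made-up loss values:

```python
import math

loss = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2]  # background loss per prior (made up)
labels = [1, 0, 0, 0, 0, 0]             # one positive prior, five negatives
neg_pos_ratio = 3

pos_mask = [lab > 0 for lab in labels]
num_neg = sum(pos_mask) * neg_pos_ratio

# Positives get -inf so they sort last and are never picked as negatives.
masked = [-math.inf if pos else l for l, pos in zip(loss, pos_mask)]

# First sort: prior indexes ordered by descending loss.
indexes = sorted(range(len(masked)), key=lambda i: masked[i], reverse=True)
# Second sort (inverse permutation): the rank of each prior in that ordering.
orders = [0] * len(masked)
for rank, idx in enumerate(indexes):
    orders[idx] = rank

neg_mask = [rank < num_neg for rank in orders]
mask = [p or n for p, n in zip(pos_mask, neg_mask)]
print(orders)  # [5, 0, 2, 3, 1, 4]
print(mask)    # [True, True, True, False, True, False]
```

    With one positive and a ratio of 3, the mask keeps the positive plus the three hardest negatives (losses 0.9, 0.7 and 0.5).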
    

    Loss Function

    Smooth L1 loss is used for the bounding boxes and cross-entropy loss for classification.

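    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    import box_utils  # assumed module layout; it must provide hard_negative_mining

    class MultiBoxLoss(nn.Module):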
        def __init__(self, neg_pos_ratio):
            """Implement SSD MultiBox Loss.
    
            Basically, MultiBox loss combines classification loss
             and Smooth L1 regression loss.
            """
            super(MultiBoxLoss, self).__init__()
            self.neg_pos_ratio = neg_pos_ratio
    
        def forward(self, confidence, predicted_locations, labels, gt_locations):
            """Compute classification loss and smooth l1 loss.
    
            Args:
                confidence (batch_size, num_priors, num_classes): class predictions.
                predicted_locations (batch_size, num_priors, 4): predicted locations.
                labels (batch_size, num_priors): real labels of all the priors.
                gt_locations (batch_size, num_priors, 4): real boxes corresponding all the priors.
            """
            num_classes = confidence.size(2)
            with torch.no_grad():
                # derived from cross_entropy=sum(log(p))
                loss = -F.log_softmax(confidence, dim=2)[:, :, 0]
                mask = box_utils.hard_negative_mining(loss, labels, self.neg_pos_ratio)
    
            confidence = confidence[mask, :]
            #print(confidence.view(-1, num_classes))
            #print(labels[mask])
            classification_loss = F.cross_entropy(confidence.view(-1, num_classes), labels[mask], reduction='sum')
    
            pos_mask = labels > 0
            predicted_locations = predicted_locations[pos_mask, :].view(-1, 4)
            gt_locations = gt_locations[pos_mask, :].view(-1, 4)
            smooth_l1_loss = F.smooth_l1_loss(predicted_locations, gt_locations, reduction='sum')
            num_pos = gt_locations.size(0)
            return smooth_l1_loss / num_pos, classification_loss / num_pos
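
    For reference, the two per-element quantities used above can be written out in plain Python. `smooth_l1` follows the default form of `F.smooth_l1_loss` (threshold 1), and `background_loss` is the `-log_softmax(...)[0]` term used to rank negatives. Both function names are my own.

```python
import math

def smooth_l1(pred, target):
    # Quadratic near zero, linear for large errors (threshold at 1).
    diff = abs(pred - target)
    return 0.5 * diff * diff if diff < 1.0 else diff - 0.5

def background_loss(logits):
    # -log(softmax(logits)[0]): large when the model is confident that the
    # prior is NOT background, i.e. a hard negative if it really is one.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return -(logits[0] - log_sum_exp)

print(smooth_l1(0.5, 0.0))  # 0.125
print(smooth_l1(2.0, 0.0))  # 1.5
easy = background_loss([5.0, 0.0, 0.0])  # confident background
hard = background_loss([0.0, 5.0, 0.0])  # confident foreground
print(easy < hard)          # True
```

    Note that `forward` divides both losses by `num_pos`, so a batch without any positive prior would divide by zero; guarding against that case is left out here, as in the original.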
    

    Model

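    EfficientNet, which was found by neural architecture search, looked to me like the best available option in both accuracy and speed, so I used it directly as the backbone. I have not tuned it yet: I simply take its output and add 5 more convolutional layers to build the feature pyramid levels. I suspect the receptive field is too large, and the network does not converge very well. I will tune it later, but it does run.

    Training results with EfficientNet as the backbone: [figure not preserved]

    EfficientNet Model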

    import torch
    from torch import nn
    from torch.nn import functional as F
    
    from .utils import (
        relu_fn,
        round_filters,
        round_repeats,
        drop_connect,
        Conv2dSamePadding,
        get_model_params,
        efficientnet_params,
        load_pretrained_weights,
    )
    
    class MBConvBlock(nn.Module):
        """
        Mobile Inverted Residual Bottleneck Block
    
        Args:
            block_args (namedtuple): BlockArgs, see above
            global_params (namedtuple): GlobalParam, see above
    
        Attributes:
            has_se (bool): Whether the block contains a Squeeze and Excitation layer.
        """
    
        def __init__(self, block_args, global_params):
            super().__init__()
            self._block_args = block_args
            self._bn_mom = 1 - global_params.batch_norm_momentum
            self._bn_eps = global_params.batch_norm_epsilon
            self.has_se = (self._block_args.se_ratio is not None) and (0 < self._block_args.se_ratio <= 1)
            self.id_skip = block_args.id_skip  # skip connection and drop connect
    
            # Expansion phase
            inp = self._block_args.input_filters  # number of input channels
            oup = self._block_args.input_filters * self._block_args.expand_ratio  # number of output channels
            if self._block_args.expand_ratio != 1:
                self._expand_conv = Conv2dSamePadding(in_channels=inp, out_channels=oup, kernel_size=1, bias=False)
                self._bn0 = nn.BatchNorm2d(num_features=oup, momentum=self._bn_mom, eps=self._bn_eps)
    
            # Depthwise convolution phase
            k = self._block_args.kernel_size
            s = self._block_args.stride
            self._depthwise_conv = Conv2dSamePadding(
                in_channels=oup, out_channels=oup, groups=oup,  # groups makes it depthwise
                kernel_size=k, stride=s, bias=False)
            self._bn1 = nn.BatchNorm2d(num_features=oup, momentum=self._bn_mom, eps=self._bn_eps)
    
            # Squeeze and Excitation layer, if desired
            if self.has_se:
                num_squeezed_channels = max(1, int(self._block_args.input_filters * self._block_args.se_ratio))
                self._se_reduce = Conv2dSamePadding(in_channels=oup, out_channels=num_squeezed_channels, kernel_size=1)
                self._se_expand = Conv2dSamePadding(in_channels=num_squeezed_channels, out_channels=oup, kernel_size=1)
    
            # Output phase
            final_oup = self._block_args.output_filters
            self._project_conv = Conv2dSamePadding(in_channels=oup, out_channels=final_oup, kernel_size=1, bias=False)
            self._bn2 = nn.BatchNorm2d(num_features=final_oup, momentum=self._bn_mom, eps=self._bn_eps)
    
        def forward(self, inputs, drop_connect_rate=None):
            """
            :param inputs: input tensor
            :param drop_connect_rate: drop connect rate (float, between 0 and 1)
            :return: output of block
            """
    
            # Expansion and Depthwise Convolution
            x = inputs
            if self._block_args.expand_ratio != 1:
                x = relu_fn(self._bn0(self._expand_conv(inputs)))
            x = relu_fn(self._bn1(self._depthwise_conv(x)))
    
            # Squeeze and Excitation
            if self.has_se:
                x_squeezed = F.adaptive_avg_pool2d(x, 1)
                x_squeezed = self._se_expand(relu_fn(self._se_reduce(x_squeezed)))
                x = torch.sigmoid(x_squeezed) * x
    
            x = self._bn2(self._project_conv(x))
    
            # Skip connection and drop connect
            input_filters, output_filters = self._block_args.input_filters, self._block_args.output_filters
            if self.id_skip and self._block_args.stride == 1 and input_filters == output_filters:
                if drop_connect_rate:
                    x = drop_connect(x, p=drop_connect_rate, training=self.training)
                x = x + inputs  # skip connection
            return x
    
    
    class EfficientNet(nn.Module):
        """
        An EfficientNet model. Most easily loaded with the .from_name or .from_pretrained methods
    
        Args:
            blocks_args (list): A list of BlockArgs to construct blocks
            global_params (namedtuple): A set of GlobalParams shared between blocks
    
        Example:
            model = EfficientNet.from_pretrained('efficientnet-b0')
    
        """
    
        def __init__(self, blocks_args=None, global_params=None):
            super().__init__()
            assert isinstance(blocks_args, list), 'blocks_args should be a list'
            assert len(blocks_args) > 0, 'block args must be greater than 0'
            self._global_params = global_params
            self._blocks_args = blocks_args
    
            # Batch norm parameters
            bn_mom = 1 - self._global_params.batch_norm_momentum
            bn_eps = self._global_params.batch_norm_epsilon
    
            # Stem
            in_channels = 3  # rgb
            out_channels = round_filters(32, self._global_params)  # number of output channels
            self._conv_stem = Conv2dSamePadding(in_channels, out_channels, kernel_size=3, stride=2, bias=False)
            self._bn0 = nn.BatchNorm2d(num_features=out_channels, momentum=bn_mom, eps=bn_eps)
    
            # Build blocks
            self._blocks = nn.ModuleList([])
            for block_args in self._blocks_args:
    
                # Update block input and output filters based on depth multiplier.
                block_args = block_args._replace(
                    input_filters=round_filters(block_args.input_filters, self._global_params),
                    output_filters=round_filters(block_args.output_filters, self._global_params),
                    num_repeat=round_repeats(block_args.num_repeat, self._global_params)
                )
    
                # The first block needs to take care of stride and filter size increase.
                self._blocks.append(MBConvBlock(block_args, self._global_params))
                if block_args.num_repeat > 1:
                    block_args = block_args._replace(input_filters=block_args.output_filters, stride=1)
                for _ in range(block_args.num_repeat - 1):
                    self._blocks.append(MBConvBlock(block_args, self._global_params))
    
            # Head
            in_channels = block_args.output_filters  # output of final block
            out_channels = round_filters(1280, self._global_params)
            self._conv_head = Conv2dSamePadding(in_channels, out_channels, kernel_size=1, bias=False)
            self._bn1 = nn.BatchNorm2d(num_features=out_channels, momentum=bn_mom, eps=bn_eps)
    
            # Final linear layer
            self._dropout = self._global_params.dropout_rate
            self._fc = nn.Linear(out_channels, self._global_params.num_classes)
    
        def extract_features(self, inputs):
            """ Returns output of the final convolution layer """
    
            # Stem
            x = relu_fn(self._bn0(self._conv_stem(inputs)))
    
            # Blocks
            for idx, block in enumerate(self._blocks):
                drop_connect_rate = self._global_params.drop_connect_rate
                if drop_connect_rate:
                    drop_connect_rate *= float(idx) / len(self._blocks)
                x = block(x) # , drop_connect_rate) # see https://github.com/tensorflow/tpu/issues/381
    
            return x
    
        def forward(self, inputs):
            """ Calls extract_features to extract features, applies final linear layer, and returns logits. """
    
            # Convolution layers
            x = self.extract_features(inputs)
    
            # Head
            x = relu_fn(self._bn1(self._conv_head(x)))
            x = F.adaptive_avg_pool2d(x, 1).squeeze(-1).squeeze(-1)
            if self._dropout:
                x = F.dropout(x, p=self._dropout, training=self.training)
            x = self._fc(x)
            return x
    
        @classmethod
        def from_name(cls, model_name, override_params=None):
            cls._check_model_name_is_valid(model_name)
            blocks_args, global_params = get_model_params(model_name, override_params)
            return EfficientNet(blocks_args, global_params)
    
        @classmethod
        def from_pretrained(cls, model_name):
            model = EfficientNet.from_name(model_name)
            load_pretrained_weights(model, model_name)
            return model
    
        @classmethod
        def get_image_size(cls, model_name):
            cls._check_model_name_is_valid(model_name)
            _, _, res, _ = efficientnet_params(model_name)
            return res
    
        @classmethod
        def _check_model_name_is_valid(cls, model_name, also_need_pretrained_weights=False):
            """ Validates model name. Note that pretrained weights are only available for
            the first four models (efficientnet-b{i} for i in 0,1,2,3) at the moment. """
            num_models = 4 if also_need_pretrained_weights else 8
            valid_models = ['efficientnet_b'+str(i) for i in range(num_models)]
            if model_name.replace('-','_') not in valid_models:
                raise ValueError('model_name should be one of: ' + ', '.join(valid_models))
    
    

    My TinySSD Model

    '''
    @Descripttion: This is Aoru Xue's demo,which is only for reference
    @version: 
    @Author: Aoru Xue
    @Date: 2019-06-14 00:42:10
    @LastEditors: Aoru Xue
    @LastEditTime: 2019-09-02 17:04:26
    '''
    
    
    import torch
    from torch import nn
    from efficientnet_pytorch import EfficientNet
    from prior_box import PriorBox
    from torchsummary import summary
    import torch.nn.functional as F
    from box_utils import *
    from PIL import Image
    class TinySSD(nn.Module):
        def __init__(self,training = True):
            super(TinySSD,self).__init__()
            self.basenet = EfficientNet.from_name('efficientnet-b0')
            self.training = training
            for idx,num_anchors in enumerate([4, 6, 6, 4, 4]):
                setattr(self,"predict_bbox_{}".format(idx + 1),nn.Conv2d(
                    320,num_anchors * 4,kernel_size = 3,padding = 1
                ))
                setattr(self,"predict_class_{}".format(idx + 1),nn.Conv2d( # the 3 here is 2 object classes + 1 background
                    320,3 * num_anchors,kernel_size = 3,padding = 1
                ))
            self.priors = None
            for idx,k in enumerate([[320,320],[320,320],[320,320]]):
                setattr(self,"feature_{}".format(idx + 2),nn.Sequential(
                    nn.Conv2d(k[0],k[1],kernel_size = 3,padding =1),
                    nn.BatchNorm2d(k[1]),
                    nn.ReLU(),
                    nn.Conv2d(k[1],k[1],kernel_size = 3,padding =1),
                    nn.BatchNorm2d(k[1]),
                    nn.ReLU(),
                    nn.MaxPool2d(2)
                ))
        def forward(self,x):
            x = self.basenet.extract_features(x)
            feature_1 = x
            feature_2 = self.feature_2(x)
            feature_3 = self.feature_3(feature_2)
            feature_4 = self.feature_4(feature_3)
            feature_5 = F.max_pool2d(feature_4,kernel_size = 2)
            
            
            '''
            (2,4*4,16,16)
            (2,4*6,8,8)
            (2,4*6,4,4),
            (2,4*4,2,2),
            (2,4*4,1,1)
    
            -> at every anchor center, each run of 4 consecutive values represents x y w h
            '''
            confidences = []
            locations = []
            locations.append(self.predict_bbox_1(feature_1).permute(0,2,3,1).contiguous())
            locations.append(self.predict_bbox_2(feature_2).permute(0,2,3,1).contiguous())
            locations.append(self.predict_bbox_3(feature_3).permute(0,2,3,1).contiguous())
            locations.append(self.predict_bbox_4(feature_4).permute(0,2,3,1).contiguous())
            locations.append(self.predict_bbox_5(feature_5).permute(0,2,3,1).contiguous())
            locations = torch.cat([o.view(o.size(0), -1) for o in locations], 1) #(batch_size,total_anchor_num*4)
            locations = locations.view(locations.size(0), -1, 4) # (batch_size,total_anchor_num,4)
    
            confidences.append(self.predict_class_1(feature_1).permute(0,2,3,1).contiguous())
            confidences.append(self.predict_class_2(feature_2).permute(0,2,3,1).contiguous())
            confidences.append(self.predict_class_3(feature_3).permute(0,2,3,1).contiguous())
            confidences.append(self.predict_class_4(feature_4).permute(0,2,3,1).contiguous())
            confidences.append(self.predict_class_5(feature_5).permute(0,2,3,1).contiguous())
            confidences = torch.cat([o.view(o.size(0), -1) for o in confidences], 1) #(batch_size,total_anchor_num*3)
            confidences = confidences.view(confidences.size(0), -1, 3) # (batch_size,total_anchor_num,3)
            if not self.training:
                if self.priors is None:
                    self.priors = PriorBox()()
                    self.priors = self.priors.cuda()
                boxes = convert_locations_to_boxes(
                    locations, self.priors, 0.1, 0.2
                )
                confidences = F.softmax(confidences, dim=2)
                return confidences, boxes
            else:
                #print(confidences.size(),locations.size())
                return (confidences, locations) #  for a batch of 2 at 512x512 input: (2,1524,3) (2,1524,4)
            
    if __name__ == "__main__":
        net = TinySSD()
        net.cuda()
        #prior = PriorBox()
        #print(len(prior()))
        #gt_prior = assign_priors(torch.Tensor([[0,0,10/512,10/512],[55/512,55/512,30/512,30/512]]),torch.Tensor([1,2,5]),prior(),0.5)
        #print(gt_prior[1])
        #x = torch.randn(1,3,512,512)
        #out = net(x.cuda())
        #print(out[0].size())
        #print(out[1].size())
        #print(prior()[:200,:])
        #print(out[0][0])
        #print(out[1][0])
        summary(net,(3,512,512),device="cuda")
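
    A quick sanity check, in plain Python, that the prediction heads produce one output per prior. It assumes a 512x512 input and that `extract_features` of efficientnet-b0 reduces the resolution by a factor of 32 (16x16 at the first level), which matches the `(2,4*4,16,16)` shape noted in the comment inside `forward`.

```python
anchors_per_cell = [4, 6, 6, 4, 4]  # from the predict_bbox_* heads
num_classes = 3                     # 2 object classes + background

size = 512 // 32                    # assumed backbone stride of 32
feature_sizes = []
for _ in anchors_per_cell:
    feature_sizes.append(size)
    size = max(size // 2, 1)        # each extra level halves the map (2x2 max pooling)

num_priors = sum(s * s * a for s, a in zip(feature_sizes, anchors_per_cell))
locations_shape = (2, num_priors, 4)
confidences_shape = (2, num_priors, num_classes)
print(feature_sizes)      # [16, 8, 4, 2, 1]
print(locations_shape)    # (2, 1524, 4)
print(confidences_shape)  # (2, 1524, 3)
```

    If the input size or the anchor configuration changes, this count has to stay equal to the number of rows returned by `PriorBox`, otherwise the loss cannot pair predictions with targets.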
    
        
    
    

    dataset

    '''
    @Descripttion: This is Aoru Xue's demo,which is only for reference
    @version: 
    @Author: Aoru Xue
    @Date: 2019-06-15 12:48:09
    @LastEditors: Aoru Xue
    @LastEditTime: 2019-09-13 10:43:34
    '''
    import torch
    import torch.nn
    from torch.utils.data import Dataset
    from PIL import Image
    from prior_box import PriorBox
    from box_utils import *
    import cv2 as cv
    import random
    import numpy as np
    import glob
    import xml.etree.ElementTree as ET
    
    class Mydataset(Dataset):
        def __init__(self,img_path = "./dataset",transform = None,center_variance = 0.1,size_variance = 0.2):
            self.center_variance = center_variance
            self.size_variance = size_variance
            self.img_paths = glob.glob(img_path + "/images/*.jpg")
            self.labels = [label.replace(".jpg",".xml").replace("images","labels") for label in self.img_paths]
            self.class_names = ("__background__","basketball","volleyball")
            prior = PriorBox() 
            self.center_form_priors = prior() # center form
            self.imgW,self.imgH = 512,512
            self.corner_form_priors = center_form_to_corner_form(self.center_form_priors)
            #print(self.center_form_priors.size(),self.corner_form_priors.size())
            self.transform = transform
        def __len__(self):
            return len(self.img_paths)
        def __getitem__(self,idx):
            img = Image.open(self.img_paths[idx]).convert("RGB")
            label_file = self.labels[idx]
            gt_bboxes,gt_classes = self._get_annotation(idx)
            
            if self.transform:
                img = self.transform(img)
          
    
            gt_bboxes,gt_classes = assign_priors(gt_bboxes,gt_classes,self.corner_form_priors,0.5) # corner form
            #imH,imW = cv_img.shape[:2]
            
            gt_bboxes = corner_form_to_center_form(gt_bboxes) # (1524, 4) center form
            locations = convert_boxes_to_locations(gt_bboxes, self.center_form_priors, self.center_variance, self.size_variance) # effectively a normalization
            # Regress offsets relative to the priors instead of the raw coordinates; this is easier to fit.
            return [img,locations,gt_classes]
        def _get_annotation(self,idx):
            annotation_file = self.labels[idx]
            objects = ET.parse(annotation_file).findall("object")
            boxes = []
            labels = []
            #is_difficult = []
            for obj in objects:
                class_name = obj.find('name').text.lower().strip()
                bbox = obj.find('bndbox')
                # VOC annotations follow the Matlab convention of 1-based indexes, so subtract 1 to make them 0-based
                x1 = float(bbox.find('xmin').text) - 1
                y1 = float(bbox.find('ymin').text) - 1
                x2 = float(bbox.find('xmax').text) - 1
                y2 = float(bbox.find('ymax').text) - 1
                
                boxes.append([x1/self.imgW,y1/self.imgH,x2/self.imgW,y2/self.imgH])
                labels.append(self.class_names.index(class_name))
            return (torch.tensor(boxes, dtype=torch.float),
                    torch.tensor(labels, dtype=torch.long))
    if __name__ == '__main__':
        datset = Mydataset()
        import cv2 as cv
        img,gt_loc,gt_labels = datset[0]
        cv_img = np.array(img)
        cv_img = cv.cvtColor(cv_img,cv.COLOR_RGB2BGR)
        idx = gt_labels > 0
        #print(gt_loc.size(),dataset.priors.size())
        loc = convert_locations_to_boxes(gt_loc,datset.center_form_priors,0.1,0.2)
        loc = loc[idx]
        label = gt_labels[idx]
        for i in range(loc.size(0)):
            print(loc.size())
            x1,y1,w,h = loc[i,:]
            #print(x,y,r)
            x1 = x1.item() * 512.
            y1 = y1.item() * 512.
            w= w.item() * 512.
            h = h.item() * 512.        
            #cv.circle(cv_img,(int(x),int(y)),int(r),(255,0,0),2)
            cv.rectangle(cv_img,(int(x1 - w/2),int(y1-h/2)),(int(x1 + w/2),int(y1 + h/2)),(255,0,0),2)
        cv.imshow("cv",cv_img)
        cv.waitKey(0)
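
    `convert_boxes_to_locations` and `convert_locations_to_boxes` live in `box_utils` and are not shown in this post. The sketch below is my assumption of what they compute, namely the usual SSD offset encoding with a center variance and a size variance, written for a single center-form box in plain Python. The function names `encode` and `decode` are hypothetical.

```python
import math

def encode(box, prior, center_variance=0.1, size_variance=0.2):
    # box and prior are (cx, cy, w, h), relative to the image size.
    return (
        (box[0] - prior[0]) / prior[2] / center_variance,
        (box[1] - prior[1]) / prior[3] / center_variance,
        math.log(box[2] / prior[2]) / size_variance,
        math.log(box[3] / prior[3]) / size_variance,
    )

def decode(loc, prior, center_variance=0.1, size_variance=0.2):
    # Inverse of encode: offsets back to a center-form box.
    return (
        loc[0] * center_variance * prior[2] + prior[0],
        loc[1] * center_variance * prior[3] + prior[1],
        math.exp(loc[2] * size_variance) * prior[2],
        math.exp(loc[3] * size_variance) * prior[3],
    )

prior = (0.5, 0.5, 0.2, 0.2)
box = (0.55, 0.48, 0.3, 0.1)
loc = encode(box, prior)
restored = decode(loc, prior)
print(loc)       # offsets normalized by the prior size and the variances
print(restored)  # approximately equal to box
```

    The network regresses `loc` during training, and the same decoding with variances 0.1 and 0.2 is applied at inference time and in the visualization code above.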
    
    

    Training

    '''
    @Descripttion: This is Aoru Xue's demo,which is only for reference
    @version: 
    @Author: Aoru Xue
    @Date: 2019-06-15 12:56:39
    @LastEditors: Aoru Xue
    @LastEditTime: 2019-09-10 20:46:54
    '''
    import torch
    import torchvision
    from TinySSD import TinySSD
    #from vgg_ssd import build_ssd_model
    from dataset import Mydataset
    from torchvision import transforms
    #from transforms import *
    from torch.utils.data import DataLoader
    from multibox_loss import MultiBoxLoss
    import torch.optim as optim
    from tqdm import tqdm
    def train(dataloader,net,loss_fn,optimizer,epochs = 200):
        for epoch in range(epochs):
            running_loss_bbox = 0.
            running_loss_class = 0.
            for img,gt_bbox,gt_class in tqdm(dataloader):
                img = img.cuda()
                gt_bbox = gt_bbox.cuda()
                gt_class = gt_class.cuda()
                optimizer.zero_grad()
                pred_class,pred_locations = net(img)
                """Compute classification loss and smooth l1 loss.
    
                    Args:
                        confidence (batch_size, num_priors, num_classes): class predictions.
                        predicted_locations (batch_size, num_priors, 4): predicted locations.
                        labels (batch_size, num_priors): real labels of all the priors.
                        gt_locations (batch_size, num_priors, 4): real boxes corresponding all the priors.
                """
                regression_loss, classification_loss = loss_fn(pred_class ,pred_locations,gt_class,gt_bbox)
                loss = regression_loss + classification_loss
                loss.backward()
                running_loss_bbox += regression_loss.item()
                running_loss_class += classification_loss.item()
                optimizer.step()
                #print(pred_bbox.size(),pred_class.size())
                
                #print("epoch: {},bbox loss:{:.8f} , class loss:{:.8f}".format(epoch + 1,loss[0].cpu().item(),loss[1].cpu().item()))
            print("*" * 20)
            print("average bbox loss: {:.8f}; average class loss: {:.8f}".format(running_loss_bbox/len(dataloader),running_loss_class/len(dataloader)))
            if epoch % 5 == 0:
                torch.save(net.state_dict(),"./ckpt/{}.pkl".format(epoch))
    if __name__ == "__main__":
        net = TinySSD()
        net.cuda()
        loss_fn = MultiBoxLoss(3.)
        transform = transforms.Compose([
            transforms.Resize((512,512)),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ]
        )
        # transform = Compose([
        #     ConvertFromInts(),
        #     PhotometricDistort(),
        #     Expand([123, 117, 104]),
        #     RandomSampleCrop(),
        #     RandomMirror(),
        #     ToPercentCoords(),
        #     Resize(300),
        #     SubtractMeans([123, 117, 104]),
        #     ToTensor(),
        # ])
    
        optm = optim.Adam(net.parameters(),lr = 1e-3)
        dtset = Mydataset(img_path = "./dataset",transform = transform)
        dataloader = DataLoader(dtset,batch_size = 8,shuffle = True)
        
        train(dataloader,net,loss_fn,optm)
    
  • Original post: https://www.cnblogs.com/aoru45/p/11027529.html