zoukankan      html  css  js  c++  java
  • [Arxiv1706] Few-Example Object Detection with Model Communication 论文笔记

    https://arxiv.org/pdf/1706.08249.pdf

    Few-Example Object Detection with Model Communication,Xuanyi Dong, Liang Zheng, Fan Ma, Yi Yang, Deyu Meng

     

    亮点

    • 本文仅仅通过每个类别3-4个bounding box标注即可实现物体检测,并与其它使用大量training examples的方法性能可比
    • 主要方法是:multi-modal learning (多模型同时训练) + self-paced learning (curriculum learning) 

    相关工作

    这里介绍几个比较容易混淆的概念,以及与他们相关的方法

    • 弱监督物体检测:数据集的标签是不可靠的,如(x,y),y对于x的标记是不可靠的。这里的不可靠可以是标记不正确,多种标记,标记不充分,局部标记等。
      • 标签是图像级别的类别标签[7][8][9][10][11][18][30][31][32][33][34]
    • 半监督物体检测:半监督学习使用大量的未标记数据,以及同时使用标记数据,来进行模式识别工作。
      • 一些训练样本只有类别标签,另外一些样本有详细的物体框和类别标注[4][5][6]
        • 需要大量标注 (e.g., 50% of the full annotations)
      • 每个类别只有几个物体框标注(Few-Example Object Detection with Model Communication)[12][35]
        • 和few-shot learning 的区别:是否使用未标注数据学习
      • 通过视频挖掘位置标注,此类方法主要针对会移动的物体[2][3][29][1]
    • Webly supervised learning for object detection: reduce the annotation cost by leveraging web data

    方法

     

    Basic detector: Faster RCNN & RFCN

    Object proposal method: selective search & edge boxes

    Annotations: when we randomly annotate approximately four images for each class, an image may contain several objects, and we annotate all the object bounding boxes.

     

    参数更新
    更新vj:对上述损失函数进行求导,可以得到vj的解

    对同一张图像i同一个模型j,如果有多个样本使得vj=1,则只选择使Lc最小的那个样本置为1,其他置为0。gamma促使模型之间共享信息,因为vj为1时,阈值变大,图像更容易被选择到。

    更新wj:与其它文章方法相同

    更新yuj:为更新yuj我们需要从一组bounding box找到满足以下条件的解,

    很难直接找到最优化的解。文中采用的方案是:将所有模型预测出的结果输入nms,并通过阈值只保留分数高的结果,余下的组成yuj。

    去除难例:we employ a modified NMS (intersection/max(area1,area2)) to filter out the nested boxes, which usually occurs when there are multiple overlapping objects. If there are too many boxes (≥ 4) for one specific class or too many classes (≥ 4) in the image, this image will be removed. Images in which no reliable pseudo objects are found are filtered out.

    实验

    Compared with the-state-of-the-art (4.2 images per class is annotated)

    • VOC 2007: -1.1mAP, correct localization +0.9% compared with [21]
    • VOC 2012: -2.5mAP compared with [21], correct localization +9.8%
    • ILSVRC 2013: -2.4mAP compared with [21]
    • COCO 2014: +1.3 mAP compared with [22]

    [20] V. Kantorov, M. Oquab, M. Cho, and I. Laptev, “Contextlocnet: Context-aware deep network models for weakly supervised localization,” in European Conference on Computer Vision, 2016.
    [21] A. Diba, V. Sharma, A. Pazandeh, H. Pirsiavash, and L. Van Gool, “Weakly supervised cascaded convolutional networks,” 2017
    [22] Y. Zhu, Y. Zhou, Q. Ye, Q. Qiu, and J. Jiao, “Soft proposal networks for weakly supervised object localization,” in International Conference on Computer Vision, 2017.

    Ablation study

    • VOC 2007: +4.1 mAP compared with model ensemble
    • k number of labeled images per class; w/ image labels: image-level supervision incorporated

      

    不足

    虽然localization有一定准确率,但是难例图片漏检比较多(也就是说few example classification效果不好)。

     

  • 相关阅读:
    公司开发者账号申请流程之开发者账号的开通
    公司开发者账号申请流程之邓白氏码的申请
    关于idlf无法输入中文的解决办法
    python运算符和表达式
    mac修改默认打开方式
    关于app夜间模式那点事
    关于报错:'sharedApplication' is unavailable: not available on iOS (App Extension)
    ios 代码截屏模糊问题解决办法
    MySQL命令行分号无法结束问题解决
    Oracle rownum的理解
  • 原文地址:https://www.cnblogs.com/Xiaoyan-Li/p/8604816.html
Copyright © 2011-2022 走看看