[CVPR2017] Visual Translation Embedding Network for Visual Relation Detection 论文笔记

zoukankan html css js c++ java

[CVPR2017] Visual Translation Embedding Network for Visual Relation Detection 论文笔记
http://www.ee.columbia.edu/ln/dvmm/publications/17/zhang2017visual.pdf

Visual Translation Embedding Network for Visual Relation Detection Hanwang Zhang† , Zawlin Kyaw‡ , Shih-Fu Chang† , Tat-Seng Chua‡ †Columbia University, ‡National University of Singapore

亮点
- 视觉关系预测问题的分析与化简：把一种视觉关系理解为在特征空间从主语到宾语的一种变换，很有效、很直白
- 实验设计的很棒，从多角度进行了分析对比：语言空间划分，多任务对物体检测的提升，零次学习等。
现有工作
- Mature visual detection [16, 35]
- Burgeoning visual captioning and question answering [2, 4]
- Visual Relation Detection: a visual relation as a subject-predicate-object triplet
主要思想

Translation Embedding 视觉关系预测的难点主要是：对于N个物体和R种谓语，有N^2R种关系，是一个组合爆炸问题。解决这个问题常用的办法是：
- 估计谓语，不估计关系，缺点：对于不同的主语、宾语，图像视觉差异巨大
受Translation Embedding (TransE) 启发，文章中将视觉关系看作在特征空间上从主语到宾语的一种映射，在低维空间上关系元组可看作向量变换，例如person+ride ≈ bike.

Knowledge Transfer in Relation 物体的识别和谓语的识别是互惠的。通过使用类别名、位置、视觉特征三种特征和端对端训练网络，使物体和谓语之前的隐含关系在网络中能够学习到。

算法

Visual Translation Embedding

Loss function

Feature Extraction Layer

classname + location + visual feature 不同的特征对不同的谓语（动词、介词、空间位置、对比）都有不一样的作用

Bilinear Interpolation

In order to achieve object-relation knowledge transfer, the relation error should be back-propagated to the object detection network and thus refines the objects. We replace the RoI pooling layer with bilinear interpolation [18]. It is a smooth function of two inputs:

结果

Translation embeding: ＋18%

object detection ＋0.6% ～ 0.3%

State-of-art:
- Phrase Det. +3% ~ 6%
- Relation Det. +1%
- Retrieval -1% ~ 2%
- Zero-shot phrase detection
- Phrase Det. －0.7% (without language prior)
- Relation Det. －1.4%
- Retrieval ＋0.2％
问题
- 两个物体之间可能有多种关系，比如person ride elephant，同时也存在person short elephant但是文章中的方法无法表示多样化的关系
- 没有使用语言先验知识，使用多模态信息可能会有所提升
查看全文

相关阅读:
Spring中内置的一些工具类
 那些年遇到过的面试题
 ThreadPoolExecutor策略配置以及应用场景
 程序员如何谈加薪?
日常工作中该注意哪些点?
aarch64的系统上执行armhf程序
 挂载镜像
 pycharm折叠代码
 暴力破解分类
 weblogic常见漏洞

原文地址：https://www.cnblogs.com/Xiaoyan-Li/p/8555235.html