20181116 Paper Summary


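I spent four days wrestling with the paper "FOIL it! Find One mismatch between Image and Language caption" and have more or less finished reading it. Overconfident as I was, I spent only one day on it before going to present it to my advisor, who sent me back to read it again. Hahaha, it still makes me laugh.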

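The paper builds a new dataset by extending MS-COCO and uses it to test vision-language models (for example, a model that takes an image and outputs a description of it). The models are probed with three tasks.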

I. Dataset preparation

1. Generation of replacement word pairs

The goal is to replace one noun in the original caption (the target) with an incorrect but similar word (the foil).

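MS-COCO images cover 91 common object categories (dog, elephant, bicycle, ...) grouped into 11 supercategories (Animal, Vehicle, ...). The paper uses 73 of these categories, leaving out the multi-word ones (e.g., traffic light).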

The target and its foil always come from the same supercategory, e.g., (bicycle: motorcycle), (bicycle: car), (bird: dog).

This yields 472 (target: foil) pairs in total.
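A minimal sketch of the pair generation, assuming a hypothetical `category_to_super` dict that maps each category name to its supercategory. Every ordered pair of distinct single-word categories inside one supercategory becomes a (target, foil) candidate. The paper may filter further, so this is not expected to reproduce the exact 472 pairs.

```python
from collections import defaultdict
from itertools import permutations


def make_replacement_pairs(category_to_super):
    """Return sorted (target, foil) pairs drawn from the same supercategory.

    category_to_super is a hypothetical mapping such as {"dog": "animal"}.
    """
    by_super = defaultdict(list)
    for category, supercategory in category_to_super.items():
        # Drop multi-word expressions such as "traffic light".
        if " " in category:
            continue
        by_super[supercategory].append(category)

    pairs = []
    for categories in by_super.values():
        # Every ordered pair of distinct categories: (target, foil).
        pairs.extend(permutations(sorted(categories), 2))
    return sorted(pairs)


toy = {"dog": "animal", "cat": "animal", "bicycle": "vehicle",
       "car": "vehicle", "traffic light": "outdoor"}
print(make_replacement_pairs(toy))
# [('bicycle', 'car'), ('car', 'bicycle'), ('cat', 'dog'), ('dog', 'cat')]
```

Ordered pairs are kept because (bicycle: car) and (car: bicycle) are different replacements: the first turns a bicycle caption into a foil, the second a car caption.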

2. Splitting of replacement pairs into training and testing

The goal is to keep models from learning trivial correlations that come from how often each replacement occurs.

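This gives 256 pairs, built out of 72 target and 70 foil words, for the training set, and 216 pairs, containing 73 target and 71 foil words, for the test set. Since 256 + 216 = 472, each pair lands in exactly one split, even though the target and foil words largely overlap between the two.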
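The summary does not say how the split is drawn, so the sketch below simply assumes a seeded random split over pairs (the function names are hypothetical). It checks the property that matters here: no (target, foil) pair is shared between training and test. With the real data, `n_train` would be 256 out of the 472 pairs.

```python
import random


def split_pairs(pairs, n_train, seed=0):
    """Randomly split (target, foil) pairs into disjoint train/test lists."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def vocab_sizes(pairs):
    """Count distinct target words and distinct foil words in one split."""
    targets = {target for target, _ in pairs}
    foils = {foil for _, foil in pairs}
    return len(targets), len(foils)


pairs = [("bicycle", "car"), ("car", "bicycle"), ("cat", "dog"),
         ("dog", "cat"), ("bird", "dog"), ("bicycle", "motorcycle")]
train, test = split_pairs(pairs, n_train=4)
assert not set(train) & set(test)  # no pair appears in both splits
print(len(train), len(test), vocab_sizes(train), vocab_sizes(test))
```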

3. Generation of foil captions

Replace only those target words that occur in more than one of the MS-COCO captions associated with that image.

Only replace a word with foils that are not among the labels (objects) annotated in MS-COCO for that image.
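A minimal sketch of these two rules for a single image, assuming captions are already tokenized into word lists and the image's annotated objects are given as a set of labels (all names here are hypothetical, not from the paper's code).

```python
from collections import Counter, defaultdict


def generate_foil_captions(image_captions, image_labels, pairs):
    """Build foil captions for one image under the two rules of step 3.

    image_captions: all MS-COCO captions of the image, as token lists.
    image_labels:   set of object labels annotated for the image.
    pairs:          (target, foil) replacement pairs.
    """
    # Rule 1 needs, for each word, the number of captions it appears in.
    caption_counts = Counter()
    for caption in image_captions:
        caption_counts.update(set(caption))

    foils_for = defaultdict(list)
    for target, foil in pairs:
        foils_for[target].append(foil)

    foil_captions = []
    for caption in image_captions:
        for i, word in enumerate(caption):
            if caption_counts[word] <= 1:
                continue  # rule 1: the target must occur in more than one caption
            for foil in foils_for.get(word, []):
                if foil in image_labels:
                    continue  # rule 2: the foil must not be an annotated object
                new_caption = list(caption)
                new_caption[i] = foil
                foil_captions.append(new_caption)
    return foil_captions


captions = [["a", "dog", "on", "the", "grass"], ["a", "dog", "runs"], ["a", "cat", "sleeps"]]
labels = {"dog", "cat"}
pairs = [("dog", "cat"), ("dog", "horse"), ("cat", "dog")]
print(generate_foil_captions(captions, labels, pairs))
# [['a', 'horse', 'on', 'the', 'grass'], ['a', 'horse', 'runs']]
```

In the toy example "cat" is rejected as a foil because it is annotated in the image, and the single mention of "cat" is never used as a target.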

4. Mining the hardest foil caption for each image (i.e., finding the foil caption that comes closest to a correct description of the image)

Step 3 produces many foil captions per image; this step keeps only the hardest one for each image.

Method: a state-of-the-art captioning model N (which has to be trained first) generates a caption N(I) for each image I.

For every foil caption c of image I, the loss l(c, N(I)) between c and the model-generated caption N(I) is computed. The smaller the loss, the more easily the foil caption can be confused with the caption generated by the model.
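A minimal sketch of the selection step, assuming `loss` is a hypothetical callable standing in for l(c, N(I)) with the image already fixed. It scores each foil with p = 1 - l(c, N(I)), so taking the largest p is the same as taking the smallest loss. The loss values in the example are made up.

```python
def hardest_foil(foil_captions, loss):
    """Return (caption, p) for the foil caption with the largest p = 1 - loss."""
    best_caption, best_p = None, float("-inf")
    for caption in foil_captions:
        p = 1.0 - loss(caption)  # smaller loss -> larger p -> harder foil
        if p > best_p:
            best_caption, best_p = caption, p
    return best_caption, best_p


# Made-up losses standing in for l(c, N(I)) of one image.
toy_loss = {"a cat on the grass": 0.7, "a horse on the grass": 0.2,
            "a sheep on the grass": 0.5}
caption, p = hardest_foil(list(toy_loss), toy_loss.get)
print(caption, round(p, 2))  # a horse on the grass 0.8
```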

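The loss is then normalized into a score p = 1 - l(c, N(I)); the larger p is, the harder the foil. The loss follows the paper that FOIL cites, "Deep Visual-Semantic Alignments for Generating Image Descriptions", whose objective looks very much like an SVM (max-margin) formulation. That paper computes a score for how well the regions of an image align with the words of a sentence, using the dot product between region vectors and word vectors.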
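A sketch of the two pieces of the cited alignment paper, written in plain Python with toy vectors. The image-sentence score is S_kl = Σ_t max_i (v_i · s_t): each word vector s_t of sentence l is matched to its best region vector v_i of image k. The SVM-like part is a bidirectional max-margin ranking loss over a batch score matrix. Skipping the l == k terms is a common implementation choice (they only add a constant), and exactly how FOIL plugs this into l(c, N(I)) is not spelled out in this summary.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def alignment_score(region_vecs, word_vecs):
    """S_kl: match every word to its best region and sum the dot products."""
    return sum(max(dot(v, s) for v in region_vecs) for s in word_vecs)


def ranking_loss(S):
    """Max-margin (SVM-like) loss over a square score matrix.

    S[k][l] is the score of image k with sentence l; the diagonal holds
    the matching image-sentence pairs.
    """
    n = len(S)
    total = 0.0
    for k in range(n):
        for l in range(n):
            if l == k:
                continue
            total += max(0.0, S[k][l] - S[k][k] + 1)  # rank sentences for image k
            total += max(0.0, S[l][k] - S[k][k] + 1)  # rank images for sentence k
    return total


regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
print(alignment_score(regions, words))         # 2 + 3 + 1 = 6.0
print(ranking_loss([[5.0, 1.0], [2.0, 6.0]]))  # every margin satisfied -> 0.0
```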

II. Running the three tasks

Task 1 (T1): Correct vs. foil classification

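As the IC (image captioning) model, the paper uses a multimodal bidirectional LSTM (Bi-LSTM), which predicts a word in a sentence by considering both the past and the future context. Given a test image I and a test caption (w1, ..., wt−1, wt, wt+1, ..., wn), the model takes I as input. For each word wt of the test caption, it predicts a word vt and builds the new caption (w1, ..., wt−1, vt, wt+1, ..., wn). The test caption is then compared with all of these generated captions: if every generated caption has a lower conditional probability than the one assigned to the test caption, the test caption is classified as good; otherwise it is classified as foil. The conditional probabilities are computed with the caption-probability formula given earlier in the paper.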

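About wt and vt: the model from the cited paper predicts vt from the input image and the test caption, using the image together with the left context (w1, ..., wt−1) or the right context (wt+1, ..., wn). This is what the paper means by "we remove the word and use the model to generate new captions in which the wt has been replaced by the word vt predicted by the model".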
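A minimal sketch of the T1 decision rule. `predict_word(image, caption, t)` and `caption_prob(image, caption)` are hypothetical stand-ins for the Bi-LSTM's word prediction and conditional caption probability. One assumption is added: when vt equals wt the regenerated caption is identical to the test caption, so it is skipped (otherwise its equal probability would force every caption to be labeled foil).

```python
def regenerate(image, caption, predict_word):
    """Yield (t, new_caption) with w_t replaced by the model's prediction v_t."""
    for t in range(len(caption)):
        new_caption = list(caption)
        new_caption[t] = predict_word(image, caption, t)
        yield t, new_caption


def classify(image, caption, predict_word, caption_prob):
    """Return "good" if every changed regenerated caption scores lower than the test caption."""
    p_test = caption_prob(image, caption)
    for _, new_caption in regenerate(image, caption, predict_word):
        if new_caption == list(caption):
            continue  # v_t == w_t: identical caption, nothing to compare (assumption)
        if caption_prob(image, new_caption) >= p_test:
            return "foil"
    return "good"


# Toy stand-ins: the "model" sees a dog in the image.
def toy_predict(image, caption, t):
    return "dog" if caption[t] in {"cat", "horse"} else caption[t]


def toy_prob(image, caption):
    return 0.9 if "dog" in caption else 0.1


print(classify("img", ["a", "dog", "runs"], toy_predict, toy_prob))  # good
print(classify("img", ["a", "cat", "runs"], toy_predict, toy_prob))  # foil
```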

Task 2 (T2): Foil word detection

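Compute the conditional probability of each generated caption; the word whose replacement produces the caption with the highest conditional probability is taken as the foil word.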
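A minimal sketch of T2 with the same kind of hypothetical stand-ins (`predict_word`, `caption_prob`): replace each position by the model's prediction and return the position whose replacement gives the most probable caption. The toy model is made up for illustration.

```python
def detect_foil_word(image, caption, predict_word, caption_prob):
    """Return the index t whose replacement by v_t gives the most probable caption."""
    best_t, best_p = None, float("-inf")
    for t in range(len(caption)):
        new_caption = list(caption)
        new_caption[t] = predict_word(image, caption, t)
        p = caption_prob(image, new_caption)
        if p > best_p:
            best_t, best_p = t, p
    return best_t


# Toy stand-ins: the "model" sees a dog in the image.
def toy_predict(image, caption, t):
    return "dog" if caption[t] in {"cat", "horse"} else caption[t]


def toy_prob(image, caption):
    return 0.9 if "dog" in caption else 0.1


caption = ["a", "cat", "runs"]
t = detect_foil_word("img", caption, toy_predict, toy_prob)
print(t, caption[t])  # 1 cat
```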

Task 3 (T3): Foil word correction

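Apply a linear regression method over all the target words and select the target word with the highest probability of making the foil caption correct for the given image.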
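The summary does not describe the regression's features or training, so the sketch below only shows the selection step under stated assumptions: a hypothetical joint image-caption feature vector, and one already-fitted linear weight vector (plus bias) per target word. Each candidate is scored linearly, the best one is chosen, and it is substituted at the foil position. Fitting the weights is omitted.

```python
def correct_foil(caption, foil_index, features, weights, biases):
    """Score every candidate target word linearly and substitute the best one.

    features: hypothetical joint image+caption feature vector.
    weights:  hypothetical {target_word: weight_vector} of a fitted linear model.
    biases:   hypothetical {target_word: bias}; missing entries count as 0.
    """
    def score(word):
        return sum(w * x for w, x in zip(weights[word], features)) + biases.get(word, 0.0)

    best_word = max(weights, key=score)
    corrected = list(caption)
    corrected[foil_index] = best_word
    return best_word, corrected


weights = {"dog": [1.0, 0.0], "cat": [0.0, 1.0], "horse": [0.5, 0.5]}
word, fixed = correct_foil(["a", "cat", "runs"], 1, [0.9, 0.2], weights, {})
print(word, fixed)  # dog ['a', 'dog', 'runs']
```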

...


Original post: https://www.cnblogs.com/Shaylin/p/9971666.html