1. Sketch Me That Shoe, Qian Yu, Feng Liu, Yi-Zhe Song, Tao Xiang, Timothy M. Hospedales, Chen Change Loy, in CVPR 2016.
A unique characteristic of sketches in the context of image retrieval is that they offer inherently fine-grained visual description - a sketch speaks for a 'hundred' words.
Fine-grained sketch-based image retrieval (SBIR) faces three challenges:
1) visual comparisons not only need to be fine-grained but also executed cross-domain (sketches and photos are two different domains);
2) free-hand (finger) sketches are highly abstract, making fine-grained matching harder; and, most importantly,
3) annotated cross-domain sketch-photo datasets required for training are scarce.
This paper introduces two instance-level SBIR datasets consisting of 1,432 sketch-photo pairs in two categories (shoes and chairs), collected by asking participants to finger-sketch an object after observing a photo. In addition, a total of 32,000 ground-truth triplet ranking annotations are provided for both model development and performance evaluation.
This paper uses the annotated triplets as supervision to train triplet CNNs. The goal is to learn a feature mapping f that maps photos and sketches to a common feature embedding space, in which photos similar to a given sketch lie closer to it than dissimilar ones.
--> Triplet loss:
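As a rough illustration of the ranking idea described above, here is a minimal sketch of a standard margin-based triplet ranking loss. It assumes squared Euclidean distance between embeddings and a margin of 1.0; both are assumptions for illustration and may differ from the paper's exact formulation. The function names are hypothetical.

```python
from typing import Sequence


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_ranking_loss(
    sketch: Sequence[float],
    pos_photo: Sequence[float],
    neg_photo: Sequence[float],
    margin: float = 1.0,  # assumed value, not taken from the notes
) -> float:
    """Hinge loss that is zero once the positive photo is closer to the
    sketch than the negative photo by at least `margin`."""
    d_pos = squared_euclidean(sketch, pos_photo)
    d_neg = squared_euclidean(sketch, neg_photo)
    return max(0.0, margin + d_pos - d_neg)


# Correctly ranked triplet: the positive is much closer, so the loss is 0.
print(triplet_ranking_loss([0.0, 0.0], [1.0, 0.0], [3.0, 0.0]))  # 0.0
# Violated triplet: the negative is closer, so the loss is positive.
print(triplet_ranking_loss([0.0, 0.0], [2.0, 0.0], [1.0, 0.0]))  # 4.0
```

The inputs here stand for f(sketch), f(positive photo), and f(negative photo); in training, the loss would be backpropagated through the shared mapping f.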
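The experiments are fairly thorough: they consider a four-step pre-training and fine-tuning pipeline and add data augmentation (stroke removal and stroke deformation). Each of these improvements yields a performance gain.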
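To make the stroke removal idea concrete, here is a minimal sketch under loud assumptions: a sketch is represented as a list of strokes, each stroke a list of (x, y) points, and strokes are dropped independently at random. The paper's actual removal rule is not recorded in these notes, so the random rule, the `drop_prob` parameter, and the `remove_strokes` name are all hypothetical.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def remove_strokes(
    strokes: List[Stroke],
    drop_prob: float = 0.2,  # hypothetical value, for illustration only
    rng: Optional[random.Random] = None,
) -> List[Stroke]:
    """Return a copy of the sketch with some strokes dropped at random.

    Stroke order is preserved. If every stroke would be dropped, the first
    stroke is kept so the augmented sketch is never empty (an arbitrary
    choice made for this example).
    """
    rng = rng or random.Random()
    kept = [s for s in strokes if rng.random() >= drop_prob]
    if not kept and strokes:
        kept = [strokes[0]]
    return kept


sketch = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(1.0, 0.0), (1.0, 1.0)],
    [(1.0, 1.0), (0.0, 1.0)],
]
augmented = remove_strokes(sketch, drop_prob=0.3, rng=random.Random(0))
print(len(augmented), "of", len(sketch), "strokes kept")
```

Each call with a different random state produces a different variant of the same sketch, which is how one annotated sketch can yield several training samples.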