  • 【CV】ICCV2015_Unsupervised Visual Representation Learning by Context Prediction

    Unsupervised Visual Representation Learning by Context Prediction

    Note: these are study notes on an unsupervised learning model from Prof. Gupta's group.

    Link: http://120.52.73.9/www.cv-foundation.org/openaccess/content_iccv_2015/papers/Doersch_Unsupervised_Visual_Representation_ICCV_2015_paper.pdf

    Motivation:

    - The same as for most unsupervised learning methods, so it is omitted here.

    Proposed Model:

    - Given a central patch of an object and a second patch taken from its surroundings, the model must guess the relative spatial configuration of the two patches.

    - Intuition: a human doing this task is more accurate after recognizing what the object is and what it looks like as a whole. In other words, a model that performs well at this game has presumably learned to perceive the features of each object.

    (i.e. we can answer the following quiz correctly once we recognize what the objects are.)

     

     

    So unsupervised representation learning can also be formulated as learning an embedding in which semantically similar images are close together, while semantically different ones are far apart.
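
    The patch-pair task described above can be sketched as a small sampler that picks a central patch, chooses one of eight neighboring positions, and uses the index of that position as the label. This is only an illustration: the function name, the box layout, the patch size, and the `gap` and `jitter` parameters are assumptions of this sketch, not values taken from these notes.

```python
import random

# Offsets (row, col) of the eight neighbors around the central patch,
# in grid units. The index in this list is the class label (0..7).
NEIGHBOR_OFFSETS = [
    (-1, -1), (-1, 0), (-1, 1),
    (0, -1),           (0, 1),
    (1, -1),  (1, 0),  (1, 1),
]


def sample_patch_pair(img_h, img_w, patch=96, gap=0, jitter=0, rng=random):
    """Return (center_box, neighbor_box, label) for one training example.

    Boxes are (top, left, height, width) in pixels. `gap` (space left
    between the two patches) and `jitter` (random shift of the neighbor)
    are hypothetical knobs introduced for this sketch.
    """
    step = patch + gap          # distance between patch origins
    margin = step + jitter      # room needed on every side of the center
    if img_h < 2 * margin + patch or img_w < 2 * margin + patch:
        raise ValueError("image too small for the requested patch layout")

    # Place the central patch so that every neighbor stays inside the image.
    top = rng.randint(margin, img_h - margin - patch)
    left = rng.randint(margin, img_w - margin - patch)

    # Pick one of the eight relative positions; its index is the label.
    label = rng.randrange(len(NEIGHBOR_OFFSETS))
    d_row, d_col = NEIGHBOR_OFFSETS[label]
    n_top = top + d_row * step + rng.randint(-jitter, jitter)
    n_left = left + d_col * step + rng.randint(-jitter, jitter)

    return (top, left, patch, patch), (n_top, n_left, patch, patch), label
```

    No human annotation is involved: the label comes for free from where the second patch was cut, which is what makes the task unsupervised.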

    - Pipeline:

    • Feed the two patches into two parallel convolutional networks that share parameters.
    • Fuse the feature vectors of the two patches and pass the result through stacked fully connected layers.
    • Output an eight-dimensional vector that predicts the relative spatial configuration of the two patches.
    • Compute the loss and gradients, and backpropagate through the network to update the weights.
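
    The forward half of the pipeline above can be sketched in plain Python. To keep the example dependency-free, a single linear layer on a flattened patch stands in for the convolutional stack, so this is a toy illustration of weight sharing and late fusion rather than the network from the paper. All layer sizes and names are assumptions, and the gradient and weight-update step is left out (in practice an autodiff framework would handle it).

```python
import math
import random


def linear(x, weights, bias):
    """Fully connected layer: y = W x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_out, n_in, rng):
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(patch_a, patch_b, params):
    # Both patches go through the SAME embedding weights (shared parameters).
    feat_a = relu(linear(patch_a, *params["embed"]))
    feat_b = relu(linear(patch_b, *params["embed"]))
    fused = feat_a + feat_b  # list concatenation fuses the two feature vectors
    hidden = relu(linear(fused, *params["fc"]))
    # Eight outputs, one per possible relative position.
    return softmax(linear(hidden, *params["out"]))


def cross_entropy(probs, label):
    return -math.log(probs[label])


rng = random.Random(0)
patch_dim, feat_dim, hidden_dim = 16, 8, 8  # toy sizes (assumptions)
params = {
    "embed": init_layer(feat_dim, patch_dim, rng),
    "fc": init_layer(hidden_dim, 2 * feat_dim, rng),
    "out": init_layer(8, hidden_dim, rng),
}
patch_a = [rng.random() for _ in range(patch_dim)]  # stand-in for a flattened patch
patch_b = [rng.random() for _ in range(patch_dim)]
probs = forward(patch_a, patch_b, params)
loss = cross_entropy(probs, label=3)  # label 3 is an arbitrary example
```

    Because `params["embed"]` is used for both branches, any update to it changes how both patches are embedded, which is what "share parameters" means in the first step.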

     

    Avoiding “trivial” solutions:

    We need to preprocess the images so that the model does not learn trivial features, such as:

    - Low-level cues like boundary patterns or textures continuing between patches, which could potentially serve as a shortcut.

    - Chromatic aberration: it arises from differences in the way the lens focuses light of different wavelengths. In some cameras, one color channel (commonly green) is shrunk toward the image center relative to the others. Once the network learns the absolute location on the lens, solving the relative location task becomes trivial.
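
    One way to weaken the chromatic aberration cue is to hide the relationship between color channels, for example by keeping a single randomly chosen channel per patch and replacing the other two with noise. The sketch below illustrates that idea only; the function name, the patch representation (nested lists of RGB tuples), and the `noise_std` value are assumptions of this example.

```python
import random


def drop_color_channels(patch, noise_std=0.01, rng=random):
    """Keep one random color channel; fill the other two with Gaussian noise.

    `patch` is a list of rows, each row a list of (r, g, b) floats.
    Returns the new patch and the index of the channel that was kept.
    """
    keep = rng.randrange(3)  # 0 = red, 1 = green, 2 = blue
    out = []
    for row in patch:
        new_row = []
        for pixel in row:
            new_row.append(tuple(
                pixel[c] if c == keep else rng.gauss(0.0, noise_std)
                for c in range(3)
            ))
        out.append(new_row)
    return out, keep
```

    With only one real channel left, the network cannot compare how far the channels are shifted relative to each other, so it has less information about the absolute position on the lens.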

  • Original post: https://www.cnblogs.com/kanelim/p/5303983.html