zoukankan      html  css  js  c++  java
  • 【CV】ICCV2015_Unsupervised Visual Representation Learning by Context Prediction

    Unsupervised Visual Representation Learning by Context Prediction

    Note here: it's a learning note on unsupervised learning model from Prof. Gupta's group.

    Link: http://120.52.73.9/www.cv-foundation.org/openaccess/content_iccv_2015/papers/Doersch_Unsupervised_Visual_Representation_ICCV_2015_paper.pdf

    Motivation:

    - Similar to most motivations of unsupervised learning method, cut it out here.

    Proposed Model:

    - Given one central patch of the object, and another one arounding it, the model must guess the relative spatial configuration between these two patches.

    - Intuition: when human doing this assignment, we get higher accuracy once we recognize what object it is and what it’s like with a whole look. That is to say, a model plays well on this game would have percepted the features of each object.

    (i.e. we can get right answer for the following quizz once we recognize what objects they are.)

     

     

    So the unsupervised representation learning can also be formulated as learning an embedding where images that are semantically similar close, while semantically different ones are far apart.

    - Pipline:

    • Feed two patches into a parallel convolutional network which share parameters.
    • Fuse the feature vector of each patch and pass through stacked fully connected layers.
    • Come out with an eight-dimension vector that predicts relative spatial configuration between the two patches.
    • Compute loss, gradients and back propagate through this network to update weights.

     

    Aoiding “trivial” solutions:

    We need to preprocess images to avoid the model learns some trivial features, like:

    - Low-level cues like boundary patterns or textures continuing between patches, which could potentially serve as a shortcut.

    - Chromatic aberration: it arises from differences in the way the lens focuses light and different wavelengths. In some cameras, one color channel (commonly green) is shrunk toward the image center relative to the others. Once the network learns the absolute location on the lens, solving the relatve location task becomes trivial. 

  • 相关阅读:
    Sketch or Figma谁才是UI设计软件的未来?
    2020年网站首屏设计:最佳实践和例子
    为设计工具付费到底值不值得?
    社交网站原型模板分享
    摹客PS插件,支持自定切图尺寸!
    UI设计适合女生学吗?会不会很难?
    mysql数据操作之单表查询
    表操作
    库操作
    数据库
  • 原文地址:https://www.cnblogs.com/kanelim/p/5303983.html
Copyright © 2011-2022 走看看