  • Human-like Controllable Image Captioning with Verb-specific Semantic Roles

    Limitations of prior work:

    Controllable image captioning (CIC) work mainly uses either (1) subjective control signals or (2) objective control signals; objective signals are in turn either (1) content-controlled or (2) structure-controlled.

    Almost all existing objective control signals overlook two indispensable characteristics of an ideal control signal:

      1) Event-compatible: all visual contents referred to in a single sentence should be compatible with the described activity.

      2) Sample-suitable: the control signals should be suitable for a specific image sample.

    Contributions of the paper:

    The paper proposes a new event-oriented objective control signal, Verb-specific Semantic Roles (VSR), to meet both the event-compatible and sample-suitable requirements simultaneously.

    A VSR consists of a verb and some semantic roles that the user is interested in.
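    As a rough illustration of what such a control signal could look like in code, here is a minimal sketch. The class name `VSR`, its fields, and the PropBank-style role labels (`Arg0`, `Arg1`, `LOC`) are assumptions made for this example, not the paper's actual data format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VSR:
    """Hypothetical container: one verb plus the roles the user wants described."""
    verb: str
    roles: Tuple[str, ...]  # role labels are illustrative assumptions


# A user asks for a caption about "ride" that mentions who rides,
# what is ridden, and where it happens.
signal = VSR(verb="ride", roles=("Arg0", "Arg1", "LOC"))
print(signal)
```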

    Grounded Semantic Role Labeling (GSRL): grounds each semantic role to proposal sets in the image and outputs the visual features of all grounded proposal sets.

    Semantic Structure Planner (SSP): a hierarchical semantic structure learning model that aims to learn a reasonable sequence of sub-roles S.

    Pipeline: Verb-specific Semantic Roles → Grounded Semantic Role Labeling → Semantic Structure Planner

     



    Steps: for two VSRs, a and b, we first use GSRL and SSP to obtain their semantic structures and grounded region features: (Sa, Ra) and (Sb, Rb).

    Then, as shown in the figure above, we merge them in two steps:

      (a) find the sub-roles in both Sa and Sb that refer to the same visual regions;

      (b) insert all other sub-roles between the nearest two selected sub-roles.
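    The two merge steps can be sketched as below. This is only an illustration under stated assumptions, not the authors' implementation: a sub-role is modeled as a hypothetical `(label, region_ids)` pair, each region set appears at most once per sequence, the shared sub-roles occur in the same order in both sequences, and within each gap the sub-roles of Sa are placed before those of Sb.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical representation: (role label, ids of the grounded regions).
SubRole = Tuple[str, FrozenSet[int]]


def merge_structures(sa: List[SubRole], sb: List[SubRole]) -> List[SubRole]:
    """Merge two sub-role sequences, anchoring on sub-roles with identical regions."""
    regions_b = {regions for _, regions in sb}
    # Step (a): sub-roles of Sa whose regions are also grounded by a sub-role of Sb.
    anchors = [regions for _, regions in sa if regions in regions_b]

    merged: List[SubRole] = []
    i = j = 0
    for anchor in anchors:
        # Step (b): place the non-shared sub-roles that precede this anchor.
        while sa[i][1] != anchor:
            merged.append(sa[i])
            i += 1
        while sb[j][1] != anchor:
            merged.append(sb[j])
            j += 1
        merged.append(sa[i])  # keep a single copy of the shared sub-role
        i += 1
        j += 1
    # Whatever follows the last anchor is appended at the end.
    merged.extend(sa[i:])
    merged.extend(sb[j:])
    return merged


sa = [("Arg0", frozenset({1})), ("Arg1", frozenset({2})), ("LOC", frozenset({3}))]
sb = [("Arg0", frozenset({2})), ("Arg2", frozenset({4}))]
merged = merge_structures(sa, sb)
print([label for label, _ in merged])
```

    In this toy input, `Arg1` of Sa and `Arg0` of Sb point at the same region, so they collapse into one anchor and the remaining sub-roles are arranged around it.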


    Model architecture:

    Faster R-CNN (ResNet-101) + Controllable LSTM + Controllable UpDn + SCT

    Paper: https://arxiv.org/abs/2103.12204

  • Original post: https://www.cnblogs.com/sfnz/p/14635500.html