  • Paper reading: Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos

    Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos

    We propose an unsupervised method for reference resolution in instructional videos, where the goal is to temporally link an entity (e.g., “dressing”) to the action (e.g., “mix yogurt”) that produced it. The key challenge is the inevitable visual-linguistic ambiguities arising from the changes in both visual appearance and referring expression of an entity in the video. This challenge is amplified by the fact that we aim to resolve references with no supervision. We address these challenges by learning a joint visual-linguistic model, where linguistic cues can help resolve visual ambiguities and vice versa. We verify our approach by learning our model unsupervisedly using more than two thousand unstructured cooking videos from YouTube, and show that our visual-linguistic model can substantially improve upon state-of-the-art linguistic only model on reference resolution in instructional videos.

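    In translation: the paper proposes an unsupervised method for reference resolution in instructional videos, whose goal is to link an entity (e.g., “dressing”) in time to the action (e.g., “mix yogurt”) that produced it. The key difficulty is that an entity's visual appearance and the expression used to refer to it both change over the course of a video, which inevitably creates visual-linguistic ambiguity, and resolving references without any supervision makes this harder. The authors address it by learning a joint visual-linguistic model in which linguistic cues help resolve visual ambiguities and vice versa. They validate the approach by training the model without supervision on more than two thousand unstructured cooking videos from YouTube, and show that it substantially improves on the state-of-the-art linguistic-only model for reference resolution in instructional videos.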
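To make the task in the abstract concrete, here is a toy sketch of linking an entity to the earlier action that produced it by combining a linguistic score with a visual similarity. It is not the paper's model: the `Action` class, the `resolve` function, the weight `alpha`, the feature vectors, and the scores are all hypothetical values chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Action:
    step: int            # temporal position of the action in the video
    text: str            # action description, e.g. "mix yogurt"
    visual: List[float]  # hypothetical visual feature of the action's output


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def resolve(entity_step: int,
            entity_visual: List[float],
            linguistic_score: Dict[int, float],
            actions: List[Action],
            alpha: float = 0.5) -> Optional[Action]:
    """Pick the earlier action with the best combined score.

    alpha weights the (assumed, precomputed) linguistic score against
    the visual similarity; only actions before the entity are candidates.
    """
    candidates = [a for a in actions if a.step < entity_step]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda a: alpha * linguistic_score.get(a.step, 0.0)
        + (1 - alpha) * cosine(entity_visual, a.visual),
    )


# Toy video: two actions, then the entity "dressing" appears at step 2.
actions = [
    Action(step=0, text="mix yogurt", visual=[1.0, 0.0]),
    Action(step=1, text="chop lettuce", visual=[0.0, 1.0]),
]
# The linguistic cue alone is ambiguous here (equal scores for both actions),
# so the visual similarity decides which action the entity is linked to.
best = resolve(
    entity_step=2,
    entity_visual=[0.9, 0.1],
    linguistic_score={0: 0.5, 1: 0.5},
    actions=actions,
)
print(best.text if best else None)
```

In this toy case the language score cannot separate the two candidate actions, so the visual term breaks the tie, which mirrors the abstract's point that each modality can help resolve ambiguity in the other.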

  • Original source: https://www.cnblogs.com/feifanrensheng/p/14020333.html