VALSE2019总结(2)-以人为中心的视觉理解

zoukankan html css js c++ java

VALSE2019总结(2)-以人为中心的视觉理解
2. 以人为中心的视觉理解 (ceiwu lu, SJU)

2.1 基于视频的时序建模和动作识别方法 (liming wang, NJU)
1. dataset
  
  两张图：
  
  注意一个区分：trimmed and untrimmed videos
2. outline
  
  action recognition
  
  action temporal localization
  
  action spatial detection
  
  action spatial-temporal detection
3. opportunities and challenges
  
  opportunities ：
  
  videos provide huge and rich data for visual learning
  
  action is core in motion perception and has many applications in video understanding
  
  challenges：
  
  complex dynamics and temporal variations
  
  action vocabulary is not well defined
  
  noisy and weakly labels (dense labeling is expensive)
  
  High computational and memory cost
4. temporal structure: 需要对动作进行分解：decomposition
5. 常用的 Deep networks
  
  large-scale video classification with CNN (feifei li, CVPR2014)
  
  Two-Stream CNN for action recognition in videos (NIPS2014)
  
  learning spatiotemporal features with 3D CNN (Du Tran, ICCV2015)
  
  TDD (liming wang, CVPR2015)
  
  Real-time action recognition with enhanced motion vector CNNs (CVPR2016)
  
  Two Stream I3D (CVPR2017)
  
  R(2+1)D (CVPR2018)
  
  SlowFast Networks (kaiming he, CVPR2019)
6. liming wang 自己的3篇工作
  
  shot-term -> middle-term -> long term modeling，对应的论文是 (ARTNet -> TSN -> UntrimmedNet)
  
  更多细节理解，直接看他的PPT写读后感
  
  按照 liming wang 自己说的，video action recognition/detection 对于我的VAD 基本没有帮助
7. 之前看到一些很好的zhihu link: 动作识别-1, 动作识别-2, 时序行为检测-1, 时序行为检测-2, 时序行为检测-3, 时序行为检测-4,
8. 所有的PPT图片
2.2 复杂视频的深度高效分析与理解方法 (yu qiao, CAS)
1. DL的一些经验性Trick介绍
2. 人脸识别的开集特点 (Open-set 和 novelty detection有点类似，参考TODO)
  
  Center Loss (ECCV2016)
  
  center loss意思即为：为每一个类别提供一个类别中心，最小化min-batch中每个样本与对应类别中心的距离，这样就可以达到缩小类内距离的目的。
  
  center loss的原理主要是在softmax loss的基础上，通过对训练集的每个类别在特征空间分别维护一个类中心，在训练过程，增加样本经过网络映射后在特征空间与类中心的距离约束，从而兼顾了类内聚合与类间分离。
  
  From：link-1, link-2
  
  Center Loss的改进 (IJCV2019): 用投影方向代替类中心
  
  Large Margin思想设计的Loss:
  
  L-softmax (Liu, ICML 2016)
  
  A-softmax (Liu, CVPR2017)
  
  Additive Margin Softmax (ICLR 2018 workshop)
  
  CosLoss (wang, CVPR2018)
  
  ArcFace (CVPR2019)
3. Range Loss ：有效应对类间样本数不均衡造成的长尾问题
  
  motivation：少数人(明星)具有大量的图片，多数人却只有少量图片，这种长尾分布启发了两个动机：(1)长尾分布如何影响模型性能的理论分析；(2)设计新的Loss解决这个问题
  
  此处有一张图片，并且Range Loss 的 PPT缺失了
4. video action recognition
  
  姿态注意力机制 RPAN (ICCV2017, Oral)
  
  把行为识别和姿态估计两个任务进行结合
  
  利用姿态变化，引导 RNN 对行为的动态过程进行建模
5. 一篇文章
  
  Temporal Hallucinating for Action Recognition with Few Still Images (CVPR 2018)
6. 一些图
2.3 understanding emotions in videos (yanwei fu, FDU)
- 个人感觉：这是个刚挖的新坑，有趣，值得了解下
1. applicaltion
  
  web video search
  
  video recommendation system
  
  avoid inappropriate advertisement
2. Tasks of Emotions in videos
  
  Emotion recognition
  
  emotion attribution
  
  emotion-oriented summarization
3. Challenges
  
  Sparsely expressed in videos
  
  Diverse content and variable quality
4. Knowledge Transfer
  
  Zero-shot Emotion learning (配一张图)
  
  A multi-task neural approach for emotion attribution, classification and summarization (TMM)
  
  Frame-Transfermer emotion classification Network (*CMR 2017)
5. Emotion-oriented summarization
  
  相当于选择关键帧以及帧信息融合
6. Face emotion
  
  Posture, Expression, Identity in faces
7. 一些图：
2.4 以人为中心视觉识别和定位中的结构化深度学习方法探索 (wanli ouyang, sdney university)
1. outline
  
  introduction
  
  structured feature learning
  
  back-bone model design
  
  conclusion
2. introduction
  
  object detection
  
  human pose estimation
  
  action recognition
3. structured feature learning
  
  structure in neurons
  
  motivation：传统 neurons 在同一层没有连接，在相邻层存在局部或者全部连接，没有保证局部区域的信息。从而引出每一层网络的各神经元具有结构化信息的。然后以人体姿态估计为例，分析了基于全连接神经网络的问题：在对人体节点的距离进行建模需要大的卷积核以及一些关节点的关系是不稳定。提出结构化特征学习的人体姿态估计模型（Bidirectional Tree）。
  
  Bidirectional Tree
  
  对应的papers
  
  end-to-end learning of deformable mixture of parts and deep convolutional neural networks for human pose estimation (CVPR2016)
  
  structure feature learning for pose estimation (CVPR2016)
  
  CRF-CNN, modeling structured information in human pose estimation (CVPR2016)
  
  learning deep structured multi-scale features using attention-gated CRFs for contour prediction (NIPS 2017)
  
  application of structured feature learning
  
  有一张图片
4. back-bone model design
  
  Hourglass for classification (Encode-Decoder 结构，比如 UNet，一般用于图像分割，不用于分类)
  
  希望: feature with high-level semantics and high resolution is good
  
  现实：feature with high-level semantics with low resolution
  
  Hourglass for classification has poor performance的原因分析：Different tasks require different resolution of features，所以提出 FishNet
  
  FishNet
  
  motivation: 为了统一利用像素级、区域级以及图像级任务的优势，欧阳万里老师提出了FishNet，FishNet的优势是：更好的将梯度传到浅层网络，所提取的特征包含了丰富的低层和高层语义信息并保留和微调了各层级信息。
  
  pros.
  
  better gradient flow to shallow layers
  
  features:
  
  contain rich low-level and high-level semantics
  
  are preserved and refined from each other (信息互相交流)
  
  code: https://github.com/kevin-ssy/FishNet
5. conclusion
  
  structured deep learning is (1)effective (2)from observation
  
  end-to-end joint training bridges the gap between structure modeling and feature learning
6. 一些图
2.5 面向监控视频的行为识别与理解 (xiaowei lin, SJU)
1. 行为识别领域的task
  
  基于轨迹的行为分析
  
  面向任意视频的行为识别 (liming wang)
  
  面向监控视频的行为识别
2. 目标检测的几个点
  
  Tiny DSOD (BMVC 2018)
  
  Toward accurate one-stage object detection with AP-Loss (CVPR 2019)
  
  kill two birds with one stone: boosting both object detection accuracy and speed with adaptive patch-of-interest composition (2017)
3. 若干应用
  
  三维目标检测与姿态估计
  
  多目标跟踪
  
  基于目标检测、跟踪的实时场景统计分析
  
  多相机跟踪
  
  (correspondence structure Re-ID)
  
  learning correspondence structure for person re-identification (TIP2017)
  
  person re-identification with correspondence sturcture learning (ICCV 2015)
  
  (Group Re-ID)
  
  Group re-identification: leveraging and integrating multi-grain information, (MM2018)
  
  车载跨相机定位
  
  无人超市
  
  野生东北虎 Re-ID
4. 行为识别
  
  多尺度特征
  
  action recognition with coarse-to-fine deep feature interation and asynchronous fusion (AAAI 2018)
  
  时空异步guanlian
  
  cross-stream selective networks for action recognition (CVPR workshop 2019)
  
  时空行为定位
  
  finding action tubes with an integrated sparse-to-dense framework (arxiv 2019)
  
  监控行为识别
  
  有一张图
5. 其他应用
  
  实时行为/事件检测
  
  基于轨迹的行为分析-个体行为分析
  
  a tube-and-droplet-based apporach for representing and analyzing motion trajectories (TPAMI 2017)，非DL，无好感
  
  基于轨迹的行为分析-轨迹聚类与挖掘
  
  unsupervised trajectory clustering via adaptive multi-kernel-based shrinkage (ICCV 2015)，比较老。。。但可以以它为base，看最新的引用它的高质量论文即可
  
  密集场景行为分析
  
  a diffusion and clustering-based approach for finding coherent motions and understanding crowd scenes (TIP 2016)
  
  finding coherent motions and semantic regions in crowd scenes: a diffusion and clustering apporach (ECCV 2014)
6. 主页：link-1
查看全文

相关阅读:
实体类、边界类和控制类
 面向对象分析和面向对象设计的区别
 面向对象分析与设计的步骤
 用例图：从用户角度描述系统功能，并指各功能的操作者
 面向对象分析和设计（OOA/D）
在UML系统开发中有三个主要的模型
 UML建模之活动图介绍（Activity Diagram）
活动图本质上就是流程图
 流程图
 流程、业务与事务

原文地址：https://www.cnblogs.com/LS1314/p/10885080.html

VALSE2019总结(2)-以人为中心的视觉理解

2. 以人为中心的视觉理解 (ceiwu lu, SJU)

2.1 基于视频的时序建模和动作识别方法 (liming wang, NJU)

2.2 复杂视频的深度高效分析与理解方法 (yu qiao, CAS)

2.3 understanding emotions in videos (yanwei fu, FDU)

2.4 以人为中心视觉识别和定位中的结构化深度学习方法探索 (wanli ouyang, sdney university)

2.5 面向监控视频的行为识别与理解 (xiaowei lin, SJU)