Hard example mining (mining difficult training samples)

Hard example mining

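Core idea: run the classifier over the samples, add the ones it misclassifies (hard negatives) to the negative set, and then continue training the classifier.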
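
The loop described in the core idea can be sketched as follows. This is a minimal illustration, not a reference implementation: the 1-D threshold classifier and the names `train`, `is_positive`, and `bootstrap` are assumptions chosen to keep the example small.

```python
def train(pos, neg):
    """Toy 1-D classifier (an assumption): threshold halfway between class means."""
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def is_positive(threshold, x):
    """Predict positive when the value lies above the threshold."""
    return x > threshold


def bootstrap(pos, neg_pool, n_start=3, max_rounds=5):
    """Train, collect misclassified negatives, add them, and retrain."""
    neg_train = list(neg_pool[:n_start])   # start from a small negative set
    remaining = list(neg_pool[n_start:])   # negatives not used for training yet
    model = train(pos, neg_train)
    for _ in range(max_rounds):
        # Hard negatives: negatives the current model calls positive.
        hard = [x for x in remaining if is_positive(model, x)]
        if not hard:
            break
        neg_train.extend(hard)
        remaining = [x for x in remaining if not is_positive(model, x)]
        model = train(pos, neg_train)      # continue training with them included
    return model, neg_train
```

In this toy setup each round moves the threshold toward the positives, because the added negatives are exactly the ones the previous model got wrong.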

Why hard negatives?

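FP (false positive): a negative sample that is wrongly classified as positive.
My understanding is that labeled objects have relatively distinctive features, so a classifier that assigns a labeled object to the negative class is simply not good enough. The main problem lies in the huge pool of negatives, which contains some objects that the classifier finds hard to distinguish.
Key: find the hard negatives that hurt the classifier's performance.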

How to mine hard negatives?

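Dataset
- In object detection, the ground truth is annotated in advance, and the algorithm then generates a set of proposals. A proposal whose IoU with the ground truth exceeds a threshold (usually 0.5) is treated as a positive sample, and one whose IoU falls below a threshold is treated as a negative sample; both are then fed to the network for training. One problem is that positives may be far fewer than negatives, so the resulting classifier is limited and produces many false positives. The false positives with high scores are taken as the hard negatives. Once mined, they are fed to the network for another round of training, which strengthens the classifier's ability to reject false positives.
- You can also build such a dataset yourself and use it for later training or testing.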
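
The offline procedure above can be sketched in two steps: label proposals by IoU, then keep the negatives that the detector scores highly. The box format `(x1, y1, x2, y2)`, the function names, and the `score_thresh` cutoff are assumptions made for illustration.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_proposals(proposals, gt_boxes, pos_thresh=0.5, neg_thresh=0.5):
    """1 = positive, 0 = negative, -1 = ignored (between the two thresholds)."""
    labels = []
    for p in proposals:
        best = max((iou(p, g) for g in gt_boxes), default=0.0)
        if best >= pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)
    return labels


def hard_negatives(proposals, labels, scores, score_thresh=0.5):
    """Negatives scored as confident positives, highest score first."""
    fps = [(s, p) for p, y, s in zip(proposals, labels, scores)
           if y == 0 and s >= score_thresh]
    fps.sort(key=lambda t: t[0], reverse=True)
    return [p for _, p in fps]
```

The boxes returned by `hard_negatives` would then be added to the negative set for the next round of training.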
Selecting by loss
- The offline method above also has an online counterpart: hard negatives are selected during training and used in the iterations, which improves the training result.
- Define a rule for selecting hard negatives: DenseBox
  In the forward propagation phase, we sort the loss of output pixels in descending order, and assign the top 1% to be hard-negative. In all experiments, we keep all positive labeled pixels (samples) and the ratio of positive and negative to be 1:1. Among all negative samples, half of them are sampled from hard-negative samples, and the remaining half are selected randomly from non-hard negative.
  Core idea: treat the samples that differ most from the label (largest loss) as hard negatives.
- Once the hard negatives have been selected by this rule, training puts extra emphasis on them.
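
The DenseBox sampling rule quoted above can be sketched as follows. This is one possible reading, not the authors' code: it assumes the top 1% is taken among negative pixels only, and that when the hard pool cannot supply half of the negatives, the rest is filled from the non-hard pool. The name `densebox_sample` and its parameters are hypothetical.

```python
import random


def densebox_sample(losses, labels, hard_frac=0.01, seed=0):
    """Return (positive indices, sampled negative indices) at a 1:1 ratio."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]   # keep every positive
    neg = [i for i, y in enumerate(labels) if y == 0]

    # Sort negatives by loss, largest first; the top fraction is "hard".
    neg_sorted = sorted(neg, key=lambda i: losses[i], reverse=True)
    n_hard = max(1, int(len(neg_sorted) * hard_frac)) if neg_sorted else 0
    hard, easy = neg_sorted[:n_hard], neg_sorted[n_hard:]

    n_neg = min(len(pos), len(neg))                     # 1:1 positive:negative
    n_from_hard = min(len(hard), n_neg // 2)            # half from hard negatives
    n_from_easy = min(len(easy), n_neg - n_from_hard)   # rest chosen at random
    chosen = rng.sample(hard, n_from_hard) + rng.sample(easy, n_from_easy)
    return pos, chosen
```

Only the returned indices would contribute to the loss for that iteration; all other negative pixels are ignored.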
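Selecting from RoIs
- Selecting RoIs: OHEM
- In the green part (a) of the OHEM diagram, a read-only RoI network runs a forward pass over the feature map and all RoIs, and the Hard RoI module then uses the losses of these RoIs to select B samples. In the red part (b), the selected samples (hard examples) enter the RoI network for a further forward and backward pass. The selection is again based on loss, but it targets two-stage detectors and picks from the RoIs produced by the first stage.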
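
The two passes described above can be sketched as follows. This only illustrates the loss-based choice of B RoIs; `loss_fn` and `train_fn` are hypothetical callables standing in for the read-only RoI network and the trainable RoI network.

```python
def select_hard_rois(roi_losses, batch_size):
    """Indices of the batch_size RoIs with the largest loss."""
    order = sorted(range(len(roi_losses)),
                   key=lambda i: roi_losses[i], reverse=True)
    return order[:batch_size]


def ohem_step(rois, loss_fn, train_fn, batch_size):
    """One training step with online hard example selection."""
    # (a) Read-only pass: compute a loss for every RoI, no parameter update.
    losses = [loss_fn(roi) for roi in rois]
    # Hard RoI module: keep the B highest-loss RoIs.
    hard = [rois[i] for i in select_hard_rois(losses, batch_size)]
    # (b) Only the selected RoIs go through the forward and backward pass.
    train_fn(hard)
    return hard
```

Because the selection happens inside each step, the set of hard examples changes as the network improves, which is what makes the scheme online.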
Summary

Train on the samples with larger loss, that is, those whose detection results differ most from the labels.

Original article: https://www.cnblogs.com/o-v-o/p/9975366.html