Background
1. Weakly supervised semantic segmentation and localization have a problem of focus only on the most important parts of an image since they use only image-level annotations.
2. The complete extent of objects are good for object localization.
3. A novel method is proposed to cover the entire parts of objects.
Main points
ff
To cover the entire parts of objects, a simple way is to find the most discriminative part, the second most discriminative part and so on. When locating the second most discriminative part, we mask out the most discriminative one. This is the core idea of this paper.
1. Find the most discriminative parts of an image as usual, which is done by the first network.
2. Mask out the most discriminative parts of an image.
3. Train the same network to find the second most discriminatvie parts.
Drawbacks
1. The network is not shared during two-phase learning.
2. The network can not be trained in an end-to-end manner.