The solutions for each issue are listed in chronological order. There is a clear trend from explicit to implicit and from hand-designed to automatically learned, which you can follow when designing your own approaches.
Occlusion
- Hierarchical part model for visibility estimation [1][5]
- Data or feature augmentation with occlusion [2]; see Data and Feature Augmentation under Data Imbalance
Scale Variation
- Image pyramid: compute features from each level of the image pyramid [1]
  - Computationally expensive, so usually applied during inference
- Encoder-decoder: feature maps from multiple convolutional and deconvolutional layers
  - Pyramidal feature hierarchy [5][6][7]
  - Feature pyramid [11]
- Going deeper with atrous convolution
- Spatial pyramid pooling [14]
- The encoder-decoder, atrous convolution, and spatial pyramid pooling approaches can be combined; such combinations are explored in [15]
- Attention
  - Detect and focus on a smaller region at each stage [2]
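The image pyramid approach above runs the same feature extractor on every rescaled copy of the input, which is why it is expensive. A minimal NumPy sketch, assuming 2x2 average pooling between levels and a toy gradient-magnitude feature standing in for a real detector or CNN (all function names are hypothetical):

```python
import numpy as np

def downsample2x(img: np.ndarray) -> np.ndarray:
    """Halve resolution with 2x2 average pooling (odd rows/cols are cropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def gradient_magnitude(img: np.ndarray) -> np.ndarray:
    """Toy hand-crafted feature: magnitude of finite-difference gradients."""
    gy, gx = np.gradient(img)
    return np.hypot(gx, gy)

def pyramid_features(img, num_levels=3, feature_fn=gradient_magnitude):
    """Apply the same feature extractor independently at every pyramid level."""
    features = []
    level = img.astype(float)
    for _ in range(num_levels):
        features.append(feature_fn(level))   # one full extraction per level
        level = downsample2x(level)
    return features

feats = pyramid_features(np.random.rand(64, 64), num_levels=3)
print([f.shape for f in feats])  # [(64, 64), (32, 32), (16, 16)]
```

With a CNN as `feature_fn`, each level costs a full forward pass, which is the overhead that feature-hierarchy methods try to avoid.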
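Atrous (dilated) convolution, mentioned above, spaces the kernel taps `rate` samples apart, enlarging the receptive field without adding weights or reducing resolution. A minimal 1D NumPy sketch, assuming "valid" padding and CNN-style cross-correlation; `atrous_conv1d` is a hypothetical helper, not a library function:

```python
import numpy as np

def atrous_conv1d(x: np.ndarray, kernel: np.ndarray, rate: int = 1) -> np.ndarray:
    """'Valid' 1D atrous convolution (cross-correlation, as in CNN layers).

    A k-tap kernel with dilation `rate` covers (k - 1) * rate + 1 inputs
    while still using only k weights.
    """
    k = len(kernel)
    span = (k - 1) * rate + 1            # effective receptive field
    out_len = len(x) - span + 1
    out = np.empty(out_len)
    for i in range(out_len):
        taps = x[i : i + span : rate]     # every `rate`-th sample: k values
        out[i] = np.dot(taps, kernel)
    return out

x = np.arange(10, dtype=float)
w = np.array([1.0, 0.0, -1.0])
print(atrous_conv1d(x, w, rate=1))  # 8 values, all -2.0 (span 3)
print(atrous_conv1d(x, w, rate=2))  # 6 values, all -4.0 (span 5, same 3 weights)
```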
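Spatial pyramid pooling, listed above, pools a feature map over grids of several sizes and concatenates the results, giving a fixed-length vector for inputs of any spatial size. A minimal NumPy sketch, assuming max pooling and pyramid levels 1x1, 2x2 and 4x4 (the level choice and function name are illustrative assumptions):

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map over an n x n grid for each level n.

    The result has length C * sum(n * n) regardless of H and W, so it can feed
    a fully connected layer that expects a fixed input size.
    """
    c, h, w = fmap.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the start, ceil for the end: bins cover every pixel
            # and are never empty, even when H or W is not divisible by n.
            y0, y1 = (i * h) // n, ((i + 1) * h + n - 1) // n
            for j in range(n):
                x0, x1 = (j * w) // n, ((j + 1) * w + n - 1) // n
                pooled.append(fmap[:, y0:y1, x0:x1].max(axis=(1, 2)))
    return np.concatenate(pooled)

for shape in [(8, 13, 17), (8, 32, 24)]:
    print(spatial_pyramid_pool(np.random.rand(*shape)).shape)  # (168,) both times
```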
Data Imbalance
- Emphasize balanced dataset compilation in the first place
  - Collect samples approximately uniformly across classes
- Data and Feature Augmentation
  - Dropout of half the neurons for better generalization [4]
  - Generating hard feature maps with occlusion and deformation for object detection [2]
- Over-sampling minority classes or under-sampling majority classes
  - Weaknesses
    - Changes the underlying data distribution and may lead to suboptimal use of the available data
    - Increases computational effort and/or the risk of over-fitting when the same samples are visited repeatedly
  - SMOTE and its variants reduce over-fitting by synthesizing new minority samples instead of duplicating existing ones
- Data Mining for Hard Examples
  - Online hard example mining (OHEM) [3], addressing both intra-class data imbalance and positive-negative imbalance
- Cost-sensitive learning
  - Focal loss [8]: down-weights well-classified (easy) examples so that hard examples contribute more to the loss
  - Loss Max-Pooling for Semantic Image Segmentation [16]: the loss is maximized over a family of pixel-weighting functions, which adaptively re-weights each pixel's contribution; pixels with higher training loss receive larger weights than pixels with lower loss
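The dropout item above can be illustrated with a short sketch. This uses the common "inverted" formulation (rescale survivors during training), which is an assumption: the original formulation instead scales the weights at test time, and both match in expectation. `p = 0.5` corresponds to dropping half the neurons:

```python
import numpy as np

def dropout(activations, p=0.5, training=True, rng=None):
    """Inverted dropout: during training, zero each unit with probability p and
    scale the survivors by 1 / (1 - p) so the expected activation is unchanged.
    At test time the input passes through untouched."""
    if not training or p == 0.0:
        return activations
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(activations.shape) >= p   # True with probability 1 - p
    return activations * keep / (1.0 - p)

x = np.ones(10_000)
y = dropout(x, p=0.5, rng=np.random.default_rng(0))
print(np.unique(y))    # [0. 2.]
print(y.mean())        # close to 1.0
```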
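The over-sampling and SMOTE items above differ in how new minority samples are produced: plain over-sampling repeats exact copies, while SMOTE interpolates between a minority sample and one of its nearest minority neighbours. A simplified NumPy sketch of both, assuming Euclidean distance and random choice of base samples (the original SMOTE iterates over every minority sample; libraries such as imbalanced-learn provide full implementations). Function names are hypothetical:

```python
import numpy as np

def random_oversample(X, y, rng):
    """Duplicate random minority samples until each class matches the largest
    class. The copies are exact, which is where the over-fitting risk comes from."""
    classes, counts = np.unique(y, return_counts=True)
    idx = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        idx.extend(members)
        idx.extend(rng.choice(members, size=counts.max() - n, replace=True))
    idx = np.asarray(idx)
    return X[idx], y[idx]

def smote(X_min, n_new, k=5, rng=None):
    """SMOTE-style synthesis: each new point lies on the segment between a
    random minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng() if rng is None else rng
    k = min(k, len(X_min) - 1)
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]   # (n_min, k) neighbour indices
    base = rng.integers(0, len(X_min), size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.array([0] * 90 + [1] * 10)
Xb, yb = random_oversample(X, y, rng)
print(np.bincount(yb))            # [90 90]
X_syn = smote(X[y == 1], n_new=80, rng=rng)
print(X_syn.shape)                # (80, 2)
```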
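The core of OHEM, listed above, is to compute the loss for all candidate RoIs in a forward pass and backpropagate only through the highest-loss ones. A minimal NumPy sketch of that selection step, assuming per-RoI cross-entropy; the OHEM paper also suppresses highly overlapping RoIs (NMS) before selecting, which this sketch omits. Helper names are hypothetical:

```python
import numpy as np

def cross_entropy(probs, labels):
    """Per-RoI cross-entropy; probs has shape (num_rois, num_classes)."""
    return -np.log(np.clip(probs[np.arange(len(labels)), labels], 1e-12, None))

def ohem_select(losses, batch_size):
    """Indices of the `batch_size` RoIs with the highest loss (hardest first)."""
    batch_size = min(batch_size, losses.size)
    return np.argsort(-losses)[:batch_size]

# Forward pass over ALL candidate RoIs, then keep only the hardest ones.
probs = np.array([[0.90, 0.10],   # easy background RoI
                  [0.20, 0.80],   # easy object RoI
                  [0.30, 0.70],   # hard: background predicted as object
                  [0.60, 0.40]])  # hard: object predicted as background
labels = np.array([0, 1, 0, 1])
losses = cross_entropy(probs, labels)
hard = ohem_select(losses, batch_size=2)
print(hard)  # [2 3]
```

Only `losses[hard]` would then contribute to the gradient, so easy negatives, which usually dominate in number, stop swamping the update.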
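The focal loss item above can be made concrete with the binary form $FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$, where $p_t$ is the predicted probability of the true class. A minimal NumPy sketch, assuming the commonly used defaults $\gamma = 2$ and $\alpha = 0.25$:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p: predicted probability of the positive class; y: labels in {0, 1}.
    With gamma = 0 and alpha = 0.5 this is 0.5 * binary cross-entropy.
    """
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)             # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

y = np.array([1])
p_easy, p_hard = np.array([0.9]), np.array([0.1])   # predicted P(positive)
# Ratio to the alpha-weighted cross-entropy is the modulating factor (1 - p_t)**gamma.
print(focal_loss(p_easy, y, alpha=0.5) / (-0.5 * np.log(p_easy)))  # ≈ [0.01]
print(focal_loss(p_hard, y, alpha=0.5) / (-0.5 * np.log(p_hard)))  # ≈ [0.81]
```

The easy example keeps about 1% of its cross-entropy while the hard one keeps about 81%, so training is dominated by hard examples.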
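The loss max-pooling idea above can be sketched for one simple weighting set: weights with a total budget of 1 and a per-pixel cap $\tau$. Maximizing the weighted loss then gives weight $\tau$ to the highest-loss pixels until the budget is spent, so only roughly the hardest $1/\tau$ pixels contribute. This is a simplified illustration under that assumed constraint set; the paper considers a more general family of norm constraints on the weights. `loss_max_pooling_l1` is a hypothetical name:

```python
import numpy as np

def loss_max_pooling_l1(pixel_losses, tau):
    """Simplified loss max-pooling with an L1 budget and a per-pixel cap.

    Solves  max_w sum_i w_i * l_i  s.t.  sum_i w_i <= 1,  0 <= w_i <= tau,
    for non-negative losses l_i: greedily assign tau to the largest losses.
    """
    losses = np.sort(np.ravel(pixel_losses))[::-1]   # highest loss first
    weights = np.zeros(losses.size)
    budget = 1.0
    for i in range(losses.size):
        weights[i] = min(tau, budget)
        budget -= weights[i]
        if budget <= 1e-12:                          # budget spent
            break
    return float(np.dot(weights, losses))

pixel_losses = np.array([[0.1, 0.9],
                         [0.5, 0.3]])
print(loss_max_pooling_l1(pixel_losses, tau=0.25))  # ≈ 0.45 (plain mean)
print(loss_max_pooling_l1(pixel_losses, tau=0.5))   # ≈ 0.7 (mean of top 2)
print(loss_max_pooling_l1(pixel_losses, tau=1.0))   # ≈ 0.9 (max pixel loss)
```

The cap $\tau$ interpolates between ordinary averaging ($\tau = 1/N$) and keeping only the single worst pixel ($\tau = 1$).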
Local & global information combination
- Deep learning can learn some multi-scale information automatically [9]
- Top-down semantics from FPN (focuses on each scale) [11]
- Multi-scale combination and selection from GBD-Net (focuses on combining scales) [12]
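The FPN top-down pathway listed above starts from the coarsest, most semantic map and repeatedly upsamples it and adds a 1x1-projected lateral map from the next finer level. A minimal NumPy sketch, assuming nearest-neighbour 2x upsampling, maps whose sizes differ by exactly 2x, and no biases; FPN also applies a 3x3 convolution to each merged map to reduce upsampling aliasing, which this sketch omits. Helper names are hypothetical:

```python
import numpy as np

def upsample_nearest2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1(x, weight):
    """1x1 convolution: weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def fpn_top_down(bottom_up, lateral_weights):
    """Merge bottom-up maps (finest first) into a feature pyramid.

    Each output level = 1x1 lateral projection + 2x-upsampled coarser output,
    so fine-resolution levels inherit semantics from coarse ones.
    """
    laterals = [conv1x1(c, w) for c, w in zip(bottom_up, lateral_weights)]
    outputs = [laterals[-1]]                   # coarsest level
    for lat in reversed(laterals[:-1]):
        outputs.append(lat + upsample_nearest2x(outputs[-1]))
    return outputs[::-1]                       # finest first again

rng = np.random.default_rng(0)
bottom_up = [rng.random((64, 32, 32)), rng.random((128, 16, 16)), rng.random((256, 8, 8))]
weights = [rng.random((32, c.shape[0])) for c in bottom_up]
pyramid = fpn_top_down(bottom_up, weights)
print([p.shape for p in pyramid])  # [(32, 32, 32), (32, 16, 16), (32, 8, 8)]
```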
Utilization of Context
- In traditional machine learning, context is mainly used to refine object scores [1][10]
- Recurrent neural networks (RNNs) [13]
Utilization of Object Part Information
- DPM [1]
- Deep learning with DPM [5]
- Position-sensitive RoI pooling [10]: constructs score maps from different channels of the feature map, each channel group acting as a part detector
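Position-sensitive RoI pooling, described above, splits each RoI into a k x k grid and lets bin (i, j) read only from the channel group that scores the (i, j)-th object part; the bin scores are then averaged into a class score. A minimal single-RoI NumPy sketch, assuming average pooling, an RoI inside the map, and a channel layout of `(i * k + j) * num_classes + class` (the layout and function name are illustrative assumptions):

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k, num_classes):
    """Position-sensitive RoI average pooling for a single RoI.

    score_maps: (k * k * num_classes, H, W); channels
        [(i * k + j) * num_classes, (i * k + j + 1) * num_classes) score the
        (i, j)-th part (row i, column j of a k x k grid) of every class.
    roi: (x0, y0, x1, y1) in feature-map coordinates, inside the map.
    Returns one score per class: the average ("vote") over the k x k bins.
    """
    x0, y0, x1, y1 = roi
    bin_w, bin_h = (x1 - x0) / k, (y1 - y0) / k
    pooled = np.zeros((num_classes, k, k))
    for i in range(k):              # bin row
        ys = int(np.floor(y0 + i * bin_h))
        ye = max(int(np.ceil(y0 + (i + 1) * bin_h)), ys + 1)
        for j in range(k):          # bin column
            xs = int(np.floor(x0 + j * bin_w))
            xe = max(int(np.ceil(x0 + (j + 1) * bin_w)), xs + 1)
            group = (i * k + j) * num_classes
            # Bin (i, j) reads ONLY from its own channel group.
            region = score_maps[group:group + num_classes, ys:ye, xs:xe]
            pooled[:, i, j] = region.mean(axis=(1, 2))
    return pooled.mean(axis=(1, 2))

k, num_classes = 3, 2
maps = np.zeros((k * k * num_classes, 9, 9))
maps[0] = 1.0   # only the "top-left part of class 0" detector fires
print(ps_roi_pool(maps, (0, 0, 9, 9), k, num_classes))  # ≈ [0.111 0.]
```

Because the top-left detector contributes to only one of the nine bins, class 0 receives 1/9 of a full vote, which shows how the pooled score encodes part positions.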
Metric learning
Mainly used in recognition.
Computational Efficiency
- Eliminate redundant layers
- Spatially adaptive computation
