1. Parameter pruning and sharing
1.1 Quantization and Binarization
-
Compressing deep convolutional networks using vector quantization
-
BinaryConnect: Training deep neural networks with binary weights during propagations
-
BinaryNet: Training deep neural networks with weights and activations constrained to +1 or -1
-
XNOR-Net: ImageNet classification using binary convolutional neural networks
-
Deep neural networks are robust to weight binarization and other non-linear distortions
1.2 Pruning and Sharing
-
Comparing biases for minimal network construction with back-propagation
-
Second order derivatives for network pruning: Optimal brain surgeon
-
Learning both weights and connections for efficient neural networks
1.3 Designing Structural Matrix
2. Low rank factorization and sparsity
-
Exploiting linear structure within convolutional networks for efficient evaluation
-
Speeding up convolutional neural networks with low rank expansions
-
Speeding-up convolutional neural networks using fine-tuned CP-decomposition
-
Low-rank matrix factorization for deep neural network training with high-dimensional output targets
3. Transferred/compact convolution filters
-
Understanding and improving convolutional neural networks via concatenated rectified linear units
-
Inception-v4, Inception-ResNet and the impact of residual connections on learning
-
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
-
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
4. Knowledge distillation
5. Other
-
Outrageously large neural networks: The sparsely-gated mixture-of-experts layer
-
Deep dynamic neural networks for multimodal gesture segmentation and recognition
-
Deep pyramidal residual networks with separated stochastic depth