Note on Compression of Neural Machine Translation Models via Pruning

    The Problems of NMT Models

    (Figure: NMT encoder-decoder example. Source language input: "I am a student"; target language input: "- Je suis étudiant"; target language output: "Je suis étudiant -".)
    1. Over-parameterization
    2. Long running time
    3. Overfitting
    4. Large storage size

    The Redundancies of NMT Models

    Most important: higher layers, and the attention and softmax weights.

    Most redundant: lower layers and the embedding weights.

    Traditional Solutions

    Optimal Brain Damage (OBD) and Optimal Brain Surgeon (OBS), which prune weights using second-order (Hessian) information about the loss.

    Recent Approaches

    Magnitude-based pruning with iterative retraining has yielded strong results for Convolutional Neural Networks (CNNs) performing visual tasks.
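
    A minimal sketch of this idea, assuming a NumPy weight matrix and a hypothetical `train_one_epoch` retraining step (not something defined in the paper):

    ```python
    import numpy as np

    def magnitude_mask(W, fraction):
        """Boolean mask that drops the `fraction` of entries of W with smallest magnitude."""
        k = int(fraction * W.size)
        if k == 0:
            return np.ones_like(W, dtype=bool)
        threshold = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
        return np.abs(W) > threshold

    # Iterative prune-and-retrain loop (sketch):
    # for _ in range(num_rounds):
    #     mask = magnitude_mask(W, fraction=0.2)
    #     W = W * mask                    # zero out the smallest-magnitude weights
    #     W = train_one_epoch(W, mask)    # hypothetical: retrain while keeping pruned weights at zero
    ```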

    Other methods prune whole neurons, using sparsity-inducing regularizers or by 'wiring together' pairs of neurons that have similar input weights.

    These approaches are much more constrained than weight-pruning schemes; they necessitate finding entire zero rows of weight matrices, or near-identical pairs of rows, in order to prune a single neuron.

    Weight-Pruning Approaches

    Weight-pruning approaches, in contrast, allow weights to be pruned freely and independently of one another.

    There are many other compression techniques for neural networks:

    1. approaches based on low-rank approximations of weight matrices;
    2. weight sharing via hash functions.
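
    As an illustration of the first item, a truncated SVD gives a low-rank factorization of a weight matrix. This is only a sketch; the shapes and the rank `r` below are arbitrary choices, not values from the paper:

    ```python
    import numpy as np

    def low_rank_approx(W, r):
        """Approximate W (m x n) by a rank-r factorization A @ B,
        reducing the parameter count from m*n to r*(m + n)."""
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        A = U[:, :r] * s[:r]   # m x r, columns scaled by the top-r singular values
        B = Vt[:r, :]          # r x n
        return A, B

    W = np.random.randn(512, 1024)
    A, B = low_rank_approx(W, r=64)
    print(W.size, A.size + B.size)   # 524288 parameters vs. 98304
    ```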

    Understanding NMT Weights

    Weight Subgroups in LSTM

    Details of the LSTM:

    \[ \left(\begin{array}{c} i \\ f \\ o \\ \hat{h} \end{array}\right)=\left(\begin{array}{c} \operatorname{sigm} \\ \operatorname{sigm} \\ \operatorname{sigm} \\ \tanh \end{array}\right) T_{4n, 2n}\left(\begin{array}{c} h_{t}^{l-1} \\ h_{t-1}^{l} \end{array}\right) \]

    We compute \((h_{t}^{l}, c_{t}^{l})\) from the LSTM inputs \((h_{t-1}^{l}, c_{t-1}^{l})\) and the lower-layer output \(h_{t}^{l-1}\):

    \[ \begin{array}{l} c_{t}^{l}=f \circ c_{t-1}^{l}+i \circ \hat{h} \\ h_{t}^{l}=o \circ \tanh\left(c_{t}^{l}\right) \end{array} \]

    \(T_{4n, 2n}\) is the weight matrix that holds the layer's parameters: it maps the concatenated \(2n\)-dimensional input \((h_{t}^{l-1}, h_{t-1}^{l})\) to the \(4n\) pre-activations of the gates \(i, f, o\) and the candidate \(\hat{h}\).
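
    A direct NumPy transcription of these two equations, as a minimal sketch (biases are omitted, matching the formula above, and the toy sizes at the end are arbitrary):

    ```python
    import numpy as np

    def sigm(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_cell(h_below, h_prev, c_prev, T):
        """One LSTM step at layer l; T is the 4n x 2n parameter matrix T_{4n,2n}."""
        n = h_prev.shape[0]
        z = T @ np.concatenate([h_below, h_prev])             # 4n pre-activations
        i, f, o = sigm(z[:n]), sigm(z[n:2*n]), sigm(z[2*n:3*n])
        h_hat = np.tanh(z[3*n:])
        c = f * c_prev + i * h_hat                            # c_t^l = f ∘ c_{t-1}^l + i ∘ h_hat
        h = o * np.tanh(c)                                    # h_t^l = o ∘ tanh(c_t^l)
        return h, c

    # Toy usage with n = 4 and random parameters:
    n = 4
    T = np.random.randn(4 * n, 2 * n)
    h, c = lstm_cell(np.zeros(n), np.zeros(n), np.zeros(n), T)
    ```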

    (Figure: the NMT architecture for the same "I am a student" / "Je suis étudiant" example, annotated with its weight classes: one-hot vectors (length V), word embeddings (length n), hidden layer 1 (length n), hidden layer 2 (length n), attention hidden layer (length n), scores (length V), and one-hot output vectors (length V). The initial states are zero, and there is one context vector (length n) per target word.)

    Pruning Schemes

    Suppose we wish to prune \(x\%\) of the total parameters in the model. How do we distribute the pruning over the different weight classes?

    1. Class-blind: take all parameters, sort them by magnitude, and prune the \(x\%\) with smallest magnitude, regardless of weight class.
    2. Class-uniform: within each class, sort the weights by magnitude and prune the \(x\%\) with smallest magnitude.

    With class-uniform pruning, the overall performance loss is caused disproportionately by a few classes: target layer 4 and the attention and softmax weights. This suggests that higher layers are more important than lower layers, and that the attention and softmax weights are crucial.
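
    A minimal sketch of the two schemes over a dictionary of weight classes (the class names and shapes below are illustrative placeholders, not the paper's actual configuration):

    ```python
    import numpy as np

    def class_blind_prune(weights, x):
        """Prune the x% smallest-magnitude weights over all classes pooled together."""
        all_mags = np.concatenate([np.abs(w).ravel() for w in weights.values()])
        k = int(x / 100.0 * all_mags.size)
        threshold = np.partition(all_mags, k - 1)[k - 1] if k > 0 else -np.inf
        return {name: w * (np.abs(w) > threshold) for name, w in weights.items()}

    def class_uniform_prune(weights, x):
        """Prune the x% smallest-magnitude weights within each class separately."""
        pruned = {}
        for name, w in weights.items():
            k = int(x / 100.0 * w.size)
            threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1] if k > 0 else -np.inf
            pruned[name] = w * (np.abs(w) > threshold)
        return pruned

    # Illustrative weight classes (placeholder shapes):
    weights = {
        "source_embedding": np.random.randn(100, 8),
        "target_layer_4":   np.random.randn(32, 16),
        "attention":        np.random.randn(8, 8),
        "softmax":          np.random.randn(100, 8),
    }
    pruned_blind   = class_blind_prune(weights, x=40)
    pruned_uniform = class_uniform_prune(weights, x=40)
    ```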
