A CTR model that uses Multi-head Self-Attention for automatic feature learning
https://blog.csdn.net/u012151283/article/details/85310370
The Attention mechanism in NLP and a detailed explanation of the Transformer
https://zhuanlan.zhihu.com/p/53682800
Self-Attention and the Transformer
https://zhuanlan.zhihu.com/p/47282410
https://jalammar.github.io/illustrated-transformer/
Attention: principles and source code analysis
https://zhuanlan.zhihu.com/p/43493999