GitHub
- https://github.com/facebookresearch/fairseq
- https://github.com/facebookresearch/pytext
- https://github.com/facebookresearch/XLM.git
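The fairseq repo ships pretrained XLM-R checkpoints. A minimal sketch of loading one via torch.hub and extracting sentence features; the entry-point names (`pytorch/fairseq`, `xlmr.large`) follow the fairseq XLM-R README and may differ across fairseq versions:

```python
# Sketch: load the released XLM-R (large) checkpoint through fairseq's
# torch.hub interface and extract contextual features for one sentence.
import torch

xlmr = torch.hub.load('pytorch/fairseq', 'xlmr.large')
xlmr.eval()  # disable dropout; keep train mode if fine-tuning

tokens = xlmr.encode('Hello world!')      # SentencePiece token ids, with <s>/</s>
features = xlmr.extract_features(tokens)  # last-layer hidden states
print(features.shape)                     # e.g. torch.Size([1, seq_len, 1024])
```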
Abstract
Task: pretrain large-scale cross-lingual representations that transfer well to a wide range of natural-language tasks.
Method: train a Transformer-based masked language model on more than 2 TB of filtered CommonCrawl data covering 100 languages (see the masking sketch at the end of this section).
Results:
- Outperforms mBERT on a variety of cross-lingual benchmarks:
    - +14.6% average accuracy on XNLI
    - +13% average F1 score on MLQA
    - +2.4% F1 score on NER
- Performs particularly well on languages with fewer (training) resources:
    - +15.7% XNLI accuracy for Swahili over previous XLM models
    - +11.4% XNLI accuracy for Urdu over previous XLM models
- Provides a detailed empirical analysis of the key factors behind these gains, including:
    - the trade-off between positive transfer and capacity dilution
    - the performance of high- vs. low-resource languages at scale
- Shows that a single multilingual model (XLM-R) can cover many languages while remaining competitive with strong monolingual models.
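To make the "Transformer-based masked language model" objective above concrete, here is a minimal sketch (not the authors' code) of BERT/XLM-style dynamic masking: 15% of tokens are selected as prediction targets, of which 80% are replaced by `<mask>`, 10% by a random token, and 10% left unchanged. The hyperparameters, special-token ids, and function name are assumptions for illustration.

```python
# Sketch of BERT/XLM-style MLM masking (assumed 15% selection, 80/10/10 split);
# not the authors' implementation.
import torch

def mask_tokens(token_ids, mask_id, vocab_size, special_ids=(0, 1, 2, 3),
                mask_prob=0.15):
    """Return (masked_inputs, labels) for a batch of token-id tensors."""
    inputs = token_ids.clone()
    labels = token_ids.clone()

    # Never mask special tokens (<s>, <pad>, </s>, <unk>; ids here are assumptions).
    special = torch.zeros_like(inputs, dtype=torch.bool)
    for sid in special_ids:
        special |= inputs == sid

    # Select ~15% of the remaining positions as prediction targets.
    probs = torch.full(inputs.shape, mask_prob)
    probs.masked_fill_(special, 0.0)
    selected = torch.bernoulli(probs).bool()
    labels[~selected] = -100  # ignored by the cross-entropy loss

    # 80% of selected positions -> <mask>
    to_mask = torch.bernoulli(torch.full(inputs.shape, 0.8)).bool() & selected
    inputs[to_mask] = mask_id

    # 10% -> a random vocabulary token (half of the remaining selected positions)
    to_random = (torch.bernoulli(torch.full(inputs.shape, 0.5)).bool()
                 & selected & ~to_mask)
    inputs[to_random] = torch.randint(vocab_size, inputs.shape)[to_random]

    # The remaining 10% keep the original token.
    return inputs, labels
```

During pretraining, the model is trained with cross-entropy only on the selected positions (-100 is the default ignore index of torch.nn.CrossEntropyLoss).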