  • Utterance-Wise Recurrent Dropout And Iterative Speaker Adaptation For Robust Monaural Speech Recognition


       

    WRBN (wide residual BLSTM network): a wide residual bidirectional long short-term memory network.

    [2] J. Heymann, L. Drude, and R. Haeb-Umbach, "Wide residual BLSTM network with discriminative speaker adaptation for robust speech recognition," in Proc. CHiME-4 Workshop, 2016.

    Vocabulary: reverberation (n., 混响) — the reflection and persistence of sound; echo.

       

    CLDNN (convolutional, long short-term memory, fully connected deep neural network).

    [1] T. N. Sainath, O. Vinyals, A. Senior, and H. Sak, "Convolutional, long short-term memory, fully connected deep neural networks," in Proc. IEEE ICASSP, 2015, pp. 4580–4584.

       

    Speech separation: separating audio in which multiple speakers talk simultaneously into individual utterances, one per speaker.

       

    Using dropout during LSTM training effectively mitigates overfitting.

    [3] G.E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R.R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors," arXiv preprint arXiv:1207.0580, 2012.

       

    Applying a frame-dropping mask sampled once per utterance to the output, forget, and input gates gives the best results (Cheng dropout); a sketch follows the reference below.

    [7] G. Cheng, V. Peddinti, D. Povey, V. Manohar, S. Khudanpur, and Y. Yan, "An exploration of dropout with LSTMs," in Proc. Interspeech, 2017.
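
    A minimal sketch of this idea (my reading, not the paper's code; all shapes, names, and hyperparameters here are assumptions): sample one Bernoulli mask per utterance and reuse it at every frame on the input, forget, and output gates of an LSTM cell.

```python
import torch

def lstm_step(x, h, c, W, U, b, gate_mask):
    # x: (B, D), h/c: (B, H); W: (4H, D), U: (4H, H), b: (4H,)
    gates = x @ W.T + h @ U.T + b
    i, f, g, o = gates.chunk(4, dim=1)     # input, forget, cell, output pre-activations
    i = torch.sigmoid(i) * gate_mask       # input gate, masked per utterance
    f = torch.sigmoid(f) * gate_mask       # forget gate, masked per utterance
    o = torch.sigmoid(o) * gate_mask       # output gate, masked per utterance
    c = f * c + i * torch.tanh(g)
    h = o * torch.tanh(c)
    return h, c

def run_utterance(x_seq, H=64, p=0.2):
    # x_seq: (T, B, D); the mask is sampled ONCE and reused for all T frames,
    # which is the "utterance-wise" part of the recurrent dropout
    T, B, D = x_seq.shape
    W, U = torch.randn(4 * H, D) * 0.1, torch.randn(4 * H, H) * 0.1
    b = torch.zeros(4 * H)
    mask = torch.bernoulli(torch.full((B, H), 1.0 - p)) / (1.0 - p)  # inverted-dropout scaling
    h, c = torch.zeros(B, H), torch.zeros(B, H)
    for t in range(T):
        h, c = lstm_step(x_seq[t], h, c, W, U, b, mask)
    return h

out = run_utterance(torch.randn(50, 2, 40))  # 50 frames, batch of 2 utterances
```

    Sampling the mask per utterance rather than per frame keeps the recurrent dynamics consistent across time, which is the reported advantage over naive frame-level dropout.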

       

    MLLR-based iterative adaptation uses the decoding result of the previous iteration to update the Gaussian parameters; the core update is sketched after the reference.

    [10] P. C. Woodland, D. Pye, and M. J. F. Gales, "Iterative unsupervised adaptation using maximum likelihood linear regression," in Proc. Fourth International Conference on Spoken Language Processing (ICSLP 96). IEEE, 1996, vol. 2, pp. 1133–1136.
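
    For reference, the standard MLLR mean update from [10], in my own notation (a hedged sketch, not copied from the paper): each Gaussian mean is mapped by a shared affine transform, re-estimated each iteration against the previous decoding hypothesis.

```latex
% MLLR mean adaptation (notation mine): affine transform of Gaussian means,
% re-estimated each iteration from the previous decoding hypothesis.
\hat{\mu}_m = A\,\mu_m + b = W\,\xi_m, \qquad
\xi_m = \begin{bmatrix} 1 \\ \mu_m \end{bmatrix}, \qquad
W^{(k+1)} = \operatorname*{arg\,max}_{W}\;
  p\bigl(\mathcal{O} \mid W,\ \hat{\mathcal{W}}^{(k)}\bigr)
```

    Here \(\mathcal{O}\) is the adaptation data and \(\hat{\mathcal{W}}^{(k)}\) is the transcription hypothesis decoded at iteration \(k\): each iteration re-decodes with the adapted model and re-estimates \(W\).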

       

       

    A batch-normalization-based speaker adaptation method was proposed recently.

    [14] P. Swietojanski, J. Li, and S. Renals, "Learning hidden unit contributions for unsupervised acoustic model adaptation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 8, pp. 1450–1463, 2016.

       

    This paper uses unsupervised LIN (linear input network) speaker adaptation.

    [11]

    The LIN layer is an 80×80 matrix, shared across the three input feature streams (static, delta, and delta-delta); a sketch follows.
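
    A minimal sketch of such a LIN (the tensor layout is my assumption): an 80×80 linear layer initialized to identity, applied with the same weights to each of the three feature streams; during adaptation only this layer would be updated while the acoustic model stays frozen.

```python
import torch
import torch.nn as nn

class LIN(nn.Module):
    """80x80 linear input network, shared across the three feature streams."""
    def __init__(self, dim=80):
        super().__init__()
        self.lin = nn.Linear(dim, dim)
        nn.init.eye_(self.lin.weight)   # identity init: adaptation starts from the SI model
        nn.init.zeros_(self.lin.bias)

    def forward(self, feats):
        # feats: (B, T, 3, 80) -- static / delta / delta-delta stacked on dim 2;
        # nn.Linear acts on the last dim, so all three streams share the weights
        return self.lin(feats)

lin = LIN()
y = lin(torch.randn(2, 100, 3, 80))  # only lin.parameters() are trained during adaptation
```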

       

    This paper tries the following two schemes for iterative speaker adaptation:

    • Iter: at each iteration, use the previous iteration's model to generate new labels (decoding hypotheses) and retrain on them.
    • Stack: at each iteration, stack one additional linear input layer (mathematically, several stacked linear layers are equivalent to a single linear layer; see the check below).
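
    A quick runnable check of the parenthetical in the second bullet: composing two stacked linear layers gives one equivalent linear transform, so each Stack iteration still amounts to a single effective LIN.

```python
import torch
import torch.nn as nn

dim = 80
lin1, lin2 = nn.Linear(dim, dim), nn.Linear(dim, dim)
x = torch.randn(4, dim)

stacked = lin2(lin1(x))                        # two stacked linear input layers
W = lin2.weight @ lin1.weight                  # composed weight: W2 @ W1
b = lin2.weight @ lin1.bias + lin2.bias        # composed bias: W2 @ b1 + b2
collapsed = x @ W.T + b                        # single equivalent linear layer

print(torch.allclose(stacked, collapsed, atol=1e-5))  # True
```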

       

    Conventional DNN training is segment-wise (on fixed-length chunks of frames), whereas the recurrent dropout here operates utterance-wise.

       

    Experiments show that with the RNN language model, the Iter (iterative retraining) scheme is better, while with the tri-gram language model the Stack (layer-stacking) scheme is better.
