  • Kaldi + CNN for Speech Enhancement

    (I) Network structure

    input -> C1 -> softmax -> S2 -> C3 -> softmax -> S4 -> FC5 -> softmax -> FC6

    (II) Data preprocessing

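    1. Normalization: extract the log-spectrum of each .wav audio file as the feature vector (d = 129), then normalize it.

    2. Frame expansion: extend each frame with 5 frames of context on the left and 5 on the right. The dimension after expansion is D = d * (5 * 2 + 1) = d * 11.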
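The two preprocessing steps can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the post does not say which normalization is used, so per-dimension mean/variance normalization is assumed, and edge frames are repeated where a neighbour is missing, which is also an assumption. The helper names `normalize` and `splice` are hypothetical, and the input frames are dummy values standing in for real log-spectra.

```python
import math

def normalize(frames):
    """Per-dimension mean/variance normalization over all frames (assumed scheme)."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[k] for f in frames) / n for k in range(dim)]
    stds = []
    for k in range(dim):
        var = sum((f[k] - means[k]) ** 2 for f in frames) / n
        stds.append(math.sqrt(var) or 1.0)  # fall back to 1.0 if variance is zero
    return [[(f[k] - means[k]) / stds[k] for k in range(dim)] for f in frames]

def splice(frames, context=5):
    """Concatenate each frame with `context` neighbours on each side.

    Indices outside the utterance are clamped, so the first/last frame
    is repeated at the edges (an assumption, not stated in the post).
    """
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        row = []
        for offset in range(-context, context + 1):
            idx = min(max(t + offset, 0), last)
            row.extend(frames[idx])
        spliced.append(row)
    return spliced

d = 129  # log-spectrum dimension from the post
frames = [[float(t + k) for k in range(d)] for t in range(20)]  # dummy data
spliced = splice(normalize(frames), context=5)
D = len(spliced[0])  # d * 11
```

Each output row holds 11 consecutive frames laid end to end, so the centre frame sits at positions `5*d` to `6*d - 1`.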

     

    (III) Initializing each network layer

    1. conv1

    in_dim = D

    stride1 = d

    num_patch1 = 1+(stride1 - patch1_dim)/patch1_step

    out_dim = num_filters1 * num_patch1

    2. mpool1

    in_dim = out_dim (1.④, i.e. the out_dim of conv1)

    num_pool1 = num_patch1 / pool1_size

    out_dim = num_filters1 * num_pool1

    3. conv2

    in_dim = out_dim (2.③, i.e. the out_dim of mpool1)

    stride2 = num_filters1 * num_pool1

    patch2_dim = patch2_dim * num_filters1

    patch2_step = num_filters1

    num_patch2 = 1 + (stride2 - patch2_dim) / patch2_step

    out_dim = num_filters2 * num_patch2

    4. mpool2

    in_dim = out_dim (3.⑥, i.e. the out_dim of conv2)

    num_pool2 = num_patch2 / pool2_size

    out_dim = num_filters2 * num_pool2

    5. FC1

    in_dim = out_dim = hidden_dim

    6. FC2

    in_dim = hidden_dim

    out_dim = d
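The dimension bookkeeping above can be checked with a short script. This is a minimal sketch, not the author's code: the function name `cnn_layer_dims` and every hyperparameter value in the example call (patch sizes, filter counts, pool sizes, hidden_dim) are hypothetical, since the post gives only d = 129 and a context of 5 frames. It also reads the FC1 line as "in_dim is the out_dim of mpool2, out_dim is hidden_dim", which is an interpretation. Integer division is used, so the chosen values should divide evenly.

```python
def cnn_layer_dims(d, context, patch1_dim, patch1_step, num_filters1, pool1_size,
                   patch2_dim, num_filters2, pool2_size, hidden_dim):
    """Return {layer: (in_dim, out_dim)} following the formulas in section (III)."""
    dims = {}
    D = d * (2 * context + 1)  # dimension after frame expansion

    # 1. conv1: convolve along the d-dimensional frequency axis
    stride1 = d
    num_patch1 = 1 + (stride1 - patch1_dim) // patch1_step
    dims["conv1"] = (D, num_filters1 * num_patch1)

    # 2. mpool1: non-overlapping pooling over the patches
    num_pool1 = num_patch1 // pool1_size
    dims["mpool1"] = (dims["conv1"][1], num_filters1 * num_pool1)

    # 3. conv2: patch size and step are scaled by the number of conv1 filters
    stride2 = num_filters1 * num_pool1
    patch2_dim_scaled = patch2_dim * num_filters1
    patch2_step = num_filters1
    num_patch2 = 1 + (stride2 - patch2_dim_scaled) // patch2_step
    dims["conv2"] = (dims["mpool1"][1], num_filters2 * num_patch2)

    # 4. mpool2
    num_pool2 = num_patch2 // pool2_size
    dims["mpool2"] = (dims["conv2"][1], num_filters2 * num_pool2)

    # 5. FC1 and 6. FC2 (FC1 input taken from mpool2: an interpretation)
    dims["FC1"] = (dims["mpool2"][1], hidden_dim)
    dims["FC2"] = (hidden_dim, d)
    return dims

# Hypothetical hyperparameters, chosen only so that every division is exact.
dims = cnn_layer_dims(d=129, context=5,
                      patch1_dim=10, patch1_step=1, num_filters1=32, pool1_size=3,
                      patch2_dim=5, num_filters2=64, pool2_size=2, hidden_dim=1024)
for name, (in_dim, out_dim) in dims.items():
    print(f"{name}: in_dim={in_dim}, out_dim={out_dim}")
```

With these example values, conv1 maps 1419 to 3840 dimensions and the final FC2 layer maps back to d = 129, one enhanced log-spectrum frame per input frame.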

  • Original post: https://www.cnblogs.com/atmacmer/p/6875447.html