zoukankan      html  css  js  c++  java
  • 创建自己的语音识别系统

    Data preparation

    Audio data

    自己创建数据集:

    10个不同的说话人

    每个人说10句话

    每句话包含3个词

    总共300个词,(数字0~9)

    Task

    kaldi-trunk/egs/digits创建digits_audio 文件夹,然后digits_audio,再创建train and test两个文件夹。

    以说话人的ID命名文件夹,存放该说话人的数据,选出1个说话人的数据作为测试数据,其它9人作为训练数据。

    Acoustic data

    创建一些文本文件与音频数据关联。每个文件包含很多字符串,这些字符串需要被排序的。当遇到排序问题时,可以用checking (utils/validate_data_dir.sh) and fixing (utils/fix_data_dir.sh) ,保证有序,另外,将utils文件夹添加到工程目录里面。

    Task

    kaldi-trunk/egs/digits ,创建一个data文件夹,然后再创建test和train两个子文件夹再data里面。

    a.) spk2gender 

    Pattern: <speakerID> <gender>

    cristine f
    dad m
    josh m
    july f
    # and so on...

    b.) wav.scp 

    Pattern: <uterranceID> <full_path_to_audio_file>

    dad_4_4_2 /home/{user}/kaldi-trunk/egs/digits/digits_audio/train/dad/4_4_2.wav
    july_1_2_5 /home/{user}/kaldi-trunk/egs/digits/digits_audio/train/july/1_2_5.wav
    july_6_8_3 /home/{user}/kaldi-trunk/egs/digits/digits_audio/train/july/6_8_3.wav
    # and so on...

    c.) text 

    Pattern: <uterranceID> <text_transcription>

    dad_4_4_2 four four two
    july_1_2_5 one two five
    july_6_8_3 six eight three
    # and so on...

    d.) utt2spk 

    Pattern: <uterranceID> <speakerID>

    dad_4_4_2 dad
    july_1_2_5 july
    july_6_8_3 july
    # and so on...


    e.) corpus.txt

    Pattern: <text_transcription>

    one two five
    six eight three
    four four two
    # and so on...

    每个文件对应1个发音,包含3个数字,因此100个发音,对应100行。

    Language data

    Task

    kaldi-trunk/egs/digits/data/local,创建dict文件夹。

     a.) lexicon.txt 

     'phone transcriptions' (taken from /egs/voxforge).  发音词典

    Pattern: <word> <phone 1> <phone 2> ...

    !SIL sil
    <UNK> spn
    eight ey t
    five f ay v
    four f ao r
    nine n ay n
    one hh w ah n
    one w ah n
    seven s eh v ah n
    six s ih k s
    three th r iy
    two t uw
    zero z ih r ow
    zero z iy r ow

    b.) nonsilence_phones.txt 

    This file lists nonsilence phones that are present in your project.

    Pattern: <phone>

    ah
    ao
    ay
    eh
    ey
    f
    hh
    ih
    iy
    k
    n
    ow
    r
    s
    t
    th
    uw
    w
    v
    z
    

    c.) silence_phones.txt 
    This file lists silence phones.

    Pattern: <phone>

    sil
    spn
    

    d.) optional_silence.txt 
    This file lists optional silence phones.

    Pattern: <phone>

    sil

    Project finalization

    Tools attachment

    Task

    From kaldi-trunk/egs/wsj/s5 copy two folders (with the whole content) - utils and steps - and put them in your kaldi-trunk/egs/digits directory. You can also create links to these directories. You may find such links in, for example, kaldi-trunk/egs/voxforge/s5.

    拷贝wsj/s5里面两文件夹utils and steps到本工程里

    Scoring script

    This script will help you to get decoding results.

    SRILM installation

    You also need to install language modelling toolkit that is used in my example - SRI Language Modeling Toolkit (SRILM).

    安装语言模型工具包SRILM

    Task

    For detailed installation instructions go to kaldi-trunk/tools/install_srilm.sh (read all comments inside).

    安装脚本kaldi-trunk/tools/install_srilm.sh 

    Configuration files

    It is not necessary to create configuration files but it can be a good habit for future.

    Task

    In kaldi-trunk/egs/digits create a folder conf. Inside kaldi-trunk/egs/digits/conf create two files (for some configuration modifications in decoding and mfcc feature extraction processes - taken from /egs/voxforge):

    修改解码配置和mfcc特征提取配置文件

    a.) decode.config 

    first_beam=10.0
    beam=13.0
    lattice_beam=6.0
    

    b.) mfcc.conf 

    --use-energy=false


    Running scripts creation

    两种训练方法:1.单音素训练 2.简单的3音素训练。从解码结果可以看到这两种方法的区别。

    Task

    In kaldi-trunk/egs/digits directory create 3 scripts:

    a.) cmd.sh 

    本地机器跑

    b.) path.sh 

    添加路径

    c.) run.sh

     

    Getting results

    Now all you have to do is to run run.sh script. 

    go to newly made kaldi-trunk/egs/digits/exp. You may notice there folders with mono and tri1 results as well - directories structure are the same.

    Go to mono/decode directory. Here you may find result files (named in a wer_{number} way)

    Summary

    This is just an example. The point of this short tutorial is to show you how to create 'anything' in Kaldi and to get a better understanding of how to think while using this toolkit. Personally I started with looking for tutorials made by the Kaldi authors/developers. After succesful Kaldi installation I launched some example scripts (Yesno, Voxforge, LibriSpeech - they are relatively easy and have free acoustic/language data to download - I used these three as a base for my own scripts).

    Make sure you follow http://kaldi-asr.org/- official project website. There are two very useful sections for beginners inside: 
    a.) Kaldi tutorial - almost 'step by step' tutorial on how to set up an ASR system; up to some point this can be done without RM dataset. It is good to read it, 
    b.) Data preparation - very detailed explaination of how to use your own data in Kaldi.

    More useful links about Kaldi I found: 
    https://sites.google.com/site/dpovey/kaldi-lectures - Kaldi lectures created by the main author 
    http://www.superlectures.com/icassp2011/category.php?lang=en&id=131 - similar; video version 
    http://www.diplomovaprace.cz/133/thesis_oplatek.pdf - some master diploma thesis about speech recognition using Kaldi

    This is all from my side. Good luck!

  • 相关阅读:
    Scrum Works 1.84安装
    使用Sandcastle Styles 来生成VS2008的帮助文档.
    NDoc 用户指南(转)
    第一章 C#语言基础(C# Language Elements)
    SQL Server 2005 中删除重复记录
    SDE 远程连接
    C# 按钮美化技巧
    SOP 中的 Service
    C# DateTime赋值为null
    C# WebBrowser显示html字符串
  • 原文地址:https://www.cnblogs.com/welen/p/7495697.html
Copyright © 2011-2022 走看看