创建自己的语音识别系统

zoukankan html css js c++ java

创建自己的语音识别系统
Data preparation

Audio data

自己创建数据集：

10个不同的说话人

每个人说10句话

每句话包含3个词

总共300个词，(数字0～9)

Task

kaldi-trunk/egs/digits创建digits_audio 文件夹，然后digits_audio,再创建train and test两个文件夹。

以说话人的ID命名文件夹，存放该说话人的数据，选出1个说话人的数据作为测试数据，其它9人作为训练数据。

Acoustic data

创建一些文本文件与音频数据关联。每个文件包含很多字符串，这些字符串需要被排序的。当遇到排序问题时，可以用checking (utils/validate_data_dir.sh) and fixing (utils/fix_data_dir.sh) ，保证有序，另外，将utils文件夹添加到工程目录里面。

Task

kaldi-trunk/egs/digits ，创建一个data文件夹，然后再创建test和train两个子文件夹再data里面。

a.) spk2gender

Pattern: <speakerID> <gender>
```
cristine f
dad m
josh m
july f
# and so on...

b.) wav.scp 
```
Pattern: <uterranceID> <full_path_to_audio_file>
```
dad_4_4_2 /home/{user}/kaldi-trunk/egs/digits/digits_audio/train/dad/4_4_2.wav
july_1_2_5 /home/{user}/kaldi-trunk/egs/digits/digits_audio/train/july/1_2_5.wav
july_6_8_3 /home/{user}/kaldi-trunk/egs/digits/digits_audio/train/july/6_8_3.wav
# and so on...

c.) text 
```
Pattern: <uterranceID> <text_transcription>
```
dad_4_4_2 four four two
july_1_2_5 one two five
july_6_8_3 six eight three
# and so on...

d.) utt2spk 
```
Pattern: <uterranceID> <speakerID>
```
dad_4_4_2 dad
july_1_2_5 july
july_6_8_3 july
# and so on...


e.) corpus.txt
```
Pattern: <text_transcription>
```
one two five
six eight three
four four two
# and so on...

每个文件对应1个发音，包含3个数字，因此100个发音，对应100行。
```
Language data

Task

kaldi-trunk/egs/digits/data/local，创建dict文件夹。

a.) lexicon.txt

'phone transcriptions' (taken from /egs/voxforge). 发音词典

Pattern: <word> <phone 1> <phone 2> ...
```
!SIL sil
<UNK> spn
eight ey t
five f ay v
four f ao r
nine n ay n
one hh w ah n
one w ah n
seven s eh v ah n
six s ih k s
three th r iy
two t uw
zero z ih r ow
zero z iy r ow

b.) nonsilence_phones.txt 
```
This file lists nonsilence phones that are present in your project.

Pattern: <phone>
```
ah
ao
ay
eh
ey
f
hh
ih
iy
k
n
ow
r
s
t
th
uw
w
v
z
```
c.) silence_phones.txt
This file lists silence phones.

Pattern: <phone>
```
sil
spn
```
d.) optional_silence.txt
This file lists optional silence phones.

Pattern: <phone>
```
sil
```
Project finalization

Tools attachment

Task

From kaldi-trunk/egs/wsj/s5 copy two folders (with the whole content) - utils and steps - and put them in your kaldi-trunk/egs/digits directory. You can also create links to these directories. You may find such links in, for example, kaldi-trunk/egs/voxforge/s5.

拷贝wsj/s5里面两文件夹utils and steps到本工程里

Scoring script

This script will help you to get decoding results.

SRILM installation

You also need to install language modelling toolkit that is used in my example - SRI Language Modeling Toolkit (SRILM).

安装语言模型工具包SRILM

Task

For detailed installation instructions go to kaldi-trunk/tools/install_srilm.sh (read all comments inside).

安装脚本kaldi-trunk/tools/install_srilm.sh

Configuration files

It is not necessary to create configuration files but it can be a good habit for future.

Task

In kaldi-trunk/egs/digits create a folder conf. Inside kaldi-trunk/egs/digits/conf create two files (for some configuration modifications in decoding and mfcc feature extraction processes - taken from /egs/voxforge):

修改解码配置和mfcc特征提取配置文件

a.) decode.config
```
first_beam=10.0
beam=13.0
lattice_beam=6.0
```
b.) mfcc.conf
```
--use-energy=false
```
Running scripts creation

两种训练方法：1.单音素训练 2.简单的3音素训练。从解码结果可以看到这两种方法的区别。

Task

In kaldi-trunk/egs/digits directory create 3 scripts:

a.) cmd.sh

本地机器跑

b.) path.sh

添加路径

c.) run.sh

Getting results

Now all you have to do is to run run.sh script.

go to newly made kaldi-trunk/egs/digits/exp. You may notice there folders with mono and tri1 results as well - directories structure are the same.

Go to mono/decode directory. Here you may find result files (named in a wer_{number} way)

Summary

This is just an example. The point of this short tutorial is to show you how to create 'anything' in Kaldi and to get a better understanding of how to think while using this toolkit. Personally I started with looking for tutorials made by the Kaldi authors/developers. After succesful Kaldi installation I launched some example scripts (Yesno, Voxforge, LibriSpeech - they are relatively easy and have free acoustic/language data to download - I used these three as a base for my own scripts).

Make sure you follow http://kaldi-asr.org/- official project website. There are two very useful sections for beginners inside:
a.) Kaldi tutorial - almost 'step by step' tutorial on how to set up an ASR system; up to some point this can be done without RM dataset. It is good to read it,
b.) Data preparation - very detailed explaination of how to use your own data in Kaldi.

More useful links about Kaldi I found:
https://sites.google.com/site/dpovey/kaldi-lectures - Kaldi lectures created by the main author
http://www.superlectures.com/icassp2011/category.php?lang=en&id=131 - similar; video version
http://www.diplomovaprace.cz/133/thesis_oplatek.pdf - some master diploma thesis about speech recognition using Kaldi

This is all from my side. Good luck!
查看全文

相关阅读:
3.14 逆向班级在线答疑一周
 软件破解逆向安全③-FPS游戏自瞄内存逆向分析-小白入门必备免费课程
 C/C++ 外部特征码寻址-hook终结者2过CRC检测
 Windows二进制逆向安全-入门到深入学习框架综合梳理
 软件破解逆向安全②-基础游戏内存逆向分析-学习及其课程表
 数组 a+1 &a+1 的区别
 变量到底是什么玩意
 数据类型的本质是什么
 内存映射+远线程调用游戏CALL
用到的结构

原文地址：https://www.cnblogs.com/welen/p/7495697.html

创建自己的语音识别系统

Data preparation

Audio data

Task

Acoustic data

Task

Language data

Task

Project finalization

Tools attachment

Task

Scoring script

SRILM installation

Task

Configuration files

Task

Running scripts creation

Task

Getting results

Summary