  • S1 and S2 Heart Sound Recognition Using Deep Neural Networks (Literature study)

    Chen, Tien-En, et al. “S1 and S2 heart sound recognition using deep neural networks.” IEEE Transactions on Biomedical Engineering 64.2 (2017): 372-380.

    Objective:
    Focus on recognition of the first (S1) and second (S2) heart sounds.
    This paper proposes a novel acoustic-fingerprinting-based detection framework that applies only supervised classifiers for recognizing S1 and S2.

    Procedure:
    Mel-frequency cepstral coefficients (MFCC) are applied for feature extraction.
    K-means algorithm is used to divide one heart sound fragment into two groups, then form a supervector.
    The supervector is fed into a deep NN (DNN) classifier to classify S1 and S2.

    This work:
    Recognizes S1 and S2 based only on acoustic fingerprinting, without incorporating additional duration and interval information;
    Based on deep learning in acoustic modeling.

    Overall S1 and S2 Recognition Architecture


    fig1

    Two parts:
    Offline part: feature extraction
    Online part: DNN classifier

    Feature extraction

    Mel-frequency cepstral coefficients (MFCC)

    The MFCC feature extraction procedure comprises six operations (a code sketch follows the list):
    (1) Pre-emphasis: enhances the received signal to compensate for signal distortions.
    (2) Windowing: divides a given signal into a sequence of frames.
    (3) Fast Fourier transform (FFT): performs spectral analysis.
    (4) Mel-filtering: integrates the frequency components within one Mel-filter band into one energy intensity.
    (5) Nonlinear transformation: takes the logarithm of all Mel-filter band intensities.
    (6) Discrete cosine transform (DCT): converts the transformed intensities into MFCCs.
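
    As a concrete illustration, here is a minimal NumPy/SciPy sketch of the six operations. The pre-emphasis factor (0.97), frame sizes, FFT length, and filter-bank size are common defaults assumed for illustration, not values taken from the paper.

    ```python
    # Minimal MFCC extraction mirroring the six operations above.
    import numpy as np
    from scipy.fftpack import dct

    def mfcc(signal, sr, n_mels=26, n_mfcc=13, frame_len=0.025, frame_step=0.010):
        # (1) Pre-emphasis: boost high frequencies to compensate for distortion.
        signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
        # (2) Windowing: split into overlapping frames, apply a Hamming window.
        flen, fstep = int(sr * frame_len), int(sr * frame_step)
        n_frames = 1 + max(0, (len(signal) - flen) // fstep)
        idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
        frames = signal[idx] * np.hamming(flen)
        # (3) FFT: power spectrum of each frame.
        nfft = 512
        power = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft
        # (4) Mel-filtering: integrate the spectrum into per-band energies
        #     using triangular filters spaced evenly on the Mel scale.
        mel_pts = np.linspace(0, 2595 * np.log10(1 + (sr / 2) / 700), n_mels + 2)
        hz_pts = 700 * (10 ** (mel_pts / 2595) - 1)
        bins = np.floor((nfft + 1) * hz_pts / sr).astype(int)
        fbank = np.zeros((n_mels, nfft // 2 + 1))
        for m in range(1, n_mels + 1):
            l, c, r = bins[m - 1], bins[m], bins[m + 1]
            fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
            fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
        energies = np.maximum(power @ fbank.T, np.finfo(float).eps)
        # (5) Nonlinear transformation: log of all Mel-band intensities.
        log_energies = np.log(energies)
        # (6) DCT: decorrelate the log energies; keep the first n_mfcc terms.
        return dct(log_energies, type=2, axis=1, norm='ortho')[:, :n_mfcc]
    ```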

    Using differential parameters to describe temporal characteristics improves pattern recognition performance. A differential cepstral parameter (the slope of a cepstral parameter versus time), representing the dynamic change of the cepstral parameters, is therefore used.
    Appending velocity (vel) and acceleration (acc) features triples the dimensionality of the original features.

    $$\mathrm{vel}(d,t)=\frac{\sum_{m=1}^{M_v} m\,[c(d,t+m)-c(d,t-m)]}{2\sum_{m=1}^{M_v} m^2}$$

    $$\mathrm{acc}(d,t)=\frac{\sum_{m=1}^{M_a} m\,[\mathrm{vel}(d,t+m)-\mathrm{vel}(d,t-m)]}{2\sum_{m=1}^{M_a} m^2}$$

    where $c(d,t)$ is the $d$th dimension of the cepstral parameter and $t$ is the time index of the current sound frame; $M_v$ and $M_a$ are the window lengths for computing the vel and acc coefficients, respectively. In this study, $M_v = 3$ and $M_a = 2$.
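
    A small NumPy sketch of the vel/acc extension, assuming the regression-style delta formula reconstructed above (with edge frames repeated at the boundaries, a common convention not specified in the notes):

    ```python
    import numpy as np

    def deltas(feat, M):
        """feat: (T, D) cepstral features; returns (T, D) delta features."""
        T = feat.shape[0]
        pad = np.pad(feat, ((M, M), (0, 0)), mode='edge')  # repeat edge frames
        num = sum(m * (pad[M + m:M + m + T] - pad[M - m:M - m + T])
                  for m in range(1, M + 1))
        return num / (2 * sum(m * m for m in range(1, M + 1)))

    def add_vel_acc(mfcc_feat, Mv=3, Ma=2):
        vel = deltas(mfcc_feat, Mv)   # velocity: slope of the cepstra over time
        acc = deltas(vel, Ma)         # acceleration: slope of the velocity
        # 13-dim MFCCs become 39-dim: [static, velocity, acceleration].
        return np.hstack([mfcc_feat, vel, acc])
    ```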

    K-Means Algorithm

    The K-means algorithm is used to cluster the acoustic features within each heart sound segment into two groups (K = 2). A population center vector is computed for each group, and the two center vectors are concatenated to form a supervector. This supervector is the final feature representing a segment of heart sound; the supervectors are used to build the classifiers and perform S1/S2 recognition.
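
    As a sketch, supervector formation per segment could look like the following; scikit-learn's KMeans is an assumed stand-in for the paper's own K-means implementation, and the center ordering is left as KMeans returns it:

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    def segment_supervector(frames):
        """frames: (T, 39) MFCC+vel+acc frames of one heart sound segment."""
        km = KMeans(n_clusters=2, n_init=10).fit(frames)
        c1, c2 = km.cluster_centers_       # the two population centers
        return np.concatenate([c1, c2])    # 78-dim supervector
    ```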

    The main goal of the K-means algorithm is to determine representative data points from a large number of data points.
    Such data points are called "population centers."

    The idea of K-means pattern classification is to use a small number of representative points to represent specific categories, lowering the amount of data and avoiding the adverse effects caused by noise.

    The calculation steps of the K-means algorithm (a NumPy transcription follows the list):
    1) Initialization:
    Divide the training samples $v_i$, $i = 1, \dots, N$, randomly into K groups and arbitrarily choose one observation from each group as the initial population center $\mu_k$, $k = 1, 2, \dots, K$.
    2) Recursive calculation:
    i) Assign each $v_i$ to the nearest population center by
    $$k^* = \arg\min_k d(v_i, \mu_k), \quad i = 1, \dots, N$$
    where $d(\cdot,\cdot)$ denotes the distance measure, here the Euclidean distance.
    ii) All $v_i$ that belong to the $k$th group form a new group; the population center $\mu_k$ is then recomputed as the mean of that group.
    iii) If the new population centers are the same as the original ones, training is complete. Otherwise, the new centers replace the original ones and step 2) is repeated.
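
    A direct NumPy transcription of these steps (Euclidean distance, stop when the centers no longer change; empty groups are not handled in this sketch):

    ```python
    import numpy as np

    def kmeans(v, K, seed=0):
        rng = np.random.default_rng(seed)
        # 1) Initialization: arbitrarily pick K observations as initial centers.
        mu = v[rng.choice(len(v), K, replace=False)]
        while True:
            # 2-i) Assign each v_i to its nearest population center.
            d = np.linalg.norm(v[:, None, :] - mu[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            # 2-ii) Recompute each center as the mean of its new group.
            new_mu = np.array([v[labels == k].mean(axis=0) for k in range(K)])
            # 2-iii) Stop when the centers no longer change; otherwise repeat.
            if np.allclose(new_mu, mu):
                return new_mu, labels
            mu = new_mu
    ```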

    DNN classifier


    fig2

    Given the correct label $y$, the parameters of the DNN classifier can be estimated as
    $$\theta^* = \arg\min_{\theta}\,\{C(y,\hat{y};x,\theta) + \gamma R(W) + \eta\,\rho(A)\} \quad (*)$$
    where $\hat{y}$ is the DNN output, $x$ is the input data, $y$ is the label data, and $\theta$ denotes the DNN parameter set.
    $C(\cdot)$ is the cost function; cross-entropy is used as the cost function, as proposed in [34].

    $R(W)$ is the weight decay term:

    $$R(W) = \sum_{l} \|W_l\|_F^2$$

    where $\|\cdot\|_F$ denotes the Frobenius norm.

    $\rho(A)$ is the sparsity penalty on the hidden outputs, and $\gamma$ and $\eta$ are the controlling coefficients.

    The standard back-propagation algorithm is applied to estimate the parameters of the DNN model.
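
    A PyTorch sketch of objective (*). The framework choice, the sigmoid hidden units, and the L1 form of the sparsity penalty $\rho(A)$ are assumptions for illustration; the paper's exact implementation is not reproduced here.

    ```python
    import torch
    import torch.nn as nn

    class DNN(nn.Module):
        def __init__(self, in_dim=78, hidden=100, n_layers=1, n_classes=2):
            super().__init__()
            dims = [in_dim] + [hidden] * n_layers
            self.hidden = nn.ModuleList(
                nn.Linear(dims[i], dims[i + 1]) for i in range(n_layers))
            self.out = nn.Linear(hidden, n_classes)

        def forward(self, x):
            acts = []                          # hidden outputs A
            for layer in self.hidden:
                x = torch.sigmoid(layer(x))
                acts.append(x)
            return self.out(x), acts

    def objective(model, x, y, gamma=1e-4, eta=1e-4):
        logits, acts = model(x)
        cost = nn.functional.cross_entropy(logits, y)          # C(y, y_hat)
        R = sum(l.weight.pow(2).sum() for l in model.hidden)   # sum_l ||W_l||_F^2
        rho = sum(a.abs().mean() for a in acts)                # sparsity of hidden outputs
        return cost + gamma * R + eta * rho                    # minimized by back-propagation
    ```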

    To overcome the limitation of insufficient training data, a pretraining technique that uses unlabeled data is generally adopted.
    A popular pretraining process uses a deep belief network (DBN) with maximum likelihood estimation.
    A DBN model is formed by stacking a set of restricted Boltzmann machine (RBM) models.
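
    A minimal NumPy sketch of a single RBM trained with one-step contrastive divergence (CD-1); greedily stacking such layers, each trained on the hidden outputs of the previous one, yields the DBN. All hyperparameters here are illustrative.

    ```python
    import numpy as np

    def train_rbm(data, n_hidden, epochs=10, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        n_visible = data.shape[1]
        W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
        sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
        for _ in range(epochs):
            v0 = data
            # Positive phase: hidden probabilities given the data.
            h0 = sigmoid(v0 @ W + b_h)
            # Negative phase: one Gibbs step (sample h, reconstruct v, then h).
            h0_s = (h0 > rng.random(h0.shape)).astype(float)
            v1 = sigmoid(h0_s @ W.T + b_v)
            h1 = sigmoid(v1 @ W + b_h)
            # CD-1 update: data statistics minus reconstruction statistics.
            W += lr * (v0.T @ h0 - v1.T @ h1) / len(data)
            b_v += lr * (v0 - v1).mean(axis=0)
            b_h += lr * (h0 - h1).mean(axis=0)
        return W, b_v, b_h
    ```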

    RBM model [20]
    DBN model

    The training data are used to form a DBN; a softmax layer is then added on top of the DBN model, and standard back-propagation training with the cost function in (*) is applied to estimate the DNN parameters.

    Experiments

    S1 and S2 were manually segmented and labeled.
    KNN, LR, SVM, and GMM classifiers were implemented and their recognition performance tested for comparison.

    Setup

    The data were collected and divided into groups, and the S1 and S2 segments were extracted.

    Evaluation metrics

    ![fig3](https://img-blog.csdn.net/20180129201016684)

    Precision, recall, F-measure, and accuracy:
    $$\mathrm{Precision} = \frac{T_p}{T_p + F_p}$$
    $$\mathrm{Recall} = \frac{T_p}{T_p + F_n}$$
    $$F\text{-}\mathrm{measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
    $$\mathrm{Accuracy} = \frac{T_p + T_n}{T_p + T_n + F_p + F_n}$$
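
    These four metrics translate directly into code from the confusion counts:

    ```python
    import numpy as np

    def metrics(y_true, y_pred, positive=1):
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        tp = np.sum((y_pred == positive) & (y_true == positive))
        tn = np.sum((y_pred != positive) & (y_true != positive))
        fp = np.sum((y_pred == positive) & (y_true != positive))
        fn = np.sum((y_pred != positive) & (y_true == positive))
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f_measure = 2 * precision * recall / (precision + recall)
        accuracy = (tp + tn) / (tp + tn + fp + fn)
        return precision, recall, f_measure, accuracy
    ```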

    Experiment Results


    fig4

    Determining Optimal Feature Configuration and NN Structure:

    A one-layer ANN model with 100 hidden neurons was used as the classifier, and 13-dimensional MFCCs served as the acoustic features. The original MFCCs were then extended from 13 to 26 dimensions (by appending 13 velocity features) and to 39 dimensions (by appending 13 velocity and 13 acceleration features), and the effectiveness of the K-means algorithm was also examined. Fbank features from [40], [41], with dimensions of 24, 168, and 264, were tested for comparison.


    fig6

    The correlation between classification performance and NN structure (the number of hidden layers and of neurons in each hidden layer) was investigated.


    fig7

    The effectiveness of pre-training and of different activation functions was further tested.


    fig8

    Finally, the weight decay and sparsity penalty ($R(W)$ and $\rho(A)$ in (*), respectively) were examined.


    fig9

    Comparison of DNN With Other Classifiers

    For the test set, a heart sound activity detection (HSAD) procedure based on Shannon energy was applied to detect heart sound segments. The KNN classifier uses the Euclidean metric for the distance calculation. For the GMM model, eight mixtures were used. For the SVM classifier, the Gaussian radial basis function was used as the kernel function.
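
    A scikit-learn sketch of the comparison classifiers as described (scikit-learn itself is an assumption; hyperparameters beyond those named in the text are left at their defaults):

    ```python
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.mixture import GaussianMixture

    def fit_baselines(X_train, y_train):
        knn = KNeighborsClassifier(metric='euclidean').fit(X_train, y_train)
        svm = SVC(kernel='rbf').fit(X_train, y_train)           # Gaussian RBF kernel
        # One 8-mixture GMM per class; classify by the higher log-likelihood.
        gmms = {c: GaussianMixture(n_components=8).fit(X_train[y_train == c])
                for c in np.unique(y_train)}
        return knn, svm, gmms

    def gmm_predict(gmms, X):
        classes = sorted(gmms)
        scores = np.column_stack([gmms[c].score_samples(X) for c in classes])
        return np.array(classes)[scores.argmax(axis=1)]
    ```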


    fig10

    fig11
