python 2.7 - Building Speech Dataset for LSTM binary classification

- June 15, 2015

I'm trying binary LSTM classification using Theano. I have gone through the example code, but I want to build my own.

I have a small set of "hello" and "goodbye" recordings that I am using. I preprocess these by extracting MFCC features from them and saving the features in text files. I have 20 speech files (10 for each word), and I generate one text file per recording, so I have 20 text files containing the MFCC features. Each file is a 13x56 matrix.

My problem is: how do I use these text files to train an LSTM?

I am relatively new to this. I have gone through some literature on it, but I have not found a good explanation of the concept.

Any simpler way of using LSTMs would also be welcome.

There are many existing implementations, for example a TensorFlow implementation and a Kaldi-focused implementation with all the scripts. It is better to check them first.

Theano is too low-level; you might try Keras instead, as described in the tutorial. You can run the tutorial "as is" to understand how things go.

Then you need to prepare a dataset. You need to turn your data into sequences of data frames, and for every data frame in a sequence you need to assign an output label.
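As a rough sketch of that preparation step, the text files could be read into one array of frame sequences with NumPy. The file naming scheme (`hello_01.txt`, `goodbye_07.txt`), the `WORDS` list, and the `load_dataset` helper are assumptions made for illustration, as is the guess that each 13x56 file stores 13 coefficients per column of 56 frames.

```python
import glob
import os

import numpy as np

WORDS = ["hello", "goodbye"]  # assumed vocabulary; word id = position in this list


def load_dataset(directory, words=WORDS):
    """Return x with shape (n_files, n_frames, 13) and one integer word id per file."""
    sequences, word_ids = [], []
    for word_id, word in enumerate(words):
        # Hypothetical naming scheme: hello_01.txt, goodbye_07.txt, ...
        for path in sorted(glob.glob(os.path.join(directory, word + "_*.txt"))):
            mfcc = np.loadtxt(path)   # assumed layout: 13 rows (coefficients) x 56 columns (frames)
            sequences.append(mfcc.T)  # transpose so that each row is one 13-float frame
            word_ids.append(word_id)
    # np.stack requires all files to have the same number of frames
    return np.stack(sequences), np.array(word_ids)
```

Calling `x, word_ids = load_dataset("mfcc_dir")` on a directory with the 20 files would then give `x.shape == (20, 56, 13)`, where `mfcc_dir` is a placeholder path.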

Keras supports two types of RNN layers: layers returning sequences and layers returning single values. You can experiment with both; in code you just use return_sequences=True or return_sequences=False.

To train with sequences, you can assign a dummy label to all frames except the last one, where you assign the label of the word you want to recognize. You need to place the inputs and output labels into arrays. So it will be:

x = [[word1frame1, word1frame2, ..., word1framen], [word2frame1, word2frame2, ..., word2framen]]

y = [[0, 0, ..., 1], [0, 0, ..., 2]]

In x every element is a vector of 13 floats. In y every element is just a number: 0 for intermediate frames and the word ID for the final frame.
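A minimal sketch of building such per-frame labels with NumPy follows. The helper name `frame_labels` is made up for this example, and word IDs are assumed to start at 1 so that 0 stays free as the dummy label.

```python
import numpy as np


def frame_labels(word_ids, n_frames, dummy=0):
    """One label per frame: `dummy` everywhere, the word id on the last frame."""
    y = np.full((len(word_ids), n_frames), dummy, dtype=int)
    y[:, -1] = word_ids  # only the final frame carries the real label
    return y


y = frame_labels([1, 2], n_frames=5)
# y is [[0, 0, 0, 0, 1], [0, 0, 0, 0, 2]]
```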

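To train with a single label per word (the return_sequences=False case), you also need to place the inputs and output labels into arrays, but the output array is simpler. So your data will be: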

x = [[word1frame1, word1frame2, ..., word1framen], [word2frame1, word2frame2, ..., word2framen]]

y = [[0, 0, 1], [0, 1, 0]]

Note that the output is vectorized (np_utils.to_categorical) to turn it into vectors instead of just numbers.
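To make the vectorization concrete, here is a NumPy-only stand-in for what to_categorical does (recent Keras versions expose the helper as keras.utils.to_categorical). The function name `one_hot` and the choice of three classes are assumptions made to reproduce the y shown above.

```python
import numpy as np


def one_hot(word_ids, n_classes):
    """Turn integer word ids into one-hot row vectors."""
    word_ids = np.asarray(word_ids)
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(n_classes, dtype=int)[word_ids]


y = one_hot([2, 1], n_classes=3)
# y is [[0, 0, 1], [0, 1, 0]]
```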

Then you create the network architecture. You can have 13 floats as input and a vector as output. In the middle you might have one fully connected layer followed by one LSTM layer. Do not use layers that are too big; start with small ones.

Then you feed this dataset into model.fit and it trains the model. You can estimate the model quality on a held-out set after training.
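Putting the architecture and training steps together, a small sketch could look like the following. It assumes Keras is available through the `tensorflow` package, uses random numbers as a placeholder for the real MFCC data, and picks the layer size (16), the number of epochs, and the held-out indices arbitrarily; none of these choices come from the original answer.

```python
import numpy as np
from tensorflow import keras


def build_model(n_frames=56, n_mfcc=13, n_classes=2, hidden=16):
    """Small stack: per-frame Dense layer -> LSTM -> softmax over words."""
    model = keras.Sequential([
        keras.Input(shape=(n_frames, n_mfcc)),
        keras.layers.Dense(hidden, activation="relu"),  # applied to every frame
        keras.layers.LSTM(hidden),                       # return_sequences=False by default
        keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model


rng = np.random.default_rng(0)
x = rng.normal(size=(20, 56, 13)).astype("float32")  # placeholder for the real MFCC frames
word_ids = np.array([0] * 10 + [1] * 10)             # 10 "hello", 10 "goodbye"
y = np.eye(2, dtype="float32")[word_ids]             # one-hot word labels

held = np.array([0, 1, 10, 11])                      # two held-out examples per word
train = np.setdiff1d(np.arange(len(x)), held)

model = build_model()
model.fit(x[train], y[train], epochs=3, batch_size=4, verbose=0)
loss, acc = model.evaluate(x[held], y[held], verbose=0)
```

With only four held-out examples the accuracy estimate is very coarse, so it is more useful for checking that the pipeline runs than for judging the model.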

You will have a problem with convergence since you have only 20 examples. You need way more examples, preferably thousands, to train an LSTM; with this little data you will only be able to use very small models.
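To get a feeling for how quickly parameters outnumber examples, the trainable weights of a small Dense -> LSTM -> Dense stack can be counted by hand. The layer sizes below (16 units each) are assumptions, and the LSTM count uses the standard formula with four gates, each having input weights, recurrent weights, and a bias.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def lstm_params(n_in, n_hidden):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (n_in * n_hidden + n_hidden * n_hidden + n_hidden)


total = dense_params(13, 16) + lstm_params(16, 16) + dense_params(16, 2)
# total is 224 + 2112 + 34 = 2370 parameters, against only 20 labelled recordings
```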
