Speech recognition is very complicated. Serious programs use Dynamic Time Warping or Hidden Markov Models to recognize isolated words or continuous speech (speaker independent). It should be easier for a single speaker and one-word commands, but it is still a very difficult task. It includes creating the database of words (their parameters based on LPCC analysis), training your algorithm,...
As pycoucou says, try googling the topic for more.
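To give a rough idea of what the Dynamic Time Warping part looks like for the single-speaker, one-word case, here is a minimal sketch. It assumes each word has already been turned into a sequence of feature vectors (for example LPCC coefficients per frame); the feature extraction itself is not shown. The names `dtw_distance`, `recognize` and `templates` are made up for this illustration, and `math.dist` needs Python 3.8 or newer.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i frames of a
    # with the first j frames of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames
            d = math.dist(a[i - 1], b[j - 1])
            # extend the cheapest of the three allowed predecessor paths
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b frame repeated
                                 cost[i][j - 1],      # b advances, a frame repeated
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))
```

You would call it with something like `recognize(features, {"yes": yes_frames, "no": no_frames})`, where each value is one recorded example of that word. A real recognizer would also need silence trimming, several templates per word and a rejection threshold, so treat this only as a starting point.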