What do you hear? Probably many different sounds at once: a radio, a phone ringing, other people talking. How do you tell whether someone is talking to you or to someone else? This is one of the many questions involved in planning a speech recognition system.
With many possible sound sources, it is difficult to say which ones matter and which do not. For this reason, most programs apply a filter that picks out the most important (in practice, the most intense) sound waves in the sample. These sound waves are still in a form the computer cannot work with directly, so the signal must first be converted from analog to digital form. The digitized signal is then decoded with an HMM (hidden Markov model), a statistical model that matches the sequence of sounds against the words it knows. Neural networks are used alongside the HMM to speed up the processing the program has to do. In fact, this setup is one of the fastest ways of decoding speech into text at present. It makes the computer "smart" when deciphering words: words that are used more often are recognized quickly, while others go through the entire process. This is only one possible way of setting up a system. Some systems use only the HMM and a couple of other routines.
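To make the HMM step more concrete, here is a minimal sketch of Viterbi decoding, the standard way of finding the most likely sequence of hidden states for a series of observations. Everything in it is an assumption chosen for illustration: the two states ("sil" for silence, "speech"), the two observation labels ("low" and "high" energy), and all of the probabilities. A real recognizer would use phoneme or word states and far richer acoustic features.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log-probability, state path) of the most likely hidden path."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }]
    # back[t][s]: the state at time t-1 on that best path
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Pick the best final state, then follow the back-pointers to the start.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


# Hypothetical toy model: all numbers below are made up for illustration.
STATES = ("sil", "speech")
START_P = {"sil": 0.8, "speech": 0.2}
TRANS_P = {
    "sil": {"sil": 0.7, "speech": 0.3},
    "speech": {"sil": 0.2, "speech": 0.8},
}
EMIT_P = {
    "sil": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.2, "high": 0.8},
}

log_prob, path = viterbi(
    ["low", "low", "high", "high", "low"], STATES, START_P, TRANS_P, EMIT_P
)
print(path)  # ['sil', 'sil', 'speech', 'speech', 'sil']
```

The sketch works in log-probabilities so that long observation sequences do not underflow to zero. It also assumes every probability in the tables is nonzero; a model with zero entries would need a guard around `math.log`.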
To increase the speed at which the system recognizes words, and so move
closer to continuous speech recognition (CSR),
developers have introduced many different types of search into the algorithm
that governs how the code runs. Perhaps the most promising idea is the use of
lattices, which appear to shorten the time needed for
a complete search and for the calculation that determines which words were spoken.
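As a rough illustration of why a lattice helps, the sketch below stores competing word hypotheses as edges in a small graph and finds the cheapest path through it in a single pass. The lattice, the words, and the costs (think of them as negative log-probabilities) are hypothetical, and the function assumes node numbers increase with time so that visiting nodes in sorted order is a valid processing order. It is not taken from any particular recognizer.

```python
from collections import defaultdict


def best_lattice_path(edges, start, end):
    """Return (total_cost, words) for the cheapest path from start to end.

    edges: list of (from_node, to_node, word, cost) with from_node < to_node.
    """
    outgoing = defaultdict(list)
    nodes = set()
    for src, dst, word, cost in edges:
        outgoing[src].append((dst, word, cost))
        nodes.update((src, dst))

    # best[node] = (cheapest cost to reach node, words along that path)
    best = {start: (0.0, [])}
    for node in sorted(nodes):
        if node not in best:
            continue  # not reachable from the start node
        cost_so_far, words = best[node]
        for dst, word, cost in outgoing[node]:
            candidate = cost_so_far + cost
            if dst not in best or candidate < best[dst][0]:
                best[dst] = (candidate, words + [word])
    return best[end]


# Hypothetical lattice: two competing readings of the same stretch of audio.
LATTICE = [
    (0, 3, "recognize", 2.0),
    (0, 1, "wreck", 1.2),
    (1, 2, "a", 0.6),
    (2, 3, "nice", 0.9),
    (3, 4, "speech", 1.0),
    (3, 4, "beach", 1.5),
]

cost, words = best_lattice_path(LATTICE, start=0, end=4)
print(words)  # ['recognize', 'speech']
```

Because hypotheses that pass through the same node share one entry in `best`, each edge is examined only once, rather than once for every full sentence that contains it. That sharing is the kind of saving a lattice is meant to provide.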