Example-based large vocabulary recognition

Title: Example-based large vocabulary recognition
Publication Type: Presentation
Year of Publication: 2006
Conference Name: Summer Meeting on Corpus-based Research
Authors: Van Compernolle, Dirk
Publisher: Nederlandse Vereniging voor Fonetische Wetenschappen
Conference Location: Nijmegen, The Netherlands

Hidden Markov Models (HMMs) have dominated speech recognition for over two decades. HMMs are an embodiment of a beads-on-a-string model in which a sentence is a sequence of words, a word a sequence of phonemes and a phoneme a sequence of states. An HMM state (in the acoustic model) models a sub-phonetic speech fragment as a short-time stationary event. HMMs have great advantages: the concept is straightforward and the parameters in the model are trained from data available in large databases. Moreover, HMMs have proven to be extremely scalable: larger databases allow for more detailed models with more parameters, while more powerful CPUs make it possible to use these more detailed models in real-time systems. The success of HMMs has been the single most important driving force in the use of large databases and statistical techniques in the field of speech and language.
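The beads-on-a-string hierarchy can be illustrated with a minimal sketch. The lexicon, the phoneme symbols and the choice of three states per phoneme are assumptions made for illustration only; none of them are specified in this abstract.

```python
# Toy lexicon: hypothetical pronunciations, for illustration only.
LEXICON = {
    "speech": ["s", "p", "iy", "ch"],
    "is": ["ih", "z"],
}

# Assumed number of HMM states per phoneme (not stated in the abstract).
STATES_PER_PHONEME = 3


def expand_sentence(words, lexicon=LEXICON, states_per_phoneme=STATES_PER_PHONEME):
    """Expand a word sequence into a flat left-to-right chain of state names.

    sentence -> words -> phonemes -> states, i.e. the beads on the string.
    """
    states = []
    for word in words:
        for phoneme in lexicon[word]:
            for k in range(states_per_phoneme):
                states.append(f"{word}/{phoneme}/{k}")
    return states


chain = expand_sentence(["speech", "is"])
print(len(chain), chain[:3])
```

In a real recognizer each state in such a chain would carry its own emission distribution; here the states are just labels, to show how the sentence unrolls into a single sequence.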

Nevertheless, HMMs are far from ideal in their speech modeling concept. In particular, the short-time stationarity assumption contradicts the nature of speech, which often looks more like a concatenation of transients than a concatenation of stationary segments. In order to overcome these fundamental weaknesses, a new line of speech recognition systems is currently being developed that avoids the modeling step altogether and does recognition straight from the data by the application of template matching. This avoids the step of imperfect modeling and at the same time is in line with recent psycholinguistic findings that claim that many individual traces of speech fragments are permanently stored in memory.

Template-based systems require that the full database be accessible at recognition time, which, thanks to further increases in hardware performance, is almost within reach. However, template-based recognition has fundamental weaknesses as well: it relies on only one or a few examples to compute a distance score.
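A minimal sketch of template matching may make this weakness concrete. It assumes dynamic time warping (DTW) with a Euclidean frame distance as the matching method; the abstract does not name a specific technique, and the templates, labels and one-dimensional feature frames below are hypothetical toy data.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature frames (lists of floats)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query, template):
    """Accumulated DTW cost between two frame sequences.

    Uses the basic step pattern (insertion, deletion, match) without
    slope constraints or length normalization.
    """
    n, m = len(query), len(template)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(query, templates):
    """Return the label and score of the single best-matching template."""
    best_label, best_score = None, float("inf")
    for label, template in templates:
        score = dtw_distance(query, template)
        if score < best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Hypothetical stored examples: (label, sequence of 1-D feature frames).
TEMPLATES = [
    ("yes", [[0.0], [1.0], [2.0], [1.0]]),
    ("no", [[5.0], [4.0], [5.0]]),
]

# A time-stretched version of the "yes" template.
query = [[0.0], [1.0], [1.0], [2.0], [1.0]]
label, score = recognize(query, TEMPLATES)
print(label, score)
```

Note that `recognize` bases its decision on the score of one stored example, so a single atypical template can decide the outcome; this is the dependence on one or a few examples described above.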

In this presentation we will compare the pros and cons of HMM and template-based recognition. Neither could exist without the availability of large corpora of speech. However, the way in which these corpora are used in an actual recognition system is drastically different for the two methods.