Research on automatic speech recognition aims to give machines a human-like ability to communicate in natural spoken language, and it is of great interest from both an application and a research point of view. This chapter discusses the fundamentals of speech production and speech knowledge, the main techniques used in speech recognition systems, several successful speech recognition systems, and recent advances in speech recognition research, such as the application of artificial neural network models and a special case of hidden Markov models. The problem of speech recognition is approached in two ways: with models based on speech production and with models based on speech perception. The chapter illustrates how combining an ear model with multi-layer networks enables effective generalization across speakers in coding vowels. It also suggests that speech knowledge organized as morphological properties is robust enough to handle inter- and intra-speaker variations. By learning how to allocate degrees of evidence to articulatory features, it is possible to estimate normalized values for the place and manner of articulation; these estimates appear highly consistent with qualitative expectations based on speech knowledge. Effective learning and good generalization can be obtained from a limited number of speakers, in analogy with what humans do. Speech coders that produce degrees of evidence for phonetic features can be used for fast lexical access, to recognize phonemes in new languages with limited training, and to constrain the search for the interpretation of a sentence.
Perceptual Models for Automatic Speech Recognition Systems
Contribution to a volume
Academic Press, San Diego, USA
Advances in Computers, edited by M. C. Yovits, pp. 99–173. San Diego: Academic Press, 1990