Speech Recognition using Neural Networks
tion throughout our lives. It comes so naturally to us that we don't realize how complex a
phenomenon speech is. The human vocal tract and articulators are biological organs with
nonlinear properties, whose operation is not just under conscious control but also affected
by factors ranging from gender to upbringing to emotional state. As a result, vocalizations
can vary widely in terms of their accent, pronunciation, articulation, roughness, nasality,
pitch, volume, and speed; moreover, during transmission, our irregular speech patterns can
be further distorted by background noise and echoes, as well as electrical characteristics (if
telephones or other electronic equipment are used). All these sources of variability make
speech recognition, even more than speech generation, a very complex problem.

and pointing devices. A speech interface would support many valuable applications -- for
example, telephone directory assistance, spoken database querying for novice users,
"hands-busy" applications in medicine or fieldwork, office dictation devices, or even automatic
voice translation into foreign languages. Such tantalizing applications have motivated
research in automatic speech recognition since the 1950s. Great progress has been made so
far, especially since the 1970s, using a series of engineered approaches that include
template matching, knowledge engineering, and statistical modeling. Yet computers are still
nowhere near the level of human performance at speech recognition, and it appears that
further significant advances will require some new insights.

cally different computational paradigm. Conventional computers use a very fast and
complex central processor with explicit program instructions and locally addressable
memory. The human brain, by contrast, uses a massively parallel collection of slow and
simple processing elements (neurons), densely connected by weights (synapses) whose
strengths are modified with experience. This organization directly supports the integration
of multiple constraints and provides a distributed form of associative memory.
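
The idea of simple processing elements whose connection weights change with experience can be made concrete with a toy example. The following is a minimal sketch, not taken from the source: it assumes a linear associative memory trained with a Hebbian outer-product rule on bipolar (+1/-1) patterns. The function names `train` and `recall` and the example patterns are hypothetical and chosen only for illustration.

```python
# Illustrative sketch (assumption): a tiny linear associative memory.
# Each output unit is a simple processing element that sums its weighted inputs;
# the weights are strengthened by a Hebbian (outer-product) rule.

def train(pairs, n_in, n_out):
    """Build a weight matrix by summing outer products of (key, value) pairs."""
    weights = [[0.0] * n_in for _ in range(n_out)]
    for key, value in pairs:
        for i in range(n_out):
            for j in range(n_in):
                # Strengthen the connection when input j and output i agree in sign.
                weights[i][j] += value[i] * key[j]
    return weights

def recall(weights, key):
    """Each unit computes a weighted sum of the key and thresholds it to +1 or -1."""
    return [1 if sum(w * x for w, x in zip(row, key)) >= 0 else -1
            for row in weights]

# Two orthogonal bipolar keys, each associated with a different output pattern.
pairs = [
    ([1, 1, 1, 1, 1, 1, 1, 1], [1, -1]),
    ([1, -1, 1, -1, 1, -1, 1, -1], [-1, 1]),
]
W = train(pairs, n_in=8, n_out=2)

# A corrupted copy of the first key (second element flipped).
noisy = [1, -1, 1, 1, 1, 1, 1, 1]

print(recall(W, pairs[0][0]))  # [1, -1]
print(recall(W, pairs[1][0]))  # [-1, 1]
print(recall(W, noisy))        # [1, -1]
```

In this sketch no single weight holds a stored pattern. Each association is spread across the whole weight matrix, which is why recall of the first pattern still succeeds when one element of its key is flipped.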