Speech Recognition using Neural Networks
completely independent -- when a person speaks quickly, his acoustical patterns become
distorted as well -- but it's a useful simplification to treat them independently.
duration of a known template. Linear warping proved inadequate, however, because utter-
ances can accelerate or decelerate at any time; instead, nonlinear warping was obviously
required. Soon an efficient algorithm known as Dynamic Time Warping was proposed as a
solution to this problem. This algorithm (in some form) is now used in virtually every
speech recognition system, and the problem of temporal variability is considered to be
largely solved
model acoustic variability. Past approaches to speech recognition have fallen into three
main categories:
of prerecorded words (templates), in order to find the best match. This has the
advantage of using perfectly accurate word models; but it also has the disadvan-
tage that the prerecorded templates are fixed, so variations in speech can only be
modeled by using many templates per word, which eventually becomes impracti-
cal.
speech is hand-coded into a system. This has the advantage of explicitly modeling
variations in speech; but unfortunately such expert knowledge is difficult to obtain
and use successfully, so this approach was judged to be impractical, and automatic
learning procedures were sought instead.
cally (e.g., by Hidden Markov Models, or HMMs), using automatic learning proce-
dures. This approach represents the current state of the art. The main disadvantage
of statistical models is that they must make a priori modeling assumptions, which
are liable to be inaccurate, handicapping the system's performance. We will see
that neural networks help to avoid this problem.
electrical engineering, mathematics, physics, psychology, and linguistics as well. Some
researchers are still studying the neurophysiology of the human brain, but much attention is