Events

Variability in speech, particularly as a consequence of production rate, remains a major challenge in developing automatic speech recognition (ASR) systems that perform well with minimal constraints. Articulatory Phonology (AP) provides a unified framework for understanding the acoustic consequences of changes in speech production caused by gestural overlap and gestural reduction, changes that are often reported as assimilations, insertions, deletions, and substitutions. In this talk, I will discuss the progress we have made in developing a speech inversion system based on a computational model of AP, and the ability of that system to extract vocal tract constriction variables, and hence gestures, from speech produced at different speaking rates. We have conducted several studies showing that augmenting acoustic features with such articulatory information improves the noise robustness of ASR systems. An additional goal is to provide a framework that seamlessly models speech variability due to coarticulation and lenition. Our current focus is to determine whether our speech inversion system can “uncover” gestures that are not apparent in the physical signal. If so, such information, properly modeled, should allow for better-performing ASR systems and the relaxation of restrictions currently needed for reasonable performance.
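The feature-augmentation idea mentioned above can be illustrated with a minimal sketch. Everything here is hypothetical: the abstract does not specify the system's interfaces, so `estimate_tract_variables` is a stand-in for a trained speech inversion model, and the feature dimensions are arbitrary placeholders. The sketch only shows the general shape of the approach: estimated tract-variable trajectories are concatenated frame-by-frame with acoustic features before being fed to an ASR acoustic model.

```python
import numpy as np

def estimate_tract_variables(acoustic_frames: np.ndarray, n_tvs: int = 8) -> np.ndarray:
    """Hypothetical speech inversion step: one tract-variable (TV) vector per
    acoustic frame. A real system would use a trained inversion model; this
    placeholder applies a fixed random projection just to produce output of
    the right shape."""
    rng = np.random.default_rng(0)
    projection = rng.standard_normal((acoustic_frames.shape[1], n_tvs))
    return acoustic_frames @ projection

def augment_features(mfcc: np.ndarray) -> np.ndarray:
    """Concatenate acoustic features with estimated articulatory features,
    yielding (frames, n_mfcc + n_tvs) augmented feature vectors."""
    tvs = estimate_tract_variables(mfcc)
    return np.hstack([mfcc, tvs])

# Example: 200 frames of 13-dimensional MFCCs -> 21-dimensional augmented frames
mfcc = np.random.randn(200, 13)
features = augment_features(mfcc)
print(features.shape)  # (200, 21)
```

In practice the augmented frames would replace the purely acoustic input to the recognizer's acoustic model; the claim in the abstract is that this added articulatory information is what improves robustness in noise.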