How to use context for phonetic learning and perception from naturalistic speech

Kasia Hitczenko

Infants learn about the sounds of their language and adults process the sounds they hear, even though sound categories often overlap in their acoustics. This dissertation is about how contextual information (e.g., who spoke the sound and what the neighboring sounds were) can help in phonetic learning and speech perception. The role of contextual information in these tasks is well studied, but almost exclusively using simplified, controlled lab speech data. In this dissertation, we study naturalistic speech of the type that listeners primarily hear.

The dissertation centers on two main theories about how context could be used: top-down information accounts, which argue that listeners use context to predict which sound will be produced, and normalization accounts, which argue that listeners compensate for the fact that the same sound is produced differently in different contexts by factoring out this systematic context-dependent variability from the acoustics. These ideas have been somewhat conflated in past research, and have rarely been tested on naturalistic speech. We start by implementing top-down and normalization accounts separately and evaluating their relative efficacy on spontaneous speech, using the test case of Japanese vowel length. We find that top-down information strategies are effective even on spontaneous speech. Surprisingly, we find that normalization is ineffective on spontaneous speech, in contrast to what has been found on lab speech. We then provide analyses showing that when there are systematic regularities in which contexts different sounds occur (regularities that are common in naturalistic speech, but generally controlled for in lab speech), normalization can actually increase category overlap rather than decrease it. Finally, we present a new proposal for how infants might learn which dimensions of their language are contrastive, one that takes advantage of these same systematic regularities. We propose that infants track the acoustic distribution of speech sounds along a dimension across contexts, and learn that the dimension is contrastive when the shape of that distribution changes substantially from one context to another. We show that this learning account makes critical predictions that hold true in naturalistic speech, and is one of the first accounts that can qualitatively explain why infants learn what they do.
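The claim that normalization can increase category overlap can be made concrete with a toy calculation. The sketch below is illustrative only: the durations, the context labels `c1` and `c2`, and the helper functions are all hypothetical, and "normalization" is assumed here to mean subtracting each context's mean duration, which is just one simple way to factor out context. It is not the dissertation's data or analysis. In this toy set, context has no effect on duration at all; it only predicts which category occurs (short vowels mostly in `c1`, long vowels mostly in `c2`).

```python
from statistics import mean

# Hypothetical toy tokens: (context, category, duration_ms). All values are made up.
# Context does not change duration here; it only predicts which category occurs.
tokens = (
    [("c1", "short", d) for d in (40, 50, 60)] * 3    # 9 short vowels in c1
    + [("c1", "long", 100)]                           # 1 long vowel in c1
    + [("c2", "short", 50)]                           # 1 short vowel in c2
    + [("c2", "long", d) for d in (90, 100, 110)] * 3 # 9 long vowels in c2
)

def category_gap(data):
    """Mean long duration minus mean short duration."""
    short = [d for _, cat, d in data if cat == "short"]
    long_ = [d for _, cat, d in data if cat == "long"]
    return mean(long_) - mean(short)

def overlap_count(data):
    """Number of short tokens at least as long as the shortest long token."""
    min_long = min(d for _, cat, d in data if cat == "long")
    return sum(1 for _, cat, d in data if cat == "short" and d >= min_long)

def normalize_by_context(data):
    """Subtract each context's mean duration from every token in that context."""
    contexts = {c for c, _, _ in data}
    ctx_mean = {c: mean(d for cc, _, d in data if cc == c) for c in contexts}
    return [(c, cat, d - ctx_mean[c]) for c, cat, d in data]

normalized = normalize_by_context(tokens)

# Raw durations: category means are 50 ms apart and no short token reaches a long one.
print(category_gap(tokens), overlap_count(tokens))
# After normalization: the means are only 18 ms apart and 6 short tokens overlap.
print(category_gap(normalized), overlap_count(normalized))
```

Because each context's mean mostly reflects which category dominates in it, subtracting that mean removes part of the category difference itself, so the two categories end up closer together than they were in the raw durations.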

The results in this dissertation teach us about how listeners might use context to overcome variability in their input. More generally, they reveal that results from lab speech do not necessarily generalize to spontaneous speech, and that using realistic data matters. Turning to spontaneous speech not only gives us a more realistic view of language learning and processing, but can also help us decide between theories that all have support from lab speech, and can therefore complement work on lab data.