This talk presents a new model of the complex form-function relationship found in natural language, with applications to language modeling, syntactic dependency parsing, and translation. Linguistic systems avail themselves of two representational extremes: arbitrariness and compositionality. The former, first articulated by de Saussure (1916), concerns the unmotivated nature of the relationship between the forms of words and their functions or meanings. The latter, articulated by Frege (1892), holds that meaning is determined by a formally transparent composition of the meanings of primitive elements (words). A challenge for modeling linguistic systems is that the separation of arbitrariness and compositionality is not as neat as this caricature would have it: novel words are created and their meanings are interpreted in predictable ways, and idiomatic expressions show that arbitrariness can extend up to the multi-word level that is traditionally the province of compositional operations. Thus, realistic models and learning algorithms for language need to account for both the compositional and the arbitrary aspects of meaning.
Recent work has demonstrated that neural models (e.g., those based on convnets or RNNs) are effective at composing representations of words into phrasal representations. It remains an open question, however, to what extent such models are effective at learning noncompositionality. To explore this question, we construct word embeddings that capture syntactic and semantic aspects of words by "reading" the sequence of characters that make up a word with bidirectional LSTMs. Results on a variety of tasks (language modeling, dependency parsing, and machine translation) show that, in addition to learning the regular, "compositional" parts of the form-function relationship (e.g., that the suffix -ly is indicative of an adverb or that the prefix un- changes the polarity of what follows), the proposed model can learn the much more arbitrary form-function relationship that exists in morphologically opaque words. This suggests that RNNs are an appropriate function class for linguistic learning, capable of capturing both the arbitrary and the regular.
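As a rough illustration of what "reading" the characters of a word with bidirectional LSTMs might look like, the following is a minimal, untrained sketch in plain Python. The class names (LSTMCell, CharBiLSTMEmbedder), the dimensions, the random initialization, and the choice to concatenate the final forward and backward hidden states are all assumptions made for illustration; the abstract does not specify how the two directions are combined or how the model is trained.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """One LSTM direction with randomly initialised (untrained) weights."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight row per gate unit, in the order:
        # input gate, forget gate, output gate, candidate cell value.
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                  for _ in range(4 * hidden_dim)]
        self.b = [0.0] * (4 * hidden_dim)

    def step(self, x, h, c):
        z = x + h  # list concatenation [x; h]
        pre = [sum(w * v for w, v in zip(row, z)) + b
               for row, b in zip(self.W, self.b)]
        n = self.hidden_dim
        i = [sigmoid(p) for p in pre[:n]]
        f = [sigmoid(p) for p in pre[n:2 * n]]
        o = [sigmoid(p) for p in pre[2 * n:3 * n]]
        g = [math.tanh(p) for p in pre[3 * n:]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, inputs):
        """Consume a sequence of vectors and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in inputs:
            h, c = self.step(x, h, c)
        return h


class CharBiLSTMEmbedder:
    """Hypothetical word embedder: characters in, one word vector out."""

    def __init__(self, alphabet, char_dim=8, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        self.char_vecs = {ch: [rng.uniform(-0.1, 0.1) for _ in range(char_dim)]
                          for ch in alphabet}
        self.unk = [0.0] * char_dim  # fallback for characters not in the alphabet
        self.fwd = LSTMCell(char_dim, hidden_dim, rng)
        self.bwd = LSTMCell(char_dim, hidden_dim, rng)

    def embed(self, word):
        chars = [self.char_vecs.get(ch, self.unk) for ch in word]
        # Assumption: combine directions by concatenating the final states.
        return self.fwd.run(chars) + self.bwd.run(chars[::-1])


embedder = CharBiLSTMEmbedder("abcdefghijklmnopqrstuvwxyz")
vector = embedder.embed("unhappily")
print(len(vector))
```

Because the vector is computed from characters rather than looked up in a word table, a sketch like this produces an embedding for any string over the alphabet, including words never seen before. Whether such a vector encodes anything useful depends entirely on training the weights on a downstream task, which is omitted here.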
Joint work with Miguel Ballesteros, Wang Ling, and Noah A. Smith.