The best systems for machine translation of natural language are based on statistical models learned from data. The conventional representation of a statistical translation model requires substantial offline computation and must be held in main memory. Therefore, the principal bottlenecks to the amount of data we can exploit and the complexity of models we can use are available memory and CPU time, and the current state of the art already pushes these limits. With data size and model complexity continually increasing, a scalable solution to this problem is central to future improvement. Callison-Burch et al. (2005) and Zhang and Vogel (2005) proposed a solution that we call "translation by pattern matching", which we bring to fruition in this dissertation. The training data itself serves as a proxy for the model; rules and parameters are computed on demand. This approach achieves our desiderata of minimal offline computation and compact representation, but it depends on fast pattern matching algorithms on text. Its original proponents demonstrated its application to a common model based on the translation of contiguous substrings, but left some open problems. Among these is a question: can this approach match the performance of conventional methods despite the unavoidable differences that it induces in the model? We show how to answer this question affirmatively. The main open problem we address is much harder. Many translation models are based on the translation of discontiguous substrings. The best existing pattern matching algorithm for these models is much too slow, taking several minutes per sentence. We develop new algorithms that reduce empirical computation time by two orders of magnitude for these models, making translation by pattern matching widely applicable. We use these algorithms to build a model that is two orders of magnitude larger than the current state of the art and that substantially outperforms a strong competitor in Chinese-English translation.
We show that a conventional representation of this model would be impractical. Our experiments shed light on some interesting properties of the underlying model. The dissertation also includes the most comprehensive contemporary survey of statistical machine translation.
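
To make the idea of computing rules on demand concrete, the following is a minimal sketch of looking up a contiguous source phrase directly in the training text. It assumes a suffix array as the index, which is one common choice for this kind of lookup; the abstract does not name a data structure, so that choice, the toy corpus, and the function names `build_suffix_array` and `find_occurrences` are all illustrative assumptions rather than the dissertation's implementation. The sketch covers only contiguous phrases and leaves out word alignment, parameter estimation, and the discontiguous case that the dissertation addresses.

```python
from typing import List


def build_suffix_array(tokens: List[str]) -> List[int]:
    """Return suffix start positions sorted by the token sequence they begin.

    Naive construction for illustration only; a real index over a large
    corpus would need a far more efficient algorithm.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_occurrences(tokens: List[str], sa: List[int], phrase: List[str]) -> List[int]:
    """Return every corpus position where `phrase` occurs, via binary search."""
    phrase = list(phrase)
    m = len(phrase)

    # Lower bound: first suffix whose length-m prefix is >= phrase.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + m] < phrase:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > phrase.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + m] <= phrase:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[start:lo] begin with the phrase.
    return sorted(sa[start:lo])


# Hypothetical toy corpus standing in for the source side of the training data.
corpus = "the cat sat on the mat and the cat slept".split()
index = build_suffix_array(corpus)
positions = find_occurrences(corpus, index, ["the", "cat"])
```

In a full system, each position found this way would be followed through the word alignment to the target side of the training data to extract a translation rule and estimate its parameters at query time. Here the lookup simply returns the matching positions.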