Lab: Phrase-based Statistical Machine Translation

Introduction

In this laboratory exercise, we will make use of Moses, an open-source phrase-based statistical machine translation decoder, which was developed at the 2006 JHU Summer Workshop. You will build several complete phrase-based statistical machine translation systems from very small amounts of training data, evaluate their performance, and identify ways that translation quality can be improved. Your resulting systems will be evaluated on test data (released 1 hour after the start of the exercise). You should work in groups of 2-3.

Questions

It is not necessary to submit answers to the questions asked throughout this instruction page; however, you should take time to consider them carefully.

Preliminaries

Setting up your environment

The Moses decoder will not run on the Solaris desktops. Therefore, you will need to begin by logging into a node on the CLSP Linux cluster (x01-x63). You will then need to set up your environment to run the decoder and training scripts by issuing the following command (this will work under the bash shell, but if you are using a different shell you may have to execute a different command):
     source /export/ws06osmt/data/mtlab07/config-env.mt
You can verify that this has worked by typing which moses and checking its output:
[x32:~]$ which moses
/export/ws06osmt/data/mtlab07/scripts/moses

Setting up a workplace

Create a directory where you will work on this lab and copy the training and evaluation data to it:
     mkdir MT-LAB
     cd MT-LAB
     cp -r /export/ws06osmt/data/mtlab07/data/* .

Building an English-English machine translation system

We will start by building a system to translate from Early Modern English to Modern English by using two different English translations of the Bible as a resource, the King James Version (1611) and a more recent version (circa 1890).
  1. Train the system:
    lab-train train.kvj-modern
    This step will take approximately 5 minutes. The training process involves using the IBM models 1, 3, and 4 to generate word alignments, from which phrases are then extracted. The phrases are gathered and statistics about the phrase translation probabilities are computed.
  2. Translate the test set:
    lab-translate train.kvj-modern
    This will translate the unseen (held-out) test data in train.kvj-modern/test.foreign using the model constructed in step 1. The resulting file will be train.kvj-modern/test.foreign.trans. Does the translation sound more modern?
  3. Evaluate the quality of the translation:
    lab-eval train.kvj-modern
    Results are reported in BLEU, which is a metric that ranges from 0 to 1 (commonly written as a percentage) and has been found to correlate strongly with human judgements of translation quality. BLEU is only an imperfect approximation of translation quality, so manual human evaluation should be attempted to whatever extent possible!
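
To make the metric less of a black box, here is a minimal Python sketch of a simplified corpus-level BLEU: clipped n-gram precisions up to order 4, combined by a geometric mean and multiplied by a brevity penalty. It assumes one reference per sentence, whitespace-tokenized input, and no smoothing. The function names are made up for illustration, and this is not necessarily what lab-eval runs internally.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU with one reference per sentence.

    hypotheses, references: parallel lists of tokenized sentences (lists of str).
    Returns a score between 0 and 1.
    """
    matched = [0] * max_n  # clipped n-gram matches, per n-gram order
    total = [0] * max_n    # n-grams produced by the system, per n-gram order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each n-gram's count by how often it occurs in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())

    if hyp_len == 0 or min(matched) == 0:
        return 0.0  # this sketch applies no smoothing

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Penalize output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

Calling corpus_bleu on two lists of sentences, each split on whitespace, gives a rough feel for how n-gram overlap and output length drive the score. Because the sketch has no smoothing, a test set with no matching 4-grams at all scores 0.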

Building a Spanish-English translation system

The steps above can be repeated with the parameter train.es-en. How does the BLEU score you obtain for English-English translation compare to the BLEU score you obtain for Spanish-English translation? What are the subjective differences? Can you see a pattern in the types of errors the translation system is making?

Comparison with an out-of-domain Spanish-English translation system

Statistical machine translation systems that perform well in one domain often perform quite poorly in another. The directory model.project-syndicate.es-en contains a model built from an order of magnitude more training data; however, the training data consists mostly of news commentary (from Project Syndicate). You can evaluate how well this system performs on Bible translation with the following:
  1. Translate the test set:
    lab-translate model.project-syndicate.es-en
  2. Evaluate the quality of the translation:
    lab-eval model.project-syndicate.es-en
    How does this compare to the in-domain system from the previous section? Is any aspect of the translations of the out-of-domain system better?

Improving the system

Now you should work to try to improve your Spanish-English translation system. This will be the system used for the competitive evaluation. Refer to the description of the lab tools and data formats for an explanation of how to make changes to the training and/or test data. Here is a list of some ideas to get you started, but be as creative as you want:

Competitive evaluation

After one hour, you will be able to download the evaluation data here. Use your best system (as evaluated on the development data only) to translate it. Have one member of your group email me (see blackboard) with the path to your output file.

Rules

Have fun!


Lab tools and data formats

This section describes the operation of the tools used in the lab and the data layout required. These tools are thin wrappers around the Moses tools, so it may be instructive to look at the sources (use which lab-tool to find the path to a script).

Data format and layout

Each tool used in the lab takes a parameter that specifies a set of training data, the resulting model, and a set of development (dev-test) data. You are provided with four example systems (train.kvj-modern, train.es-en, train.tok.es-en, and model.project-syndicate.es-en), and you can use these as the starting point to create new systems. Each directory must contain at least training data and evaluation data:

Once a system has been trained, its directory will contain a model subdirectory holding the moses.ini configuration file, which references all the models generated in the training step.

lab-train

The lab-train tool takes a single parameter that specifies a directory containing training data and uses this to build a model. If a model already exists, the command will fail. To clear an existing model, you can use the lab-clean command.
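
As described for the training step earlier, once phrases have been extracted, statistics about phrase translation probabilities are computed. A common way to estimate such a probability is by relative frequency: count how often a source phrase was paired with a given target phrase and divide by how often that source phrase was extracted at all. The Python sketch below illustrates that idea only. The function name and the toy phrase pairs are invented for illustration, and the real training scripts compute additional scores.

```python
from collections import Counter, defaultdict


def phrase_translation_probs(phrase_pairs):
    """Estimate p(target | source) by relative frequency.

    phrase_pairs: iterable of (source_phrase, target_phrase) string pairs,
    one entry per extracted occurrence.
    Returns a dict mapping source phrase -> {target phrase: probability}.
    """
    pair_counts = Counter(phrase_pairs)

    # Total number of extractions for each source phrase.
    source_counts = Counter()
    for (src, _tgt), count in pair_counts.items():
        source_counts[src] += count

    table = defaultdict(dict)
    for (src, tgt), count in pair_counts.items():
        table[src][tgt] = count / source_counts[src]
    return dict(table)


# Toy, hand-written phrase pairs (not taken from the lab data).
toy_pairs = [
    ("thou art", "you are"),
    ("thou art", "you are"),
    ("thou art", "you are"),
    ("thou art", "you"),
    ("saith", "says"),
]
toy_table = phrase_translation_probs(toy_pairs)
```

In this toy table, "thou art" maps to "you are" with probability 3/4 and to "you" with probability 1/4, since the probabilities for each source phrase sum to 1.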

lab-translate

The lab-translate tool translates the contents of test.foreign using the model and parameters in model/moses.ini and generates test.foreign.trans. To invoke Moses directly (to translate a different file or to translate command-line input), use the following:
      moses -f SYSTEM-DIR/model/moses.ini
Moses reads input from STDIN in the source language and writes translations to STDOUT.
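
Because Moses is a plain STDIN/STDOUT filter, it is easy to drive from a script when experimenting with preprocessing. The Python sketch below feeds sentences to a command on STDIN and collects its STDOUT lines. It assumes the moses executable is on your PATH and that it emits one output line per input line. The helper names are hypothetical, and the only Moses option used is the -f flag shown above.

```python
import subprocess


def moses_command(system_dir):
    """Build the Moses invocation shown above for a given system directory."""
    return ["moses", "-f", system_dir + "/model/moses.ini"]


def translate_lines(lines, command):
    """Send one sentence per line to `command` on STDIN; return its STDOUT lines."""
    text = "".join(line.rstrip("\n") + "\n" for line in lines)
    result = subprocess.run(
        command,
        input=text,
        stdout=subprocess.PIPE,   # capture translations; STDERR is left alone
        universal_newlines=True,  # exchange str rather than bytes
        check=True,               # raise if the command exits with an error
    )
    return result.stdout.splitlines()
```

For example, translate_lines(sentences, moses_command("train.es-en")) would translate a list of Spanish sentences with the system in train.es-en once it has been trained.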

lab-eval

The lab-eval tool computes the BLEU score of the translation output in test.foreign.trans using test.english as a reference.