SpeM: Getting Started

The files can be downloaded here in .zip format (for Windows) and here in .tar.gz format (to unpack: run ‘gzip -d spem.tar.gz’, then ‘tar xvf spem.tar’).

  • Compile SpeM: g++ SpeM.v0.998.cpp -o spem.exe
  • Run SpeM: spem <config file> <input graph> [parameters]

The parameters can be given on the command line or set in the config file.

Example files

  • Config file: the names of the parameters are self-explanatory; where questions may arise, additional information is given in the config file. Download
  • Input graph: three graphs created by an automatic phone recogniser from three productions of the phrase ‘ship inquiry’. Download
  • Lexicon: the first column in the lexicon is the orthographic transcription; the second column is the phonemic transcription. (In the example lexicon provided, both columns consist of the phonemic transcription, so in this case the output of SpeM will be the phonemic rather than the orthographic transcription.) Download
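As an illustration of the two-column lexicon layout described above, here is a minimal Python sketch that reads such a file into a dictionary. It is not part of SpeM. The function name parse_lexicon is hypothetical, and the sketch assumes that the columns are separated by whitespace and that everything after the first column belongs to the transcription; check the example lexicon for the actual separator.

```python
from typing import Dict, List


def parse_lexicon(text: str) -> Dict[str, List[str]]:
    """Map each first-column entry to the list of its second-column transcriptions.

    Assumption (not confirmed by the SpeM documentation): columns are
    separated by whitespace, and the remainder of the line after the
    first column is the transcription.
    """
    lexicon: Dict[str, List[str]] = {}
    for line_number, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)  # split on the first run of whitespace only
        if len(parts) != 2:
            raise ValueError(f"line {line_number}: expected two columns, got {line!r}")
        word, transcription = parts
        # A word may have several pronunciation variants, so keep a list.
        lexicon.setdefault(word, []).append(transcription)
    return lexicon


if __name__ == "__main__":
    # Made-up entries for illustration only; the symbols are not taken
    # from the example lexicon.
    sample = "wordA p1 p2 p3\nwordB p4 p5\nwordA p1 p3\n"
    for word, variants in parse_lexicon(sample).items():
        print(word, variants)
```

Keeping a list per word allows for more than one pronunciation variant per entry, which may or may not occur in a given lexicon.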

Language models

SpeM supports unigram and bigram language models (LMs), although the latter have not been tested. To use a language model in SpeM, carry out the following steps:

  1. Compile LMConvert: g++ LMConvert.v0.7.cpp -o lmconvert.exe
  2. Create a language model similar to the example found here.
  3. Create a SpeM language model:
    • Unigram LMs: LMConvert <LM filename> uni <lowerBoundUni>
      Creates a filename.TMP if it does not yet exist.
      Default value <lowerBoundUni>: 5.
    • Bigram LMs: LMConvert <LM filename> bi <lowerBoundUni> <lowerBoundBi> <Discount>
      Creates filename.TMP, filename.BI.TMP, and filename.BO.TMP if they do not yet exist.
      Default value <lowerBoundBi>: 5; default value <Discount>: 0.5.

    The implementation of <lowerBoundUni>, <lowerBoundBi>, and <Discount> follows the HTK implementation (Young et al., 2002).
    An example of a SpeM unigram LM can be found here.

  4. Add the path to the language model in the config file.
  5. Add a value for UniGramAlpha in the config file. This value, between 0 and 1, determines the influence of the language model on the overall score.
  6. Change the value for xGram in the config file to either 1 (unigram LM) or 2 (bigram LM).
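
To make the role of <lowerBoundUni>, <lowerBoundBi>, and <Discount> in step 3 more concrete, the following Python sketch reproduces the back-off bigram estimate as described in the HTK Book: unigram counts are floored, bigrams at or below a count threshold are dropped, and a fixed discount is subtracted from the remaining bigram counts. This is not the LMConvert source. It assumes that <lowerBoundUni> corresponds to HTK's unigram floor, <lowerBoundBi> to the bigram threshold, and <Discount> to the discount constant; all function names are hypothetical.

```python
from typing import Dict, Tuple

Bigram = Tuple[str, str]


def unigram_probs(counts: Dict[str, int], floor: float = 5) -> Dict[str, float]:
    """p(i) = N(i)/N if N(i) > floor, else floor/N, with N = sum of max(N(i), floor)."""
    total = sum(max(c, floor) for c in counts.values())
    return {w: max(c, floor) / total for w, c in counts.items()}


def bigram_probs(
    bigram_counts: Dict[Bigram, int],
    unigram_counts: Dict[str, int],
    threshold: float = 5,
    discount: float = 0.5,
) -> Dict[Bigram, float]:
    """p(i,j) = (N(i,j) - D) / N(i) for bigrams whose count exceeds the threshold."""
    return {
        (w1, w2): (c - discount) / unigram_counts[w1]
        for (w1, w2), c in bigram_counts.items()
        if c > threshold
    }


def backoff_weights(
    bi_probs: Dict[Bigram, float], uni_probs: Dict[str, float]
) -> Dict[str, float]:
    """b(i) = (1 - sum_j p(i,j)) / (1 - sum_j p(j)), summed over the kept bigrams of i."""
    kept_bigram_mass: Dict[str, float] = {}
    kept_unigram_mass: Dict[str, float] = {}
    for (w1, w2), p in bi_probs.items():
        kept_bigram_mass[w1] = kept_bigram_mass.get(w1, 0.0) + p
        kept_unigram_mass[w1] = kept_unigram_mass.get(w1, 0.0) + uni_probs[w2]
    weights: Dict[str, float] = {}
    for w1 in uni_probs:
        denominator = 1.0 - kept_unigram_mass.get(w1, 0.0)
        numerator = 1.0 - kept_bigram_mass.get(w1, 0.0)
        # If every word has a kept bigram after w1, no back-off mass is needed.
        weights[w1] = numerator / denominator if denominator > 0 else 0.0
    return weights


if __name__ == "__main__":
    # Toy counts, invented for illustration.
    uni_counts = {"a": 10, "b": 2}
    bi_counts = {("a", "b"): 8, ("a", "a"): 2}
    uni = unigram_probs(uni_counts)
    bi = bigram_probs(bi_counts, uni_counts)
    print(uni, bi, backoff_weights(bi, uni))
```

Word pairs without a kept bigram are then scored as b(i) · p(j), so a higher <lowerBoundBi> shifts more pairs onto the backed-off unigram estimate.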