FineTracker: Documentation

This page provides more information on:

  • The format of the input files.
  • The format of the lexicon files.
  • The parameters and the config files.
  • Representation of recognised words in the output.
  • How to include your own Distance measure.
  • Language models.

Input files

The input for Fine-Tracker should look as follows:

“– <file name>”
<value 1> <value 2> <value 3> … <value N>
<value 1> <value 2> <value 3> … <value N>
<value 1> <value 2> <value 3> … <value N>

<value 1> <value 2> <value 3> … <value N>
<empty line>

The <file name> also appears in the output file, indicating which output belongs to which input. Each line following the file name denotes a vector (e.g. one 10 ms frame). The elements of a vector are separated by whitespace, and each element is a value between 0 and 1 (inclusive).
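As a minimal sketch of the constraints above, the following Java helper parses a single vector line and checks both the number of elements and the 0 to 1 range. The class and method names are hypothetical and are not part of Fine-Tracker; the header line with the file name is deliberately left out of the sketch.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Fine-Tracker: validates one vector line of an input file.
class InputVectorCheck {

    // Parses a whitespace-separated line into a vector and checks the documented constraints.
    static double[] parseVector(String line, int vectorLength) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length != vectorLength) {
            throw new IllegalArgumentException(
                "expected " + vectorLength + " values but found " + tokens.length);
        }
        double[] vector = new double[vectorLength];
        for (int i = 0; i < vectorLength; i++) {
            double value = Double.parseDouble(tokens[i]);
            // Input values must lie between 0 and 1, inclusive (this also rejects NaN).
            if (!(value >= 0.0 && value <= 1.0)) {
                throw new IllegalArgumentException("value out of range [0, 1]: " + tokens[i]);
            }
            vector[i] = value;
        }
        return vector;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseVector("0.0 0.25 1.0", 3)));
    }
}
```

Running such a check over every vector line before starting Fine-Tracker makes it easier to spot frames whose length does not match the vector length set in the config file.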

Lexicon files

The transcription of a word in the lexicon used by Fine-Tracker should look as follows:

“– <orthographic transcription of word> : <variant number>”
<(phone) label 1> <value 1> <value 2> <value 3> … <value N>
<(phone) label 2> <value 1> <value 2> <value 3> … <value N>
<(phone) label 3> <value 1> <value 2> <value 3> … <value N>

<(phone) label X> <value 1> <value 2> <value 3> … <value N>

Each pronunciation variant in the lexicon has its own transcription and <variant number>. The lines following the orthographic transcription of the word describe the feature vectors: one vector per line, with values separated by whitespace. A value can be any real number between 0 and 1, where 1 means the feature is present and 0 means the feature is not present. It is also possible to use the feature value ‘NaN’, which means ‘not applicable’. For instance, the feature “plosive” is irrelevant in the description of a vowel; if the value for “plosive” is set to “NaN”, this feature is ignored during the match. Finally, each line starts with a (phone) label. This label is only used to print word-initial cohorts; it is not used in any other part of the code.
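The structure of a single lexicon line can be sketched in Java as follows. This is an illustration only: the class name, field names and error handling are assumptions, not Fine-Tracker's own parser. It relies on the fact that Java's Double.parseDouble accepts the literal "NaN" (case-sensitive).

```java
// Hypothetical helper, not part of Fine-Tracker: parses one line of a lexicon entry.
class LexiconLineSketch {
    final String label;      // (phone) label, only used for display
    final double[] features; // values in [0, 1], or NaN for "not applicable"

    LexiconLineSketch(String label, double[] features) {
        this.label = label;
        this.features = features;
    }

    // Expects "<label> <value 1> ... <value N>" with N equal to vectorLength.
    static LexiconLineSketch parse(String line, int vectorLength) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length != vectorLength + 1) {
            throw new IllegalArgumentException(
                "expected a label and " + vectorLength + " values: " + line);
        }
        double[] features = new double[vectorLength];
        for (int i = 0; i < vectorLength; i++) {
            double value = Double.parseDouble(tokens[i + 1]); // accepts the literal "NaN"
            if (!Double.isNaN(value) && (value < 0.0 || value > 1.0)) {
                throw new IllegalArgumentException("value out of range [0, 1]: " + tokens[i + 1]);
            }
            features[i] = value;
        }
        return new LexiconLineSketch(tokens[0], features);
    }

    public static void main(String[] args) {
        LexiconLineSketch parsed = parse("a 1 0.5 NaN", 3);
        System.out.println(parsed.label + " has " + parsed.features.length + " features");
    }
}
```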

It is not possible to have multiple words with exactly the same pronunciation in the lexicon. This restriction was chosen to speed up development. If two words with the same pronunciation are encountered in the lexicon, a warning message is shown and the application continues. In that case, only the first word is used by the algorithm.
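The keep-the-first-word behaviour can be illustrated with a small Java sketch that could also serve as a pre-check on your own lexicon. It assumes that "the same pronunciation" means identical feature vectors (labels are not compared); the class name and the warning text are hypothetical and do not reproduce Fine-Tracker's actual message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-check, not part of Fine-Tracker: finds words that share a pronunciation.
class DuplicatePronunciationSketch {

    // Returns the words that are kept: the first word for every distinct pronunciation.
    static List<String> keepFirst(List<String> words, List<double[][]> pronunciations) {
        Map<String, String> firstWordByPronunciation = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            // Textual form of the feature vectors, used as a lookup key.
            String key = Arrays.deepToString(pronunciations.get(i));
            String earlier = firstWordByPronunciation.putIfAbsent(key, words.get(i));
            if (earlier == null) {
                kept.add(words.get(i));
            } else {
                System.err.println("Warning: '" + words.get(i)
                    + "' has the same pronunciation as '" + earlier + "' and is skipped");
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("there", "their", "the");
        List<double[][]> pronunciations = Arrays.asList(
            new double[][] {{1, 0}}, new double[][] {{1, 0}}, new double[][] {{0, 1}});
        System.out.println(keepFirst(words, pronunciations));
    }
}
```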

Parameters

The default config file is an XML file that lists all the parameters, the input file to be tested, and the lexicon to be used. All values can and should be adapted and tuned for your own use. The parameters in the config file for version 1.0 are:

  • Input: the input file.
  • Lexicon: the lexicon file.
  • Word entrance penalty: for each hypothesised word a score is added to the total score.
  • Step-in-lexicon penalty: a penalty associated with making a ‘step’ in the lexicon, but not in the input. This results in a lexicon feature vector being inserted. The penalty is added to the total score.
  • Step-in-input penalty: a penalty associated with making a ‘step’ in the input but not in the lexicon. Note: if two instantiations of the same word have a different number of input frames but both map perfectly on the lexical representation, the longer instantiation will have a higher word score than the shorter version, due to the many-to-one mapping. The penalty is added to the score.
  • Word not finished penalty: at the end of the input, all cohorts that do not correspond to words get a score added to the total score.
  • Distance weight: this weight determines the relative weight of the above penalties and the distance measure.
  • Distance calculation: the metric used to calculate the distance between the input vector and the lexicon vector. For more information, see below.
  • Output N best: the number of hypotheses (Nbest list) printed to the output.
  • Output each N iterations: an Nbest list is printed every N input frames.
  • Show intermediate results: ‘true’ if results for every N iterations should be printed; ‘no’ only prints the results at the end of the input.
  • Vector length: the number of elements in each vector. The number should be equal for input and lexicon vectors.
  • Silence vector: the feature vector that describes a silence.
  • Maximum number of hypotheses: the maximum number of hypotheses kept in memory during decoding.
  • Show frame counts: shows the number of frames in the input that were matched onto each phoneme in the recognised word (see below for more information).
  • Show max N digits precision: the precision of the word and path scores.

Version 1.2 has a few additional parameters:

  • Language model: the language model (LM) file (see also below).
  • Language model weight: this weight determines the relative weight of the LM and the acoustic score.
  • Use Ngram length: 0= use a zerogram LM (equivalent to not using an LM at all); 1= use a unigram LM; 2= use a bigram LM.
  • Push LM scores: true= the LM scores are pushed forwards in the lexical tree; false= the LM scores are only applied once a word has been fully recognised.
  • Silence penalty: a penalty on recognising silence.
  • Simultaneous start: a penalty associated with making a step in the input and the lexical tree. This penalty is halved for each new input frame. This parameter is associated with the step-in-input and step-in-lexicon penalties.
  • Simultaneous end: if the simultaneous start parameter has reached the value of simultaneous end or lower, the simultaneous start penalty is no longer applied.
  • (Display) LM penalties: true= prints the LM scores for each recognised word; false= does not print the LM scores.
  • (Display) Only unique results: true= silences are removed from the recognised sequences; false= silences are kept in the recognised sequences. The latter may result in numerous almost identical recognised sequences that differ only in the position of the silence(s).
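To make the interaction between the simultaneous start and simultaneous end parameters more concrete, here is a small Java sketch of the halving schedule. It assumes that the full penalty is applied at the first frame and that the penalty is dropped as soon as it is equal to or lower than simultaneous end; the exact order of these steps inside Fine-Tracker is not documented here, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Fine-Tracker code: the simultaneous start penalty per frame.
class SimultaneousStartSketch {

    // Returns the penalty applied at each successive input frame under the assumed schedule.
    static List<Double> schedule(double simultaneousStart, double simultaneousEnd) {
        if (simultaneousEnd <= 0.0) {
            throw new IllegalArgumentException("simultaneousEnd must be positive in this sketch");
        }
        List<Double> applied = new ArrayList<>();
        double penalty = simultaneousStart;
        // The penalty is applied only while it is still above simultaneous end.
        while (penalty > simultaneousEnd) {
            applied.add(penalty);
            penalty /= 2.0; // halved for each new input frame
        }
        return applied;
    }

    public static void main(String[] args) {
        System.out.println(schedule(8.0, 1.0));
    }
}
```

With a simultaneous start of 8 and a simultaneous end of 1, the sketch applies penalties of 8, 4 and 2 to the first three frames and nothing afterwards.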

Parameter: Show frame counts

The setting “show frame counts” in the config file shows the number of frames in the input that were matched onto each phoneme in the recognised word. To show the frame counts, set the value to “true” (letter case is ignored); any other value hides them.

The output will look like [{n n n n} {n n n n}]. The curly brackets distinguish words, and each n is the number of frames matched onto the corresponding phoneme. This number can be 0 if a phoneme has been skipped in the lexicon.

There is a special case: the first set of curly brackets may contain an exclamation mark (e.g. [{1!}]). This means that the first frame was matched onto the root of the lexicon; in other words, the frame could not be matched onto a word in the lexicon. This usually only happens during the first few iterations of the algorithm.
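For post-processing, the frame-count notation can be read back with a few lines of Java. The sketch below is based only on the format described above; the class and method names are hypothetical, and it assumes that the exclamation mark directly follows a count, as in [{1!}].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Fine-Tracker: reads frame counts such as "[{3 0 2} {1 4}]".
class FrameCountSketch {
    // One pair of curly brackets corresponds to one word.
    private static final Pattern WORD = Pattern.compile("\\{([^}]*)\\}");

    // Returns one list of frame counts per word; a trailing '!' marker is dropped.
    static List<List<Integer>> parse(String text) {
        List<List<Integer>> words = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            List<Integer> counts = new ArrayList<>();
            for (String token : matcher.group(1).trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                String digits = token.endsWith("!")
                    ? token.substring(0, token.length() - 1)
                    : token;
                counts.add(Integer.parseInt(digits));
            }
            words.add(counts);
        }
        return words;
    }

    // True if the first set of curly brackets carries the root marker '!'.
    static boolean startsAtRoot(String text) {
        Matcher matcher = WORD.matcher(text);
        return matcher.find() && matcher.group(1).contains("!");
    }

    public static void main(String[] args) {
        System.out.println(parse("[{3 0 2} {1 4}]"));
        System.out.println(startsAtRoot("[{1!}]"));
    }
}
```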

Command line parameters

Some parameters specified in the config file can be overridden by adding parameters to the command line. The parameters that can be overridden are:

Word entrance penalty, Step-in-lexicon penalty, Step-in-input penalty, Word not finished penalty, Distance weight.

The command line arguments all have two versions: a long one (e.g. ‘–stepinlexiconpenalty’) and a short one (e.g. ‘-lp’). The short one is provided for ease of use when manually entering parameters. However, when creating scripts the long one is preferred: it makes it easier to see what each argument refers to. An overview of the long and short versions of the parameters is given when running Fine-Tracker without any arguments:

java -jar ftracker.jar

Parameter settings

Parameter settings influence the running time of the program. This is most obvious for the maximum number of hypotheses, which controls the trade-off between speed and accuracy. However, setting penalties such as the step-in-input and step-in-lexicon penalties to very low values can also greatly increase running time, because some loops depend on scores reaching a certain threshold. The lower the penalties, the more slowly scores increase and the longer it takes to reach the threshold.
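The speed versus accuracy trade-off of the maximum number of hypotheses can be illustrated with a generic pruning step. The Java sketch below is not Fine-Tracker's implementation: it assumes that lower scores are better (penalties are added to the score) and simply keeps the best-scoring hypotheses; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of pruning, not Fine-Tracker code.
class PruningSketch {

    // Keeps at most maxHypotheses scores, assuming that a lower score is a better hypothesis.
    static List<Double> prune(List<Double> scores, int maxHypotheses) {
        if (maxHypotheses < 0) {
            throw new IllegalArgumentException("maxHypotheses must not be negative");
        }
        List<Double> sorted = new ArrayList<>(scores);
        Collections.sort(sorted); // ascending: lowest (best) score first
        int keep = Math.min(maxHypotheses, sorted.size());
        return new ArrayList<>(sorted.subList(0, keep));
    }

    public static void main(String[] args) {
        List<Double> scores = new ArrayList<>();
        Collections.addAll(scores, 5.0, 1.0, 3.0, 2.0);
        System.out.println(prune(scores, 2));
    }
}
```

A small limit discards more hypotheses per frame, which speeds up decoding but may remove a path that would have become the best one later in the input.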

Output

Due to an implementation detail, words that are recognised entirely are not marked as such until either:

  • recognition of a following word has started; or
  • the end of the input has been reached and all words are finalised.

This has no effect on the scores of hypotheses, and therefore not on the algorithm: it is purely a display issue.

Distance measure

An option is provided to program your own distance calculation. However, many things can go wrong when code is compiled and loaded dynamically, in addition to the usual programming problems you might encounter. Fine-Tracker will try to provide the details you need to fix these problems. Some of the exceptions you might encounter are explained here:

  • ClassCastException: the class you created does not implement DistanceCalculation. Make sure to put ‘implements DistanceCalculation’ in the declaration of your class.
  • IllegalAccessException: the default constructor (the constructor without parameters) of the class you created must not be marked private. Make sure it has default access (no modifier) or is marked protected or public.
  • InstantiationException: the class you created is not a concrete class. You must provide an implementation for the method ‘calcDistance(double[], double[])’, and the class must not be declared abstract. A default constructor (i.e. a constructor without parameters) must also exist.
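The requirements in this list can be combined into a minimal Java sketch of a custom distance class. To keep the example self-contained, it declares its own stand-in for DistanceCalculation; the double return type, the wrapper class, the class name MeanAbsoluteDistance and the choice of a mean absolute difference are all assumptions made for illustration. Whether ‘NaN’ values must be skipped inside calcDistance or are already handled elsewhere is not stated here, so the NaN check is an assumption as well.

```java
// Self-contained sketch; only DistanceCalculation and calcDistance are names from Fine-Tracker.
class DistanceSketch {

    // Stand-in so that this file compiles on its own. The return type is an assumption;
    // check the real DistanceCalculation declaration that ships with Fine-Tracker.
    interface DistanceCalculation {
        double calcDistance(double[] input, double[] lexicon);
    }

    // Concrete (non-abstract) class that implements the required method.
    static class MeanAbsoluteDistance implements DistanceCalculation {

        // Explicit constructor without parameters; it is not private.
        public MeanAbsoluteDistance() {
        }

        @Override
        public double calcDistance(double[] input, double[] lexicon) {
            if (input.length != lexicon.length) {
                throw new IllegalArgumentException("vector lengths differ");
            }
            double sum = 0.0;
            int used = 0;
            for (int i = 0; i < input.length; i++) {
                if (Double.isNaN(lexicon[i])) {
                    continue; // "not applicable" feature: ignored (assumed handling)
                }
                sum += Math.abs(input[i] - lexicon[i]);
                used++;
            }
            // Mean absolute difference over the features that were compared.
            return used == 0 ? 0.0 : sum / used;
        }
    }

    public static void main(String[] args) {
        DistanceCalculation distance = new MeanAbsoluteDistance();
        double[] input = {1.0, 0.0, 0.5};
        double[] lexicon = {1.0, 1.0, Double.NaN};
        System.out.println(distance.calcDistance(input, lexicon));
    }
}
```

For use with Fine-Tracker itself, the stand-in interface and the wrapper class would be removed, so that the class is declared at top level and implements the DistanceCalculation type provided by Fine-Tracker.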

Language models

Fine-Tracker version 1.0 does not support language models; version 1.2 supports uni- and bigram language models.

After unzipping and installing ftracker_v.1.2, two executables will appear: the Fine-Tracker software itself (ftracker.jar) and the tool simple_lm.jar, which allows you to build your own uni- and bigram language models. To run the language model tool:

java -jar simple_lm.jar

This will display usage information. Please read this information carefully.

simple_lm.jar implements language model creation as described in the HTK book (Section 17.14.2, printed page 275, PDF page 284; the parameters in the usage information refer to constants described there). The input file for the language model tool should be cleaned in advance and have one utterance per line. simple_lm will use the following constants in the language model (ARPA) file it produces:

  • “!ENTER” to mark the start of sentence
  • “!EXIT” to mark the end of sentence
  • “!NULL” to mark an unknown word

If you create your own language model for use with Fine-Tracker you should take care to use these constants in your language model file.
