org.finetracker.simple_lm
Class NGram

java.lang.Object
  extended by org.finetracker.simple_lm.NGram

public class NGram
extends java.lang.Object

Representation of an N-tuple of words. Instances of this class are immutable; sharing of the underlying word arrays is prevented by copying them. Furthermore, it facilitates the following:

Author:
Albert Gerritsen
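Immutability here depends on copying the word array rather than keeping a reference to the caller's array. The sketch below shows that technique in isolation. It is a minimal sketch, assuming the words live in a private final String[] field (hypothetical name words) and that element indices are zero-based. The class name ImmutableNGramSketch is illustrative and not part of this API.

```java
import java.util.Arrays;

/** Hypothetical sketch of an immutable word tuple that never shares its array. */
final class ImmutableNGramSketch {
    private final String[] words; // hypothetical field name

    ImmutableNGramSketch(String[] words) {
        // Defensive copy: later changes to the caller's array cannot leak in.
        this.words = Arrays.copyOf(words, words.length);
    }

    String getElement(int idx) {
        // Strings are immutable, so returning one element is safe; the array itself is never exposed.
        return words[idx];
    }

    public static void main(String[] args) {
        String[] input = {"the", "quick", "fox"};
        ImmutableNGramSketch ngram = new ImmutableNGramSketch(input);
        input[2] = "dog"; // mutate the caller's array after construction
        System.out.println(ngram.getElement(2)); // prints "fox"
    }
}
```

Because the copy is made on the way in, the NGram's contents cannot change after construction.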

Constructor Summary
NGram(int N, java.lang.String[] words)
          Constructs an N-Gram.
NGram(java.lang.String word)
          Simple constructor to quickly create a unigram.
NGram(java.lang.String prevWord, java.lang.String word)
          Simple constructor to quickly create a bigram.
 
Method Summary
 java.lang.String createString(java.lang.String seperator)
          Creates a string with all elements of this NGram separated by the given separator.
 boolean equals(java.lang.Object o)
           
 java.lang.String getElement(int idx)
          Gets a particular element from this NGram.
 int getOrder()
          Gets the order of this NGram.
 java.lang.String getPostfix()
          Returns the last word in this tuple.
 int hashCode()
          Computes the hash code from the actual words (a deep hash).
 boolean isPrefix(NGram ngram)
          Checks whether the passed NGram is a prefix of this NGram.
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

NGram

public NGram(int N,
             java.lang.String[] words)
Constructs an N-Gram.

Parameters:
N - The order of this N-Gram
words - The words that make up this N-Gram
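This documentation does not say what happens when N differs from words.length. One option is to reject such arguments up front. The sketch below is a minimal version that assumes a mismatch should raise an IllegalArgumentException. Both that exception and the class name CheckedNGramSketch are assumptions, not documented behavior of NGram.

```java
import java.util.Arrays;

/** Hypothetical sketch of the NGram(int N, String[] words) constructor with argument checks. */
final class CheckedNGramSketch {
    private final int n;
    private final String[] words;

    CheckedNGramSketch(int n, String[] words) {
        // Assumption: the original docs do not specify mismatch handling; here it is rejected.
        if (words == null || n < 1 || words.length != n) {
            throw new IllegalArgumentException("expected " + n + " words, got "
                    + (words == null ? "null" : String.valueOf(words.length)));
        }
        this.n = n;
        this.words = Arrays.copyOf(words, n); // defensive copy, as described for this class
    }

    int getOrder() { return n; }

    public static void main(String[] args) {
        System.out.println(new CheckedNGramSketch(3, new String[] {"a", "b", "c"}).getOrder()); // 3
        try {
            new CheckedNGramSketch(2, new String[] {"a"});
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // expected 2 words, got 1
        }
    }
}
```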

NGram

public NGram(java.lang.String word)
Simple constructor to quickly create a unigram. This is shorthand for: NGram(1, new String[] {word})

Parameters:
word - The word that makes up this unigram

NGram

public NGram(java.lang.String prevWord,
             java.lang.String word)
Simple constructor to quickly create a bigram. This is shorthand for: NGram(2, new String[] {prevWord, word})

Parameters:
prevWord - The first word in this bigram
word - The second word in this bigram
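Because both shorthand constructors are described as equivalent to calls of the general constructor, one natural approach is constructor chaining with this(...). The sketch below is a minimal version under that assumption. The class name ShorthandNGramSketch is hypothetical, and element indices are assumed to be zero-based.

```java
import java.util.Arrays;

/** Hypothetical sketch showing the shorthand constructors delegating to the general one. */
final class ShorthandNGramSketch {
    private final int n;
    private final String[] words;

    ShorthandNGramSketch(int n, String[] words) {
        this.n = n;
        this.words = Arrays.copyOf(words, words.length);
    }

    // Unigram shorthand: equivalent to NGram(1, new String[] {word}).
    ShorthandNGramSketch(String word) {
        this(1, new String[] {word});
    }

    // Bigram shorthand: equivalent to NGram(2, new String[] {prevWord, word}).
    ShorthandNGramSketch(String prevWord, String word) {
        this(2, new String[] {prevWord, word});
    }

    int getOrder() { return n; }

    String getElement(int idx) { return words[idx]; }

    public static void main(String[] args) {
        ShorthandNGramSketch bigram = new ShorthandNGramSketch("new", "york");
        System.out.println(bigram.getOrder() + " " + bigram.getElement(0)); // 2 new
    }
}
```

With chaining, validation and copying live in one constructor only, so the shorthands cannot diverge from the general form.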

Method Detail

isPrefix

public boolean isPrefix(NGram ngram)
Checks whether the passed NGram is a prefix of this NGram. This means that all words of this tuple except the last one must equal the corresponding words of ngram.

Parameters:
ngram - The prefix to check for
Returns:
true if ngram is a valid prefix, false otherwise
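The description can be read in more than one way. The sketch below takes one reading. It assumes that ngram counts as a prefix when it has exactly one word fewer than this NGram, and when its words equal this NGram's words with the last one dropped. That interpretation, along with the class name PrefixNGramSketch, is an assumption and not confirmed by the documentation.

```java
import java.util.Arrays;

/** Hypothetical sketch of isPrefix under one reading of the documentation. */
final class PrefixNGramSketch {
    private final String[] words;

    PrefixNGramSketch(String... words) {
        this.words = Arrays.copyOf(words, words.length);
    }

    /** Assumption: a prefix has order N-1 and matches the first N-1 words of this tuple. */
    boolean isPrefix(PrefixNGramSketch ngram) {
        if (ngram.words.length != words.length - 1) {
            return false; // wrong length, so it cannot be the (N-1)-word prefix
        }
        // Compare all words of the candidate with this tuple minus its last word.
        return Arrays.equals(ngram.words, Arrays.copyOf(words, words.length - 1));
    }

    public static void main(String[] args) {
        PrefixNGramSketch trigram = new PrefixNGramSketch("i", "like", "tea");
        System.out.println(trigram.isPrefix(new PrefixNGramSketch("i", "like"))); // true
        System.out.println(trigram.isPrefix(new PrefixNGramSketch("i", "hate"))); // false
    }
}
```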

getPostfix

public java.lang.String getPostfix()
Returns the last word in this tuple.

Returns:
The last word

getElement

public java.lang.String getElement(int idx)
Gets a particular element from this NGram.

Parameters:
idx - The index of the element
Returns:
The idx-th element

getOrder

public int getOrder()
Gets the order of this NGram.

Returns:
The order of this NGram.
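The three read-only accessors, getPostfix, getElement and getOrder, can all be expressed directly over the stored word array. The sketch below is minimal and rests on three assumptions: the order equals the number of stored words, getElement uses zero-based indices, and the postfix is the element at index getOrder() - 1. The class name AccessorNGramSketch is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of the read-only accessors. */
final class AccessorNGramSketch {
    private final String[] words;

    AccessorNGramSketch(String... words) {
        this.words = Arrays.copyOf(words, words.length);
    }

    // Assumption: the order N equals the number of words in the tuple.
    int getOrder() { return words.length; }

    // Assumption: indices are zero-based, so valid values run from 0 to getOrder() - 1.
    String getElement(int idx) { return words[idx]; }

    // The postfix is the last word, i.e. the element at index getOrder() - 1.
    String getPostfix() { return words[words.length - 1]; }

    public static void main(String[] args) {
        AccessorNGramSketch g = new AccessorNGramSketch("to", "be", "or");
        System.out.println(g.getOrder() + " " + g.getElement(1) + " " + g.getPostfix()); // 3 be or
    }
}
```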

createString

public java.lang.String createString(java.lang.String seperator)
Creates a string with all elements of this NGram separated by the given separator.

Parameters:
seperator - The separator that should be used to construct this String
Returns:
The string containing all elements of this tuple
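Joining the elements with a separator can be done with the standard String.join method. The sketch below is minimal and assumes the separator goes only between consecutive words. The documentation does not state whether it also appears before the first word or after the last. The class name JoinNGramSketch is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of createString using String.join. */
final class JoinNGramSketch {
    private final String[] words;

    JoinNGramSketch(String... words) {
        this.words = Arrays.copyOf(words, words.length);
    }

    // String.join places the separator between consecutive words only.
    String createString(String separator) {
        return String.join(separator, words);
    }

    public static void main(String[] args) {
        JoinNGramSketch g = new JoinNGramSketch("san", "francisco", "bay");
        System.out.println(g.createString(" ")); // san francisco bay
        System.out.println(g.createString("_")); // san_francisco_bay
    }
}
```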

hashCode

public int hashCode()
Computes the hash code from the actual words (a deep hash), so two NGrams containing the same words hash to the same value.

Overrides:
hashCode in class java.lang.Object
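A "deep" hash is computed from the array's contents rather than from its identity. The standard library provides this as java.util.Arrays.hashCode, and NGram may use something equivalent, although that is an assumption. The demo below contrasts that deep hash with the identity-based behavior of raw arrays. DeepHashDemo is an illustrative name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Illustrates the difference between identity-based and content-based (deep) hashing of arrays. */
final class DeepHashDemo {
    public static void main(String[] args) {
        String[] a = {"the", "cat"};
        String[] b = {"the", "cat"}; // same words, different array object

        // Deep hash: computed from the elements, so equal contents give equal hashes.
        System.out.println(Arrays.hashCode(a) == Arrays.hashCode(b)); // true

        // Raw arrays use identity-based equals/hashCode, so a set keeps both copies.
        Set<String[]> rawSet = new HashSet<>();
        rawSet.add(a);
        rawSet.add(b);
        System.out.println(rawSet.size()); // 2
    }
}
```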

equals

public boolean equals(java.lang.Object o)
Overrides:
equals in class java.lang.Object
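No description is given for equals. However, the hashCode contract requires equal objects to have equal hash codes, and hashCode is documented as a deep, word-based hash. Together these suggest that equals compares words in order, though this is an inference and not a documented fact. The sketch below makes that assumption and uses n-grams as HashMap keys to count bigrams. The class name KeyNGramSketch and the toy corpus are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of value-based equals/hashCode so n-grams work as map keys. */
final class KeyNGramSketch {
    private final String[] words;

    KeyNGramSketch(String... words) {
        this.words = Arrays.copyOf(words, words.length);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof KeyNGramSketch)) return false;
        // Equal when both contain the same words in the same order.
        return Arrays.equals(words, ((KeyNGramSketch) o).words);
    }

    @Override
    public int hashCode() {
        // Consistent with equals: equal word arrays yield equal hash codes.
        return Arrays.hashCode(words);
    }

    public static void main(String[] args) {
        String[] corpus = {"a", "rose", "is", "a", "rose"};
        Map<KeyNGramSketch, Integer> counts = new HashMap<>();
        for (int i = 0; i + 1 < corpus.length; i++) {
            // Each key is a new object, yet equal bigrams share one map entry.
            counts.merge(new KeyNGramSketch(corpus[i], corpus[i + 1]), 1, Integer::sum);
        }
        System.out.println(counts.get(new KeyNGramSketch("a", "rose"))); // 2
    }
}
```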

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object