org.finetracker.simple_lm
Class CorpusIterator

java.lang.Object
  extended by org.finetracker.simple_lm.CorpusIterator

public class CorpusIterator
extends java.lang.Object

Provides easy extraction of words and sentences from a corpus. It can be used in an iterator-like way to walk through all the words in a corpus, one sentence at a time:

 for (CorpusIterator it = new CorpusIterator(r); !it.eof(); it.getNextSentence())
 {
     System.out.println("Start of a sentence");
     while (it.hasNextWord())
         System.out.println(it.getNextWord());
     System.out.println("End of a sentence");
 }
 

Author:
Albert Gerritsen

Constructor Summary
CorpusIterator(java.io.Reader r)
          Constructs a CorpusIterator based on the passed reader.
 
Method Summary
 boolean eof()
          Checks whether the end of the file was encountered during the last attempt to fetch a new line.
 void getNextSentence()
          Prepares this CorpusIterator to read the next sentence.
 java.lang.String getNextWord()
          Gets the next word on the current line.
 boolean hasNextWord()
          Checks whether there is a word left to be read.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CorpusIterator

public CorpusIterator(java.io.Reader r)
Constructs a CorpusIterator based on the passed reader. The iterator is immediately ready to read the first sentence, so no initial call to getNextSentence() is needed.

Parameters:
r - the Reader from which to read the corpus

Method Detail

getNextSentence

public void getNextSentence()
Prepares this CorpusIterator to read the next sentence. Empty sentences are skipped. After calling this method, check eof() to find out whether the end of the file has been reached.


getNextWord

public java.lang.String getNextWord()
Gets the next word on the current line. Call hasNextWord() first to check that a word is available.

Returns:
the next word

hasNextWord

public boolean hasNextWord()
Checks whether there is a word left to be read in the current sentence.

Returns:
true if there is such a word, false otherwise

eof

public boolean eof()
Checks whether the end of the file was encountered during the last attempt to fetch a new line.

Returns:
true if we are at the end of the file, false otherwise
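
Example sketch:

The contract described above (the constructor positions the iterator on the first sentence, getNextSentence() skips empty sentences, and eof() reports whether the last fetch hit the end of the input) can be illustrated with a small stand-in class. This is only a sketch under stated assumptions, not the actual implementation: CorpusIteratorSketch and readSentences are hypothetical names, and the sketch assumes one sentence per line with words separated by whitespace, which this page does not specify. How the real class handles I/O errors is also not documented; the sketch simply rethrows them unchecked.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT the real org.finetracker.simple_lm.CorpusIterator.
// Assumptions: one sentence per line, words separated by whitespace.
public class CorpusIteratorSketch {
    private final BufferedReader in;
    private String[] words = new String[0]; // words of the current sentence
    private int next = 0;                   // index of the next unread word
    private boolean eof = false;            // set when a fetch finds no more lines

    public CorpusIteratorSketch(Reader r) {
        in = new BufferedReader(r);
        getNextSentence(); // ready to read the first sentence
    }

    // Advances to the next non-empty line, or records end of file if none is left.
    public void getNextSentence() {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    words = trimmed.split("\\s+");
                    next = 0;
                    return;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // assumption: real behavior is undocumented
        }
        words = new String[0];
        next = 0;
        eof = true;
    }

    public boolean hasNextWord() {
        return next < words.length;
    }

    public String getNextWord() {
        return words[next++];
    }

    public boolean eof() {
        return eof;
    }

    // Collects every sentence as a list of words, using the loop from the class description.
    public static List<List<String>> readSentences(Reader r) {
        List<List<String>> sentences = new ArrayList<>();
        for (CorpusIteratorSketch it = new CorpusIteratorSketch(r); !it.eof(); it.getNextSentence()) {
            List<String> sentence = new ArrayList<>();
            while (it.hasNextWord()) {
                sentence.add(it.getNextWord());
            }
            sentences.add(sentence);
        }
        return sentences;
    }

    public static void main(String[] args) {
        String corpus = "the cat sat\n\nit purred\n";
        // prints [[the, cat, sat], [it, purred]]
        System.out.println(readSentences(new StringReader(corpus)));
    }
}
```

Because the empty line is skipped inside getNextSentence(), the calling loop never sees an empty sentence, and an empty corpus yields no iterations at all since eof() is already true after construction.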