edu.berkeley.nlp.lm
Class StringWordIndexer

java.lang.Object
  extended by edu.berkeley.nlp.lm.StringWordIndexer
All Implemented Interfaces:
WordIndexer<String>, Serializable

public class StringWordIndexer
extends Object
implements WordIndexer<String>

Implementation of a WordIndexer in which words are represented as strings.

Author:
adampauls
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from interface edu.berkeley.nlp.lm.WordIndexer
WordIndexer.StaticMethods
 
Constructor Summary
StringWordIndexer()
           
 
Method Summary
 String getEndSymbol()
          Returns the end symbol (usually something like </s>).
 int getIndexPossiblyUnk(String word)
          Should never add to the vocabulary, and should return the index of the unk symbol (see getUnkSymbol()) if the word is not in the vocabulary.
 int getOrAddIndex(String word)
          Gets the index for a word, adding if necessary.
 int getOrAddIndexFromString(String word)
           
 String getStartSymbol()
          Returns the start symbol (usually something like <s>).
 String getUnkSymbol()
          Returns the unk symbol (usually something like <unk>).
 String getWord(int index)
          Gets the word object for an index.
 int numWords()
          Number of words that have been added so far
 void setEndSymbol(String sym)
           
 void setStartSymbol(String sym)
           
 void setUnkSymbol(String sym)
           
 void trimAndLock()
          Informs the implementation that no more words can be added to the vocabulary.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

StringWordIndexer

public StringWordIndexer()
Method Detail

getOrAddIndex

public int getOrAddIndex(String word)
Description copied from interface: WordIndexer
Gets the index for a word, adding if necessary.

Specified by:
getOrAddIndex in interface WordIndexer<String>
Returns:
the index of the word

getWord

public String getWord(int index)
Description copied from interface: WordIndexer
Gets the word object for an index.

Specified by:
getWord in interface WordIndexer<String>
Returns:
the word stored at the given index

numWords

public int numWords()
Description copied from interface: WordIndexer
Number of words that have been added so far

Specified by:
numWords in interface WordIndexer<String>
Returns:
the number of words added so far
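
The contract shared by getOrAddIndex, getWord, and numWords can be illustrated with a minimal sketch. ToyWordIndexer below is a hypothetical stand-in written for this page, not the StringWordIndexer source; it assumes indices are handed out consecutively from 0, which the documentation above does not state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy class for illustration only; not the berkeleylm implementation.
class ToyWordIndexer {
    // Maps each word to its index; the list holds the reverse mapping.
    private final Map<String, Integer> indexOf = new HashMap<>();
    private final List<String> words = new ArrayList<>();

    // Returns the existing index for the word, or assigns the next free index.
    // Assumption: indices are consecutive, starting at 0.
    int getOrAddIndex(String word) {
        Integer existing = indexOf.get(word);
        if (existing != null) {
            return existing;
        }
        int index = words.size();
        indexOf.put(word, index);
        words.add(word);
        return index;
    }

    // Returns the word that was assigned the given index.
    String getWord(int index) {
        return words.get(index);
    }

    // Returns how many distinct words have been added so far.
    int numWords() {
        return words.size();
    }
}
```

Under these assumptions, calling getOrAddIndex twice with the same word returns the same index and leaves numWords() unchanged, and getWord inverts the mapping.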

getStartSymbol

public String getStartSymbol()
Description copied from interface: WordIndexer
Returns the start symbol (usually something like <s>).

Specified by:
getStartSymbol in interface WordIndexer<String>
Returns:
the start symbol

getEndSymbol

public String getEndSymbol()
Description copied from interface: WordIndexer
Returns the end symbol (usually something like </s>).

Specified by:
getEndSymbol in interface WordIndexer<String>
Returns:
the end symbol

getUnkSymbol

public String getUnkSymbol()
Description copied from interface: WordIndexer
Returns the unk symbol (usually something like <unk>).

Specified by:
getUnkSymbol in interface WordIndexer<String>
Returns:
the unk symbol

getOrAddIndexFromString

public int getOrAddIndexFromString(String word)
Specified by:
getOrAddIndexFromString in interface WordIndexer<String>

setStartSymbol

public void setStartSymbol(String sym)
Specified by:
setStartSymbol in interface WordIndexer<String>

setEndSymbol

public void setEndSymbol(String sym)
Specified by:
setEndSymbol in interface WordIndexer<String>

setUnkSymbol

public void setUnkSymbol(String sym)
Specified by:
setUnkSymbol in interface WordIndexer<String>

trimAndLock

public void trimAndLock()
Description copied from interface: WordIndexer
Informs the implementation that no more words can be added to the vocabulary. Implementations may perform some space optimization, and should trigger an error if an attempt is made to add a word after this point.

Specified by:
trimAndLock in interface WordIndexer<String>
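
The locking behavior described above can be sketched as follows. LockableToyIndexer is a hypothetical class written for this page, not the library's code. The choice of IllegalStateException as the error and of ArrayList.trimToSize() as the space optimization are assumptions, since the interface description only says that implementations "may perform some space optimization" and "should trigger an error".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy class for illustration only; not the berkeleylm implementation.
class LockableToyIndexer {
    private final Map<String, Integer> indexOf = new HashMap<>();
    private final ArrayList<String> words = new ArrayList<>();
    private boolean locked = false;

    // Looks up the word; adds it only while the vocabulary is still unlocked.
    int getOrAddIndex(String word) {
        Integer existing = indexOf.get(word);
        if (existing != null) {
            return existing; // lookups of known words still work after locking
        }
        if (locked) {
            // Assumption: the error is an unchecked exception of this type.
            throw new IllegalStateException("Vocabulary is locked; cannot add: " + word);
        }
        int index = words.size();
        indexOf.put(word, index);
        words.add(word);
        return index;
    }

    // Releases spare list capacity and forbids further additions.
    void trimAndLock() {
        words.trimToSize();
        locked = true;
    }
}
```

In this sketch, words added before trimAndLock() remain retrievable afterwards; only attempts to add a new word fail.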

getIndexPossiblyUnk

public int getIndexPossiblyUnk(String word)
Description copied from interface: WordIndexer
Should never add to the vocabulary, and should return the index of the unk symbol (see getUnkSymbol()) if the word is not in the vocabulary.

Specified by:
getIndexPossiblyUnk in interface WordIndexer<String>
Returns:
the index of the word, or the index of the unk symbol if the word is not in the vocabulary
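
The fallback to the unk symbol can be sketched as below. UnkAwareToyIndexer is a hypothetical class written for this page, not the library's code. Two points are assumptions: that setUnkSymbol registers the symbol in the vocabulary, and that -1 is returned when no unk symbol has been set; neither is specified in the documentation above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy class for illustration only; not the berkeleylm implementation.
class UnkAwareToyIndexer {
    private final Map<String, Integer> indexOf = new HashMap<>();
    private final List<String> words = new ArrayList<>();
    private String unkSymbol;

    int getOrAddIndex(String word) {
        Integer existing = indexOf.get(word);
        if (existing != null) {
            return existing;
        }
        int index = words.size();
        indexOf.put(word, index);
        words.add(word);
        return index;
    }

    // Assumption: setting the unk symbol also adds it to the vocabulary.
    void setUnkSymbol(String sym) {
        unkSymbol = sym;
        getOrAddIndex(sym);
    }

    String getUnkSymbol() {
        return unkSymbol;
    }

    // Read-only lookup: never grows the vocabulary.
    int getIndexPossiblyUnk(String word) {
        Integer existing = indexOf.get(word);
        if (existing != null) {
            return existing;
        }
        Integer unk = (unkSymbol == null) ? null : indexOf.get(unkSymbol);
        // Assumption: -1 signals that no unk symbol is available.
        return (unk == null) ? -1 : unk;
    }

    int numWords() {
        return words.size();
    }
}
```

The key property is that an out-of-vocabulary lookup maps to the unk symbol's index and leaves numWords() unchanged, in contrast to getOrAddIndex.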