public interface WordIndexer<W>
Type Parameters: W - A type representing words in the language. Can be a String, or something more complex if needed.
Enumerates words in the vocabulary of a language model. Stores a two-way mapping between integers and words.
Nested Class Summary | |
---|---|
static class |
WordIndexer.StaticMethods
|
Method Summary | |
---|---|
W |
getEndSymbol()
Returns the end symbol (usually something like </s>). |
int |
getIndexPossiblyUnk(W word)
Looks up the index for a word without ever adding to the vocabulary. If the word is not in the vocabulary, returns the index of the unk symbol (see getUnkSymbol()). |
int |
getOrAddIndex(W word)
Gets the index for a word, adding if necessary. |
int |
getOrAddIndexFromString(String word)
|
W |
getStartSymbol()
Returns the start symbol (usually something like <s>). |
W |
getUnkSymbol()
Returns the unk symbol (usually something like <unk>). |
W |
getWord(int index)
Gets the word object for an index. |
int |
numWords()
Returns the number of words that have been added so far. |
void |
setEndSymbol(W sym)
|
void |
setStartSymbol(W sym)
|
void |
setUnkSymbol(W sym)
|
void |
trimAndLock()
Informs the implementation that no more words can be added to the vocabulary. |
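
The contract summarized above can be illustrated with a minimal sketch for `W = String`. This is not the library's implementation: the class name `SimpleStringIndexer` is hypothetical, it does not implement the real interface (so that it compiles on its own), and two behaviors are assumptions, namely that adding a new word after `trimAndLock()` throws, and that `getIndexPossiblyUnk` returns -1 if the unk symbol itself was never added.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: mirrors the WordIndexer<String> method names,
// but is not the library's implementation.
final class SimpleStringIndexer {
    // Two-way mapping: word -> index and index -> word.
    private final Map<String, Integer> wordToIndex = new HashMap<>();
    private final ArrayList<String> indexToWord = new ArrayList<>();
    private String startSymbol = "<s>";
    private String endSymbol = "</s>";
    private String unkSymbol = "<unk>";
    private boolean locked = false;

    // Returns the existing index, or assigns the next free one.
    int getOrAddIndex(String word) {
        Integer existing = wordToIndex.get(word);
        if (existing != null) {
            return existing;
        }
        if (locked) {
            // Assumption: the source does not say what happens here.
            throw new IllegalStateException("vocabulary is locked: " + word);
        }
        int index = indexToWord.size();
        wordToIndex.put(word, index);
        indexToWord.add(word);
        return index;
    }

    // Never adds to the vocabulary; unknown words map to the unk symbol's index.
    int getIndexPossiblyUnk(String word) {
        Integer existing = wordToIndex.get(word);
        if (existing != null) {
            return existing;
        }
        Integer unk = wordToIndex.get(unkSymbol);
        return unk != null ? unk : -1; // Assumption: -1 if unk was never added.
    }

    String getWord(int index) { return indexToWord.get(index); }

    int numWords() { return indexToWord.size(); }

    String getStartSymbol() { return startSymbol; }
    void setStartSymbol(String sym) { startSymbol = sym; }
    String getEndSymbol() { return endSymbol; }
    void setEndSymbol(String sym) { endSymbol = sym; }
    String getUnkSymbol() { return unkSymbol; }
    void setUnkSymbol(String sym) { unkSymbol = sym; }

    // Releases spare capacity and refuses further additions.
    void trimAndLock() {
        indexToWord.trimToSize();
        locked = true;
    }
}
```

In this sketch, indices are assigned densely in insertion order, so `getWord(getOrAddIndex(w))` returns `w` for any word added before locking.
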
Method Detail |
---|
int getOrAddIndex(W word)
Parameters: word
int getOrAddIndexFromString(String word)
int getIndexPossiblyUnk(W word)
Parameters: word
W getWord(int index)
Parameters: index
int numWords()
W getStartSymbol()
void setStartSymbol(W sym)
W getEndSymbol()
void setEndSymbol(W sym)
W getUnkSymbol()
void setUnkSymbol(W sym)
void trimAndLock()
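
One way the methods above might be combined is to turn a tokenized sentence into an array of indices, padded with the start and end symbols. The sketch below is an assumption about typical usage, not something this page specifies: `SentenceEncoder` and `encode` are hypothetical names, and the lookup is passed in as a `ToIntFunction<String>` so the example compiles without the library.

```java
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical helper, not part of the documented API.
final class SentenceEncoder {
    // lookup is expected to behave like getIndexPossiblyUnk:
    // it never adds words and falls back to the unk index.
    static int[] encode(List<String> tokens, String startSymbol, String endSymbol,
                        ToIntFunction<String> lookup) {
        int[] ids = new int[tokens.size() + 2];
        ids[0] = lookup.applyAsInt(startSymbol);
        for (int i = 0; i < tokens.size(); i++) {
            ids[i + 1] = lookup.applyAsInt(tokens.get(i));
        }
        ids[ids.length - 1] = lookup.applyAsInt(endSymbol);
        return ids;
    }
}
```

With a `WordIndexer<String>` named `indexer`, a call could look like `SentenceEncoder.encode(tokens, indexer.getStartSymbol(), indexer.getEndSymbol(), indexer::getIndexPossiblyUnk)`.
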