edu.berkeley.nlp.lm
Class ArrayEncodedProbBackoffLm<W>
java.lang.Object
edu.berkeley.nlp.lm.AbstractNgramLanguageModel<W>
edu.berkeley.nlp.lm.AbstractArrayEncodedNgramLanguageModel<W>
edu.berkeley.nlp.lm.ArrayEncodedProbBackoffLm<W>
- Type Parameters:
W
-
- All Implemented Interfaces:
- ArrayEncodedNgramLanguageModel<W>, NgramLanguageModel<W>, Serializable
public class ArrayEncodedProbBackoffLm<W>
- extends AbstractArrayEncodedNgramLanguageModel<W>
- implements ArrayEncodedNgramLanguageModel<W>, Serializable
Language model implementation which uses Kneser-Ney-style backoff
computation.
Note that, unlike the description in Pauls and Klein (2011), this
implementation stores a trie in which the first word of an n-gram points to
its prefix. This is in contrast to ContextEncodedProbBackoffLm, which stores
a trie in which the last word points to its suffix. This was done because it
simplifies the code significantly without significantly changing speed or
memory usage.
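For readers unfamiliar with backoff scoring, the following is a minimal sketch of the usual ARPA-style recursion: use the stored probability if the full n-gram is present, otherwise add the backoff weight of the context and retry with a shorter context. It is not the code of this class. The names BackoffSketch, Entry, put and logProb are hypothetical, a HashMap stands in for the trie, and the unknown-word score is an assumed constant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy backoff scorer (hypothetical; not the berkeleylm implementation). */
class BackoffSketch {
    /** Hypothetical stand-in for a stored (log-probability, backoff) pair. */
    static final class Entry {
        final float logProb;
        final float backoff;
        Entry(float logProb, float backoff) {
            this.logProb = logProb;
            this.backoff = backoff;
        }
    }

    // A hash map keyed by word-id sequences replaces the trie for brevity.
    private final Map<List<Integer>, Entry> table = new HashMap<>();
    private final float unkLogProb; // assumed score for unseen unigrams

    BackoffSketch(float unkLogProb) {
        this.unkLogProb = unkLogProb;
    }

    void put(float logProb, float backoff, Integer... words) {
        table.put(Arrays.asList(words), new Entry(logProb, backoff));
    }

    private static List<Integer> key(int[] ngram, int start, int end) {
        List<Integer> k = new ArrayList<>(end - start);
        for (int i = start; i < end; i++) {
            k.add(ngram[i]);
        }
        return k;
    }

    /** Scores the last word of ngram[start, end) given the words before it. */
    float logProb(int[] ngram, int start, int end) {
        Entry hit = table.get(key(ngram, start, end));
        if (hit != null) {
            return hit.logProb;             // full n-gram is stored
        }
        if (end - start == 1) {
            return unkLogProb;              // unseen unigram
        }
        // Back off: add the context's backoff weight (0 if the context is
        // unseen), then score the same word with one less word of context.
        Entry context = table.get(key(ngram, start, end - 1));
        float backoff = (context == null) ? 0.0f : context.backoff;
        return backoff + logProb(ngram, start + 1, end);
    }

    public static void main(String[] args) {
        BackoffSketch lm = new BackoffSketch(-10.0f);
        lm.put(-1.0f, -0.5f, 1);
        lm.put(-2.0f, -0.25f, 2);
        lm.put(-0.75f, 0.0f, 1, 2);
        System.out.println(lm.logProb(new int[] {1, 2}, 0, 2)); // stored bigram
        System.out.println(lm.logProb(new int[] {2, 1}, 0, 2)); // backs off to unigram
    }
}
```

In the second call the bigram (2, 1) is absent, so the score is the backoff weight of (2) plus the unigram score of (1).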
- Author:
- adampauls
- See Also:
- Serialized Form
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ArrayEncodedProbBackoffLm
public ArrayEncodedProbBackoffLm(int lmOrder,
WordIndexer<W> wordIndexer,
NgramMap<ProbBackoffPair> map,
ConfigOptions opts)
getLogProb
public float getLogProb(int[] ngram,
int startPos,
int endPos)
- Description copied from interface:
ArrayEncodedNgramLanguageModel
- Calculate language model score of an n-gram. Warning: if you
pass in an n-gram of length greater than getLmOrder(), this call will
silently ignore the extra words of context. In other words, if you pass in
a 5-gram (endPos-startPos == 5) to a 3-gram model, it will only score the
words from startPos + 2 to endPos.
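The truncation described in the warning above can be pictured with a small index calculation. This is only a sketch of the arithmetic, not the library's code; NgramWindow and effectiveStart are hypothetical names, and the model order is passed in directly rather than read from getLmOrder().

```java
import java.util.Arrays;

/** Illustrates which part of an over-long n-gram is scored (hypothetical helper). */
class NgramWindow {
    /**
     * Returns the first index that still falls inside the scored window when
     * ngram[startPos, endPos) is passed to a model of order lmOrder.
     */
    static int effectiveStart(int startPos, int endPos, int lmOrder) {
        // Keep at most lmOrder words, counted back from endPos.
        return Math.max(startPos, endPos - lmOrder);
    }

    public static void main(String[] args) {
        int[] ngram = {10, 11, 12, 13, 14};   // a 5-gram of word ids
        int start = effectiveStart(0, ngram.length, 3);
        // prints [12, 13, 14]: only the last three words are scored
        System.out.println(Arrays.toString(Arrays.copyOfRange(ngram, start, ngram.length)));
    }
}
```

Callers who need to detect this case can compare endPos - startPos against the model order before scoring, since the call itself gives no indication that context was dropped.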
- Specified by:
getLogProb
in interface ArrayEncodedNgramLanguageModel<W>
- Specified by:
getLogProb
in class AbstractArrayEncodedNgramLanguageModel<W>
- Parameters:
ngram - array of words in integer representation
startPos - start of the portion of the array to be read
endPos - end of the portion of the array to be read.
- Returns:
getLogProb
public float getLogProb(int[] ngram)
- Description copied from interface:
ArrayEncodedNgramLanguageModel
- Equivalent to
getLogProb(ngram, 0, ngram.length)
- Specified by:
getLogProb
in interface ArrayEncodedNgramLanguageModel<W>
- Overrides:
getLogProb
in class AbstractArrayEncodedNgramLanguageModel<W>
- See Also:
ArrayEncodedNgramLanguageModel.getLogProb(int[], int, int)
getLogProb
public float getLogProb(List<W> ngram)
- Description copied from interface:
NgramLanguageModel
- Scores an n-gram. This is a convenience method and will generally be
relatively inefficient. More efficient versions are available in
ArrayEncodedNgramLanguageModel.getLogProb(int[], int, int) and
ContextEncodedNgramLanguageModel.getLogProb(long, int, int, edu.berkeley.nlp.lm.ContextEncodedNgramLanguageModel.LmContextInfo).
- Specified by:
getLogProb
in interface NgramLanguageModel<W>
- Overrides:
getLogProb
in class AbstractArrayEncodedNgramLanguageModel<W>
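One plausible reason the List-based overload is slower is that every call has to map words to integer ids and allocate an array before the array-based scorer can run. The sketch below illustrates that conversion under stated assumptions: ListScoringSketch, ArrayScorer, toIds and scoreList are hypothetical names, a plain Map stands in for the word indexer, and the scorer in main is a dummy that returns no real probabilities.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of scoring a List of words through an array-based scorer (hypothetical). */
class ListScoringSketch {
    /** Hypothetical stand-in for an array-based scoring method. */
    interface ArrayScorer {
        float getLogProb(int[] ngram, int startPos, int endPos);
    }

    /** Maps each word to its integer id; words missing from the index get unkId. */
    static int[] toIds(List<String> words, Map<String, Integer> index, int unkId) {
        int[] ids = new int[words.size()];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = index.getOrDefault(words.get(i), unkId);
        }
        return ids;
    }

    /** Converts on every call, which is the per-call overhead being illustrated. */
    static float scoreList(List<String> words, Map<String, Integer> index,
                           int unkId, ArrayScorer scorer) {
        int[] ids = toIds(words, index, unkId);
        return scorer.getLogProb(ids, 0, ids.length);
    }

    public static void main(String[] args) {
        Map<String, Integer> index = new HashMap<>();
        index.put("the", 0);
        index.put("cat", 1);
        // Dummy scorer: returns minus the window length, not a real probability.
        ArrayScorer dummy = (ngram, s, e) -> -1.0f * (e - s);
        System.out.println(Arrays.toString(toIds(Arrays.asList("the", "dog", "cat"), index, -1)));
        System.out.println(scoreList(Arrays.asList("the", "cat"), index, -1, dummy));
    }
}
```

Code that scores many n-grams can do the conversion once and reuse the int[] with different startPos and endPos values.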
getNgramMap
public NgramMap<ProbBackoffPair> getNgramMap()