edu.berkeley.nlp.lm
Class StupidBackoffLm<W>
java.lang.Object
edu.berkeley.nlp.lm.AbstractNgramLanguageModel<W>
edu.berkeley.nlp.lm.AbstractArrayEncodedNgramLanguageModel<W>
edu.berkeley.nlp.lm.StupidBackoffLm<W>
- Type Parameters:
 W - the type of the words (tokens) the model scores
- All Implemented Interfaces:
- ArrayEncodedNgramLanguageModel<W>, NgramLanguageModel<W>, Serializable
public class StupidBackoffLm<W>
- extends AbstractArrayEncodedNgramLanguageModel<W>
- implements ArrayEncodedNgramLanguageModel<W>, Serializable
Language model implementation which uses stupid backoff (Brants et al., 2007)
computation. Note that stupid backoff does not properly normalize, so the
scores this LM computes are not in fact probabilities. Also, unlike LMs estimated
using LmReaders.createKneserNeyLmFromTextFiles, this model returns natural
logarithms instead of log10.
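To make the scoring rule concrete, here is a minimal sketch of stupid backoff over raw counts, written independently of this class. It assumes the counts are held in a plain HashMap keyed by word-id lists and that the backoff multiplier is a fixed 0.4 (the value suggested by Brants et al.); StupidBackoffSketch and its methods are hypothetical names, not berkeleylm's implementation. Scores are natural logarithms, and the toLog10 helper shows the conversion to base 10.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of stupid backoff scoring (Brants et al., 2007) over raw
 * counts. Not the berkeleylm implementation: counts sit in a HashMap keyed by
 * word-id lists, and the backoff multiplier is fixed at 0.4 (an assumption).
 */
final class StupidBackoffSketch {
    static final double ALPHA = 0.4;               // assumed backoff multiplier
    private final Map<List<Integer>, Long> counts = new HashMap<>();
    private long totalTokens = 0;                  // N, used for the unigram case

    /** Counts every n-gram of order 1..order in a sentence of word ids. */
    void addSentence(int[] words, int order) {
        for (int end = 1; end <= words.length; end++) {
            for (int len = 1; len <= order && len <= end; len++) {
                counts.merge(toList(words, end - len, end), 1L, Long::sum);
            }
            totalTokens++;
        }
    }

    /** Raw count of ngram[start..end), or 0 if it was never seen. */
    long count(int[] ngram, int start, int end) {
        return counts.getOrDefault(toList(ngram, start, end), 0L);
    }

    /** Natural-log score of ngram[end - 1] given the context ngram[start..end - 1). */
    double score(int[] ngram, int start, int end) {
        double multiplier = 1.0;
        for (int s = start; s < end - 1; s++) {
            long full = count(ngram, s, end);
            if (full > 0) {
                // Relative frequency f(context + word) / f(context), scaled by
                // ALPHA once per backoff step taken so far.
                return Math.log(multiplier * full / count(ngram, s, end - 1));
            }
            multiplier *= ALPHA;                   // drop the leftmost context word
        }
        // Unigram case f(word) / N; an unseen word yields -Infinity here.
        return Math.log(multiplier * count(ngram, end - 1, end) / totalTokens);
    }

    /** Converts a natural-log score to log10. */
    static double toLog10(double ln) {
        return ln / Math.log(10);
    }

    private static List<Integer> toList(int[] a, int start, int end) {
        Integer[] boxed = new Integer[end - start];
        for (int i = start; i < end; i++) {
            boxed[i - start] = a[i];
        }
        return Arrays.asList(boxed);
    }
}
```

With the toy corpus 1 2 3 1 2 4 counted up to order 3, score(new int[]{1, 2, 3}, 0, 3) is ln(1/2), while score(new int[]{3, 2, 4}, 0, 3) backs off once and gives ln(0.4 * 1/2). Summing exp(score) over the four vocabulary words after the context 3 2 gives about 0.51 rather than 1, which illustrates why these scores are not probabilities.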
- Author:
- adampauls
- See Also:
- Serialized Form
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| float | getLogProb(int[] ngram) | Equivalent to getLogProb(ngram, 0, ngram.length) |
| float | getLogProb(int[] ngram, int startPos, int endPos) | Calculate language model score of an n-gram. |
| float | getLogProb(List<W> ngram) | Scores an n-gram. |
| NgramMap<LongRef> | getNgramMap() | |
| long | getRawCount(int[] ngram, int startPos, int endPos) | Gets the raw count of an n-gram. |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
map
protected final NgramMap<LongRef> map
StupidBackoffLm
public StupidBackoffLm(int lmOrder,
WordIndexer<W> wordIndexer,
NgramMap<LongRef> map,
ConfigOptions opts)
getLogProb
public float getLogProb(int[] ngram,
int startPos,
int endPos)
- Description copied from interface: ArrayEncodedNgramLanguageModel
- Calculate language model score of an n-gram. Warning: if you pass in an
n-gram of length greater than getLmOrder(), this call will silently ignore
the extra words of context. In other words, if you pass in a 5-gram
(endPos - startPos == 5) to a 3-gram model, it will only score the words
from startPos + 2 to endPos.
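The clipping rule described above amounts to keeping only the last getLmOrder() words of the requested slice. The helper below is a hypothetical sketch of that arithmetic only (ContextWindow and effectiveStart are illustrative names, not part of berkeleylm); it assumes endPos is exclusive, as the 5-gram example implies, and it also mirrors the whole-array convenience form getLogProb(ngram), which is documented as equivalent to getLogProb(ngram, 0, ngram.length).

```java
/**
 * Hypothetical sketch of the context clipping described above: a model of
 * order lmOrder only looks at the last lmOrder words of ngram[startPos..endPos).
 * Assumes endPos is exclusive. Not berkeleylm source code.
 */
final class ContextWindow {
    private ContextWindow() {}

    /** First array position that actually contributes to the score. */
    static int effectiveStart(int lmOrder, int startPos, int endPos) {
        if (lmOrder < 1 || startPos < 0 || endPos < startPos) {
            throw new IllegalArgumentException("invalid order or range");
        }
        // The predicted word plus lmOrder - 1 words of context; any earlier
        // words are silently ignored rather than rejected.
        return Math.max(startPos, endPos - lmOrder);
    }

    /** Whole-array form, analogous to getLogProb(ngram, 0, ngram.length). */
    static int effectiveStart(int lmOrder, int[] ngram) {
        return effectiveStart(lmOrder, 0, ngram.length);
    }
}
```

For a 3-gram model and a 5-word slice starting at 0, effectiveStart(3, 0, 5) returns 2, matching the "startPos + 2" in the warning; slices no longer than the model order are left untouched.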
- Specified by:
getLogProb
in interface ArrayEncodedNgramLanguageModel<W>
- Specified by:
getLogProb
in class AbstractArrayEncodedNgramLanguageModel<W>
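- Parameters:
 ngram - array of words in integer representation
 startPos - start of the portion of the array to be read
 endPos - end (exclusive) of the portion of the array to be read
- Returns:
 the stupid backoff score of the n-gram, as a natural logarithm (not a normalized probability)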
getRawCount
public long getRawCount(int[] ngram,
int startPos,
int endPos)
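- Gets the raw count of an n-gram.
- Parameters:
 ngram - array of words in integer representation
 startPos - start of the portion of the array to be read
 endPos - end (exclusive) of the portion of the array to be read
- Returns:
 the count of the n-gram, or -1 if the n-gram is not in the map.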
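The -1 sentinel lets a caller tell an unseen n-gram apart from a stored one before, for example, taking a logarithm of the count. A minimal sketch of that contract, assuming a HashMap-backed table (RawCountTable is a hypothetical, deliberately simple stand-in; this class itself keeps its counts in an NgramMap<LongRef>):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical HashMap-backed sketch of the getRawCount contract: the stored
 * count, or -1 when the n-gram is absent. Keys are built from the slice
 * contents for clarity, not efficiency.
 */
final class RawCountTable {
    private final Map<String, Long> counts = new HashMap<>();

    private static String key(int[] ngram, int startPos, int endPos) {
        return Arrays.toString(Arrays.copyOfRange(ngram, startPos, endPos));
    }

    /** Adds one occurrence of ngram[startPos..endPos). */
    void increment(int[] ngram, int startPos, int endPos) {
        counts.merge(key(ngram, startPos, endPos), 1L, Long::sum);
    }

    /** Count of ngram[startPos..endPos), or -1 if it was never stored. */
    long getRawCount(int[] ngram, int startPos, int endPos) {
        Long c = counts.get(key(ngram, startPos, endPos));
        return c == null ? -1L : c;
    }
}
```

Because the key depends only on the slice contents, the same bigram read from different arrays or offsets maps to the same entry.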
getLogProb
public float getLogProb(int[] ngram)
- Description copied from interface: ArrayEncodedNgramLanguageModel
- Equivalent to getLogProb(ngram, 0, ngram.length)
- Specified by:
getLogProb
in interface ArrayEncodedNgramLanguageModel<W>
- Overrides:
getLogProb
in class AbstractArrayEncodedNgramLanguageModel<W>
- See Also:
ArrayEncodedNgramLanguageModel.getLogProb(int[], int, int)
getLogProb
public float getLogProb(List<W> ngram)
- Description copied from interface: NgramLanguageModel
- Scores an n-gram. This is a convenience method and will generally be
relatively inefficient. More efficient versions are available in
ArrayEncodedNgramLanguageModel.getLogProb(int[], int, int) and
ContextEncodedNgramLanguageModel.getLogProb(long, int, int, edu.berkeley.nlp.lm.ContextEncodedNgramLanguageModel.LmContextInfo).
- Specified by:
getLogProb
in interface NgramLanguageModel<W>
- Overrides:
getLogProb
in class AbstractArrayEncodedNgramLanguageModel<W>
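One likely source of the extra cost of a List<W> overload is the per-call translation step: each word has to be looked up in a word index and copied into a fresh int[] before an array-encoded scorer can run. The sketch below shows only that translation; ToyIndexer, its UNK id of -1, and scoreList are hypothetical stand-ins, not berkeleylm's WordIndexer or the actual method body.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

/**
 * Hypothetical sketch of scoring a List<W> by first encoding it as int[] and
 * then delegating to an array-based scorer. Not berkeleylm source code.
 */
final class ListScoring {
    /** Toy word index: assigns consecutive ids, -1 for unknown words. */
    static final class ToyIndexer<W> {
        static final int UNK = -1;                 // hypothetical out-of-vocabulary id
        private final Map<W, Integer> ids = new HashMap<>();

        int getOrAdd(W word) {
            Integer id = ids.get(word);
            if (id == null) {
                id = ids.size();                   // next unused id
                ids.put(word, id);
            }
            return id;
        }

        int lookup(W word) {
            return ids.getOrDefault(word, UNK);
        }
    }

    /** Encodes the words, then hands the array to the array-encoded scorer. */
    static <W> double scoreList(List<W> ngram, ToyIndexer<W> indexer,
                                ToDoubleFunction<int[]> arrayScorer) {
        int[] encoded = new int[ngram.size()];     // fresh allocation on every call
        for (int i = 0; i < encoded.length; i++) {
            encoded[i] = indexer.lookup(ngram.get(i)); // one hash lookup per word
        }
        return arrayScorer.applyAsDouble(encoded);
    }
}
```

Code that scores many n-grams can avoid this overhead by encoding words once and reusing the int[] with the (ngram, startPos, endPos) overload.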
getNgramMap
public NgramMap<LongRef> getNgramMap()