edu.berkeley.nlp.lm
Class StupidBackoffLm<W>

java.lang.Object
  extended by edu.berkeley.nlp.lm.AbstractNgramLanguageModel<W>
      extended by edu.berkeley.nlp.lm.AbstractArrayEncodedNgramLanguageModel<W>
          extended by edu.berkeley.nlp.lm.StupidBackoffLm<W>
Type Parameters:
W -
All Implemented Interfaces:
ArrayEncodedNgramLanguageModel<W>, NgramLanguageModel<W>, Serializable

public class StupidBackoffLm<W>
extends AbstractArrayEncodedNgramLanguageModel<W>
implements ArrayEncodedNgramLanguageModel<W>, Serializable

Language model implementation that scores n-grams using stupid backoff (Brants et al., 2007). Note that stupid backoff does not normalize its scores, so the values this LM computes are not true probabilities. Also, unlike LMs estimated using LmReaders.createKneserNeyLmFromTextFiles, this model returns natural logarithms instead of log10.
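
The scoring rule described above can be illustrated with a minimal, self-contained sketch. This is not the library's implementation: the class StupidBackoffSketch, its HashMap-based count table, and the helper toLog10 are hypothetical names introduced only for illustration, and the backoff factor 0.4 is the value suggested by Brants et al. (2007), assumed here rather than read from ConfigOptions. The sketch returns natural logarithms, matching the note above, and shows how to convert to log10 when comparing against other models.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of stupid backoff scoring; not part of BerkeleyLM.
final class StupidBackoffSketch {
    // Assumption: backoff factor suggested by Brants et al. (2007).
    static final double ALPHA = 0.4;

    private final Map<List<String>, Long> counts = new HashMap<>();
    private final int order;
    private long totalTokens = 0;

    StupidBackoffSketch(int order) {
        this.order = order;
    }

    // Counts every n-gram of length 1..order in the sentence.
    void addSentence(List<String> words) {
        totalTokens += words.size();
        for (int end = 1; end <= words.size(); end++) {
            for (int start = Math.max(0, end - order); start < end; start++) {
                counts.merge(new ArrayList<>(words.subList(start, end)), 1L, Long::sum);
            }
        }
    }

    long count(List<String> ngram) {
        return counts.getOrDefault(ngram, 0L);
    }

    // Natural-log score of the last word given the preceding words.
    double logScore(List<String> ngram) {
        // Keep at most `order` words, dropping the oldest context first.
        List<String> g = ngram.size() > order
                ? ngram.subList(ngram.size() - order, ngram.size())
                : ngram;
        double logPenalty = 0.0;
        while (g.size() > 1) {
            long full = count(g);
            if (full > 0) {
                long context = count(g.subList(0, g.size() - 1));
                return logPenalty + Math.log((double) full / context);
            }
            // Unseen n-gram: pay the fixed penalty and shorten the context.
            logPenalty += Math.log(ALPHA);
            g = g.subList(1, g.size());
        }
        long unigram = count(g);
        if (unigram == 0) {
            return Double.NEGATIVE_INFINITY; // unseen word; a real model would use an OOV score
        }
        return logPenalty + Math.log((double) unigram / totalTokens);
    }

    // Converts a natural-log score to log10.
    static double toLog10(double naturalLog) {
        return naturalLog / Math.log(10.0);
    }

    public static void main(String[] args) {
        StupidBackoffSketch lm = new StupidBackoffSketch(3);
        lm.addSentence(Arrays.asList("the", "cat", "sat"));
        lm.addSentence(Arrays.asList("the", "cat", "ran"));
        double ln = lm.logScore(Arrays.asList("the", "cat", "sat"));
        System.out.println("ln score = " + ln + ", log10 score = " + toLog10(ln));
    }
}
```

Because the backoff penalty is a fixed multiplier rather than a computed backoff weight, the scores over all possible next words need not sum to one, which is why the result is a score and not a probability.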

Author:
adampauls
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from interface edu.berkeley.nlp.lm.ArrayEncodedNgramLanguageModel
ArrayEncodedNgramLanguageModel.DefaultImplementations
 
Nested classes/interfaces inherited from interface edu.berkeley.nlp.lm.NgramLanguageModel
NgramLanguageModel.StaticMethods
 
Field Summary
protected  NgramMap<LongRef> map
           
 
Fields inherited from class edu.berkeley.nlp.lm.AbstractNgramLanguageModel
lmOrder, oovWordLogProb
 
Constructor Summary
StupidBackoffLm(int lmOrder, WordIndexer<W> wordIndexer, NgramMap<LongRef> map, ConfigOptions opts)
           
 
Method Summary
 float getLogProb(int[] ngram)
          Equivalent to getLogProb(ngram, 0, ngram.length)
 float getLogProb(int[] ngram, int startPos, int endPos)
          Calculate language model score of an n-gram.
 float getLogProb(List<W> ngram)
          Scores an n-gram.
 NgramMap<LongRef> getNgramMap()
           
 long getRawCount(int[] ngram, int startPos, int endPos)
          Gets the raw count of an n-gram.
 
Methods inherited from class edu.berkeley.nlp.lm.AbstractArrayEncodedNgramLanguageModel
scoreSentence
 
Methods inherited from class edu.berkeley.nlp.lm.AbstractNgramLanguageModel
getLmOrder, getWordIndexer, setOovWordLogProb
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface edu.berkeley.nlp.lm.NgramLanguageModel
getLmOrder, getWordIndexer, scoreSentence, setOovWordLogProb
 

Field Detail

map

protected final NgramMap<LongRef> map
Constructor Detail

StupidBackoffLm

public StupidBackoffLm(int lmOrder,
                       WordIndexer<W> wordIndexer,
                       NgramMap<LongRef> map,
                       ConfigOptions opts)
Method Detail

getLogProb

public float getLogProb(int[] ngram,
                        int startPos,
                        int endPos)
Description copied from interface: ArrayEncodedNgramLanguageModel
Calculate language model score of an n-gram. Warning: if you pass in an n-gram of length greater than getLmOrder(), this call will silently ignore the extra words of context. In other words, if you pass in a 5-gram (endPos-startPos == 5) to a 3-gram model, it will only score the words from startPos + 2 to endPos.

Specified by:
getLogProb in interface ArrayEncodedNgramLanguageModel<W>
Specified by:
getLogProb in class AbstractArrayEncodedNgramLanguageModel<W>
Parameters:
ngram - array of words in integer representation
startPos - start of the portion of the array to be read
endPos - end of the portion of the array to be read.
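Returns:
the language model score of the n-gram (for this class, a natural-log stupid backoff score rather than a log probability)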
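
The truncation warning above can be made concrete with a small sketch. The class NgramWindowSketch and its methods are hypothetical helpers, not part of the library; they only compute which array positions a model of a given order would read under the rule stated in the description.

```java
import java.util.Arrays;

// Hypothetical helper showing which words survive context truncation.
final class NgramWindowSketch {
    // First array position actually read when the n-gram is longer than the model order.
    static int effectiveStart(int startPos, int endPos, int lmOrder) {
        return Math.max(startPos, endPos - lmOrder);
    }

    // The words that would be scored; any earlier context is ignored.
    static int[] scoredWords(int[] ngram, int startPos, int endPos, int lmOrder) {
        return Arrays.copyOfRange(ngram, effectiveStart(startPos, endPos, lmOrder), endPos);
    }

    public static void main(String[] args) {
        int[] fiveGram = {10, 11, 12, 13, 14};
        // A 5-gram passed to a 3-gram model: only positions startPos + 2 up to endPos are read.
        System.out.println(Arrays.toString(scoredWords(fiveGram, 0, 5, 3)));
    }
}
```

Callers who need to detect over-long input can compare endPos - startPos against getLmOrder() before scoring, since the call itself gives no indication that context was dropped.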

getRawCount

public long getRawCount(int[] ngram,
                        int startPos,
                        int endPos)
Gets the raw count of an n-gram.

Parameters:
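ngram - array of words in integer representation
startPos - start of the portion of the array to be read
endPos - end of the portion of the array to be read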
Returns:
count of n-gram, or -1 if n-gram is not in the map.
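
Because a missing n-gram is reported as -1 rather than 0, callers that add or divide counts should check for the sentinel first. The sketch below mimics that contract with a hypothetical in-memory table; RawCountSketch, put, and countOrZero are illustrative names and do not exist in the library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a count table with a -1 "not found" sentinel.
final class RawCountSketch {
    private final Map<List<Integer>, Long> counts = new HashMap<>();

    void put(long count, int... ngram) {
        counts.put(toKey(ngram, 0, ngram.length), count);
    }

    // Mirrors the documented contract: the stored count, or -1 if the n-gram is absent.
    long getRawCount(int[] ngram, int startPos, int endPos) {
        Long c = counts.get(toKey(ngram, startPos, endPos));
        return c == null ? -1L : c;
    }

    // Maps the sentinel to zero so the value is safe to use in arithmetic.
    static long countOrZero(long rawCount) {
        return rawCount < 0 ? 0L : rawCount;
    }

    private static List<Integer> toKey(int[] ngram, int startPos, int endPos) {
        List<Integer> key = new ArrayList<>(endPos - startPos);
        for (int i = startPos; i < endPos; i++) {
            key.add(ngram[i]);
        }
        return key;
    }
}
```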

getLogProb

public float getLogProb(int[] ngram)
Description copied from interface: ArrayEncodedNgramLanguageModel
Equivalent to getLogProb(ngram, 0, ngram.length)

Specified by:
getLogProb in interface ArrayEncodedNgramLanguageModel<W>
Overrides:
getLogProb in class AbstractArrayEncodedNgramLanguageModel<W>
See Also:
ArrayEncodedNgramLanguageModel.getLogProb(int[], int, int)

getLogProb

public float getLogProb(List<W> ngram)
Description copied from interface: NgramLanguageModel
Scores an n-gram. This is a convenience method and will generally be relatively inefficient. More efficient versions are available in ArrayEncodedNgramLanguageModel.getLogProb(int[], int, int) and ContextEncodedNgramLanguageModel.getLogProb(long, int, int, edu.berkeley.nlp.lm.ContextEncodedNgramLanguageModel.LmContextInfo).

Specified by:
getLogProb in interface NgramLanguageModel<W>
Overrides:
getLogProb in class AbstractArrayEncodedNgramLanguageModel<W>
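
One plausible reason the List-based overload is slower is that each call has to map word objects to integers before scoring. The sketch below shows the alternative pattern under that assumption: encode a sentence once into an int array, then score many windows of it by position. WordListScoringSketch and its Map-based indexer are hypothetical stand-ins; in particular, assigning fresh ids to unseen words is a simplification and not necessarily how the library's WordIndexer behaves.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical word-to-integer encoder; not the library's WordIndexer.
final class WordListScoringSketch {
    private final Map<String, Integer> index = new HashMap<>();

    // Returns the id for a word, assigning the next free id on first sight.
    int indexOf(String word) {
        Integer id = index.get(word);
        if (id == null) {
            id = index.size();
            index.put(word, id);
        }
        return id;
    }

    // Encodes the whole sentence once; windows can then be addressed by (startPos, endPos).
    int[] encode(List<String> words) {
        int[] ids = new int[words.size()];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = indexOf(words.get(i));
        }
        return ids;
    }

    public static void main(String[] args) {
        WordListScoringSketch enc = new WordListScoringSketch();
        int[] ids = enc.encode(Arrays.asList("the", "cat", "saw", "the", "dog"));
        System.out.println(Arrays.toString(ids));
    }
}
```

With the array in hand, each n-gram in the sentence is just a pair of positions, so no per-n-gram list needs to be built before calling an array-based scoring method.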

getNgramMap

public NgramMap<LongRef> getNgramMap()