edu.berkeley.nlp.lm
Class ArrayEncodedProbBackoffLm<W>

java.lang.Object
  extended by edu.berkeley.nlp.lm.AbstractNgramLanguageModel<W>
      extended by edu.berkeley.nlp.lm.AbstractArrayEncodedNgramLanguageModel<W>
          extended by edu.berkeley.nlp.lm.ArrayEncodedProbBackoffLm<W>
Type Parameters:
W - the word type (e.g., String)
All Implemented Interfaces:
ArrayEncodedNgramLanguageModel<W>, NgramLanguageModel<W>, Serializable

public class ArrayEncodedProbBackoffLm<W>
extends AbstractArrayEncodedNgramLanguageModel<W>
implements ArrayEncodedNgramLanguageModel<W>, Serializable

Language model implementation which uses Kneser-Ney-style backoff computation. Note that, unlike the description in Pauls and Klein (2011), this particular implementation stores a trie in which the first word of an n-gram points to its prefix. This is in contrast to ContextEncodedProbBackoffLm, which stores a trie in which the last word points to its suffix. This choice simplifies the code considerably without significantly changing speed or memory usage.
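
Below is a minimal usage sketch. It assumes the model is loaded through the LmReaders factory in edu.berkeley.nlp.lm.io and that words are mapped to indices with WordIndexer.getIndexPossiblyUnk; these names, their signatures, and the ARPA file path are illustrative and should be checked against the version in use.

  import java.util.Arrays;
  import java.util.List;

  import edu.berkeley.nlp.lm.ArrayEncodedProbBackoffLm;
  import edu.berkeley.nlp.lm.WordIndexer;
  import edu.berkeley.nlp.lm.io.LmReaders;

  public class BackoffLmExample {
      public static void main(String[] args) {
          // Load an ARPA-format model; factory method name and signature are assumed.
          ArrayEncodedProbBackoffLm<String> lm =
              LmReaders.readArrayEncodedLmFromArpa("trigram.arpa", false);

          // Convenience scoring directly from words (simpler, but slower).
          List<String> ngram = Arrays.asList("the", "quick", "fox");
          float listScore = lm.getLogProb(ngram);

          // Faster scoring: map words to integer ids once, then score the int[].
          WordIndexer<String> indexer = lm.getWordIndexer();
          int[] ids = new int[ngram.size()];
          for (int i = 0; i < ngram.size(); i++) {
              ids[i] = indexer.getIndexPossiblyUnk(ngram.get(i)); // lookup method assumed
          }
          float arrayScore = lm.getLogProb(ids, 0, ids.length);

          System.out.println(listScore + " " + arrayScore);
      }
  }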

Author:
adampauls
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from interface edu.berkeley.nlp.lm.ArrayEncodedNgramLanguageModel
ArrayEncodedNgramLanguageModel.DefaultImplementations
 
Nested classes/interfaces inherited from interface edu.berkeley.nlp.lm.NgramLanguageModel
NgramLanguageModel.StaticMethods
 
Field Summary
 
Fields inherited from class edu.berkeley.nlp.lm.AbstractNgramLanguageModel
lmOrder, oovWordLogProb
 
Constructor Summary
ArrayEncodedProbBackoffLm(int lmOrder, WordIndexer<W> wordIndexer, NgramMap<ProbBackoffPair> map, ConfigOptions opts)
           
 
Method Summary
 float getLogProb(int[] ngram)
          Equivalent to getLogProb(ngram, 0, ngram.length)
 float getLogProb(int[] ngram, int startPos, int endPos)
          Calculate the language model score of an n-gram.
 float getLogProb(List<W> ngram)
          Scores an n-gram.
 NgramMap<ProbBackoffPair> getNgramMap()
           
 
Methods inherited from class edu.berkeley.nlp.lm.AbstractArrayEncodedNgramLanguageModel
scoreSentence
 
Methods inherited from class edu.berkeley.nlp.lm.AbstractNgramLanguageModel
getLmOrder, getWordIndexer, setOovWordLogProb
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface edu.berkeley.nlp.lm.NgramLanguageModel
getLmOrder, getWordIndexer, scoreSentence, setOovWordLogProb
 

Constructor Detail

ArrayEncodedProbBackoffLm

public ArrayEncodedProbBackoffLm(int lmOrder,
                                 WordIndexer<W> wordIndexer,
                                 NgramMap<ProbBackoffPair> map,
                                 ConfigOptions opts)
Method Detail

getLogProb

public float getLogProb(int[] ngram,
                        int startPos,
                        int endPos)
Description copied from interface: ArrayEncodedNgramLanguageModel
Calculate the language model score of an n-gram. Warning: if you pass in an n-gram longer than getLmOrder(), this call will silently ignore the extra words of context. In other words, if you pass a 5-gram (endPos - startPos == 5) to a 3-gram model, it will only score the words from startPos + 2 to endPos.

Specified by:
getLogProb in interface ArrayEncodedNgramLanguageModel<W>
Specified by:
getLogProb in class AbstractArrayEncodedNgramLanguageModel<W>
Parameters:
ngram - array of words in integer representation
startPos - start (inclusive) of the portion of the array to be read
endPos - end (exclusive) of the portion of the array to be read
Returns:
the language model score (log probability) of the n-gram

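As a concrete illustration of the warning above, assume lm is an already-loaded 3-gram model (lm.getLmOrder() == 3) and that the word indices below are purely hypothetical:

  int[] ids = { 4, 17, 8, 23, 42 }; // five hypothetical word indices
  // Only the last lmOrder positions are used, so the two calls return the same value:
  float full = lm.getLogProb(ids, 0, 5);    // extra context silently ignored
  float trimmed = lm.getLogProb(ids, 2, 5); // equivalent to the call above
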
getLogProb

public float getLogProb(int[] ngram)
Description copied from interface: ArrayEncodedNgramLanguageModel
Equivalent to getLogProb(ngram, 0, ngram.length)

Specified by:
getLogProb in interface ArrayEncodedNgramLanguageModel<W>
Overrides:
getLogProb in class AbstractArrayEncodedNgramLanguageModel<W>
See Also:
ArrayEncodedNgramLanguageModel.getLogProb(int[], int, int)
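
For example, assuming ids is an int[] holding a complete n-gram, the two calls below are interchangeable:

  float a = lm.getLogProb(ids);                // whole-array convenience form
  float b = lm.getLogProb(ids, 0, ids.length); // explicit-bounds form, same result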

getLogProb

public float getLogProb(List<W> ngram)
Description copied from interface: NgramLanguageModel
Scores an n-gram. This is a convenience method and is generally relatively inefficient; more efficient versions are available in ArrayEncodedNgramLanguageModel.getLogProb(int[], int, int) and ContextEncodedNgramLanguageModel.getLogProb(long, int, int, edu.berkeley.nlp.lm.ContextEncodedNgramLanguageModel.LmContextInfo).

Specified by:
getLogProb in interface NgramLanguageModel<W>
Overrides:
getLogProb in class AbstractArrayEncodedNgramLanguageModel<W>
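
A brief sketch of the trade-off noted above (the word list is illustrative, and lm is assumed to be an already-loaded model over String words):

  // Convenient but comparatively slow: words are looked up on every call.
  float score = lm.getLogProb(Arrays.asList("the", "quick", "fox"));
  // In tight loops, prefer converting words to indices once via lm.getWordIndexer()
  // and calling getLogProb(int[], int, int) on the integer representation.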

getNgramMap

public NgramMap<ProbBackoffPair> getNgramMap()
Returns the underlying NgramMap from which probability and backoff values are read.