edu.berkeley.nlp.lm
Class AbstractContextEncodedNgramLanguageModel<W>

java.lang.Object
  extended by edu.berkeley.nlp.lm.AbstractNgramLanguageModel<W>
      extended by edu.berkeley.nlp.lm.AbstractContextEncodedNgramLanguageModel<W>
Type Parameters:
W - the type of the words scored by the model
All Implemented Interfaces:
ContextEncodedNgramLanguageModel<W>, NgramLanguageModel<W>, Serializable
Direct Known Subclasses:
ContextEncodedCachingLmWrapper, ContextEncodedProbBackoffLm

public abstract class AbstractContextEncodedNgramLanguageModel<W>
extends AbstractNgramLanguageModel<W>
implements ContextEncodedNgramLanguageModel<W>, Serializable

Default implementation of all ContextEncodedNgramLanguageModel functionality except ContextEncodedNgramLanguageModel.getLogProb(long, int, int, LmContextInfo), getOffsetForNgram(int[], int, int), and getNgramForOffset(long, int, int), which subclasses must implement.

Author:
adampauls
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from interface edu.berkeley.nlp.lm.ContextEncodedNgramLanguageModel
ContextEncodedNgramLanguageModel.DefaultImplementations, ContextEncodedNgramLanguageModel.LmContextInfo
 
Nested classes/interfaces inherited from interface edu.berkeley.nlp.lm.NgramLanguageModel
NgramLanguageModel.StaticMethods
 
Field Summary
 
Fields inherited from class edu.berkeley.nlp.lm.AbstractNgramLanguageModel
lmOrder, oovWordLogProb
 
Constructor Summary
AbstractContextEncodedNgramLanguageModel(int lmOrder, WordIndexer<W> wordIndexer, float oovWordLogProb)
           
 
Method Summary
 float getLogProb(List<W> phrase)
          Scores an n-gram.
abstract  float getLogProb(long contextOffset, int contextOrder, int word, ContextEncodedNgramLanguageModel.LmContextInfo outputContext)
          Get the score for an n-gram, and also get the context offset of the n-gram's suffix.
abstract  int[] getNgramForOffset(long contextOffset, int contextOrder, int word)
          Gets the n-gram referred to by a context-encoding.
abstract  ContextEncodedNgramLanguageModel.LmContextInfo getOffsetForNgram(int[] ngram, int startPos, int endPos)
          Gets the offset which refers to an n-gram.
 float scoreSentence(List<W> sentence)
          Scores a complete sentence, taking appropriate care with the start- and end-of-sentence symbols.
 
Methods inherited from class edu.berkeley.nlp.lm.AbstractNgramLanguageModel
getLmOrder, getWordIndexer, setOovWordLogProb
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface edu.berkeley.nlp.lm.NgramLanguageModel
getLmOrder, getWordIndexer, setOovWordLogProb
 

Constructor Detail

AbstractContextEncodedNgramLanguageModel

public AbstractContextEncodedNgramLanguageModel(int lmOrder,
                                                WordIndexer<W> wordIndexer,
                                                float oovWordLogProb)
Method Detail

scoreSentence

public float scoreSentence(List<W> sentence)
Description copied from interface: NgramLanguageModel
Scores a complete sentence, taking appropriate care with the start- and end-of-sentence symbols. This is a convenience method and will generally be inefficient.
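
As a rough illustration of what handling the start- and end-of-sentence symbols can involve, the sketch below pads a sentence with boundary symbols and sums the score of each n-gram window. Everything in it is an assumption made for illustration: the `<s>` and `</s>` symbols, the `ToyScorer` interface, and the `SentenceScoringSketch` class are hypothetical and are not part of edu.berkeley.nlp.lm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a model's n-gram scoring method (not the library's API).
interface ToyScorer {
    float getLogProb(List<String> ngram);
}

final class SentenceScoringSketch {
    // Assumed boundary symbols, chosen for this sketch only.
    static final String START = "<s>";
    static final String END = "</s>";

    static float scoreSentence(List<String> sentence, int lmOrder, ToyScorer lm) {
        List<String> padded = new ArrayList<>();
        padded.add(START);
        padded.addAll(sentence);
        padded.add(END);

        float total = 0.0f;
        // The start symbol at position 0 is only used as context, so the
        // first scored window ends just after the first real word.
        for (int end = 2; end <= padded.size(); end++) {
            int start = Math.max(0, end - lmOrder);
            total += lm.getLogProb(padded.subList(start, end));
        }
        return total;
    }

    public static void main(String[] args) {
        // Toy scorer: every window scores -1, so the total counts the windows.
        ToyScorer constant = ngram -> -1.0f;
        System.out.println(scoreSentence(Arrays.asList("a", "b"), 3, constant));
    }
}
```

Building a padded list and a sublist per window on every call is one reason a convenience method like this tends to be slower than querying with pre-indexed words.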

Specified by:
scoreSentence in interface NgramLanguageModel<W>
Returns:
the score of the sentence

getLogProb

public float getLogProb(List<W> phrase)
Description copied from interface: NgramLanguageModel
Scores an n-gram. This is a convenience method and will generally be relatively inefficient. More efficient versions are available in ArrayEncodedNgramLanguageModel.getLogProb(int[], int, int) and ContextEncodedNgramLanguageModel.getLogProb(long, int, int, ContextEncodedNgramLanguageModel.LmContextInfo).
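
To make the efficiency remark concrete, here is a minimal sketch of how a list-based convenience overload might delegate to an array-based one. The `ConvenienceScoringSketch` class, its `idFor` helper, and the placeholder scoring rule are all hypothetical; they only mimic the shape of the methods named above and are not the library's implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ConvenienceScoringSketch {
    // Hypothetical word-to-id table standing in for a word indexer.
    private final Map<String, Integer> ids = new HashMap<>();

    int idFor(String word) {
        Integer id = ids.get(word);
        if (id == null) {
            id = ids.size();
            ids.put(word, id);
        }
        return id;
    }

    // Toy array-based scorer: the placeholder score is minus the n-gram length.
    float getLogProb(int[] ngram, int startPos, int endPos) {
        return -(endPos - startPos);
    }

    // Convenience overload: one table lookup per word and one array
    // allocation per call, which is the overhead the faster variants avoid.
    float getLogProb(List<String> phrase) {
        int[] ngram = new int[phrase.size()];
        for (int i = 0; i < ngram.length; i++) {
            ngram[i] = idFor(phrase.get(i));
        }
        return getLogProb(ngram, 0, ngram.length);
    }

    public static void main(String[] args) {
        ConvenienceScoringSketch lm = new ConvenienceScoringSketch();
        System.out.println(lm.getLogProb(Arrays.asList("the", "cat")));
    }
}
```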

Specified by:
getLogProb in interface NgramLanguageModel<W>

getLogProb

public abstract float getLogProb(long contextOffset,
                                 int contextOrder,
                                 int word,
                                 ContextEncodedNgramLanguageModel.LmContextInfo outputContext)
Description copied from interface: ContextEncodedNgramLanguageModel
Get the score for an n-gram, and also get the context offset of the n-gram's suffix.
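
The sketch below illustrates the calling pattern this signature allows: the context written by one query is fed straight into the next, so each word costs a single lookup. It is a toy under stated assumptions, not the library's code: `ToyContext` is a hypothetical stand-in for the output context, the model is a hard-coded bigram table, a unigram context's offset is assumed to be its word id, and an order of -1 is assumed to mean the empty context.

```java
final class ContextChainingSketch {
    // Hypothetical mutable holder mirroring the idea of an output context.
    static final class ToyContext {
        long offset = -1L;
        int order = -1; // assumption: -1 stands for the empty context
    }

    // Toy scores over word ids 0..2 (values chosen arbitrarily).
    static final float[] UNIGRAM = { -1.0f, -2.0f, -3.0f };
    static final float[][] BIGRAM = {
        { -0.5f, -1.5f, -2.5f },
        { -0.25f, -0.75f, -1.25f },
        { -1.0f, -1.0f, -1.0f }
    };

    static float getLogProb(long contextOffset, int contextOrder, int word, ToyContext out) {
        float score = (contextOrder < 0)
                ? UNIGRAM[word]
                : BIGRAM[(int) contextOffset][word];
        if (out != null) {
            // In a bigram model the suffix to carry forward is just the last word.
            out.offset = word;
            out.order = 0;
        }
        return score;
    }

    static float scoreSequence(int[] words) {
        ToyContext ctx = new ToyContext();
        float total = 0.0f;
        for (int w : words) {
            // The arguments are read before the call overwrites ctx.
            total += getLogProb(ctx.offset, ctx.order, w, ctx);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(scoreSequence(new int[] { 0, 1, 2 })); // prints -3.75
    }
}
```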

Specified by:
getLogProb in interface ContextEncodedNgramLanguageModel<W>
Parameters:
contextOffset - Offset of context (prefix) of an n-gram
contextOrder - The (0-based) length of the context (i.e. contextOrder == 0 if and only if the context is a unigram).
word - Last word of the n-gram
outputContext - Receives the offset of the suffix of the input n-gram, which can be passed to future queries for efficient access. If this parameter is null, it is ignored.
Returns:
the score (log probability) of the n-gram

getOffsetForNgram

public abstract ContextEncodedNgramLanguageModel.LmContextInfo getOffsetForNgram(int[] ngram,
                                                                                 int startPos,
                                                                                 int endPos)
Description copied from interface: ContextEncodedNgramLanguageModel
Gets the offset which refers to an n-gram. If the n-gram is not in the model, then it returns the offset of the longest suffix of the n-gram which is. This operation is not necessarily fast.

Specified by:
getOffsetForNgram in interface ContextEncodedNgramLanguageModel<W>

getNgramForOffset

public abstract int[] getNgramForOffset(long contextOffset,
                                        int contextOrder,
                                        int word)
Description copied from interface: ContextEncodedNgramLanguageModel
Gets the n-gram referred to by a context-encoding. This operation is not necessarily fast.
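
One way to picture the decoding is a table per n-gram order in which each entry stores its last word and the offset of its prefix, so an n-gram is recovered by walking prefix pointers. The sketch below assumes exactly that layout. The tables, the word ids, the use of -1 for the empty context, and the `OffsetDecodingSketch` class are hypothetical and say nothing about how a real subclass stores its n-grams.

```java
import java.util.Arrays;

final class OffsetDecodingSketch {
    // LAST_WORD[order][offset]: last word id of the n-gram stored there.
    // Order 0 holds unigrams 10, 11, 12; order 1 holds bigrams (10 11) and (11 12).
    static final int[][] LAST_WORD = {
        { 10, 11, 12 },
        { 11, 12 }
    };
    // PREFIX[order][offset]: offset, one order down, of that n-gram's prefix.
    static final long[][] PREFIX = {
        { -1L, -1L, -1L },
        { 0L, 1L }
    };

    static int[] getNgramForOffset(long contextOffset, int contextOrder, int word) {
        // A context of order k holds k + 1 words; the queried word adds one more.
        int[] ngram = new int[contextOrder + 2];
        ngram[ngram.length - 1] = word;
        long offset = contextOffset;
        // Walk the prefix pointers from the end of the context back to its first word.
        for (int order = contextOrder; order >= 0; order--) {
            ngram[order] = LAST_WORD[order][(int) offset];
            offset = PREFIX[order][(int) offset];
        }
        return ngram;
    }

    public static void main(String[] args) {
        // Context is the bigram (10 11) at offset 0 of order 1, followed by word 12.
        System.out.println(Arrays.toString(getNgramForOffset(0L, 1, 12))); // prints [10, 11, 12]
    }
}
```

The walk visits one table entry per word of the context, which fits the note that this operation is not necessarily fast.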

Specified by:
getNgramForOffset in interface ContextEncodedNgramLanguageModel<W>