edu.berkeley.nlp.lm
Interface ContextEncodedNgramLanguageModel<W>

Type Parameters:
W - The type of the words in the vocabulary
All Superinterfaces:
NgramLanguageModel<W>
All Known Implementing Classes:
AbstractContextEncodedNgramLanguageModel, ContextEncodedCachingLmWrapper, ContextEncodedProbBackoffLm

public interface ContextEncodedNgramLanguageModel<W>
extends NgramLanguageModel<W>

Interface for language models which expose their internal context-encoding for more efficient queries. (Note: language model implementations may internally use a context-encoding without implementing this interface.) A context-encoding encodes an n-gram as an integer representing the last word, and an offset which serves as a logical pointer to the (n-1) prefix words. The integers represent words of type W in the vocabulary, and the mapping from the vocabulary to integers is managed by an instance of the WordIndexer class.
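
The idea of a context-encoding can be illustrated with a small, self-contained sketch. Everything in it (the class ToyContextEncoding, its add and decode methods, and the table layout) is hypothetical and chosen for illustration; it is not the storage layout used by the implementing classes. Each stored n-gram is a pair of (last word, offset of its prefix), and following the prefix offsets recovers the full n-gram.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration of a context-encoding; the layout is an assumption, not the library's. */
final class ToyContextEncoding {
    // entries.get(order) holds the n-grams with (order + 1) words.
    // Each entry is {lastWord, prefixOffset}; prefixOffset indexes entries.get(order - 1).
    private final List<List<long[]>> entries = new ArrayList<>();

    /** Stores an n-gram as (lastWord, prefixOffset) and returns its offset within its order. */
    long add(int order, long prefixOffset, int lastWord) {
        while (entries.size() <= order) {
            entries.add(new ArrayList<>());
        }
        List<long[]> table = entries.get(order);
        table.add(new long[] {lastWord, prefixOffset});
        return table.size() - 1;
    }

    /** Follows the prefix offsets to rebuild the word ids of the n-gram at (order, offset). */
    int[] decode(int order, long offset) {
        int[] ngram = new int[order + 1];
        for (int o = order; o >= 0; o--) {
            long[] entry = entries.get(o).get((int) offset);
            ngram[o] = (int) entry[0];
            offset = entry[1]; // move to the prefix, which is one order shorter
        }
        return ngram;
    }
}
```

With word ids 0, 1 and 2, adding the unigram 0, then the bigram (0, 1) with the unigram's offset as its prefix, then the trigram (0, 1, 2) with the bigram's offset as its prefix, lets decode rebuild all three words from a single (order, offset) pair. Unigrams have no prefix, so the prefix offset passed for them is never read.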

Author:
adampauls

Nested Class Summary
static class ContextEncodedNgramLanguageModel.DefaultImplementations
           
static class ContextEncodedNgramLanguageModel.LmContextInfo
          Simple class for returning context offsets
 
Nested classes/interfaces inherited from interface edu.berkeley.nlp.lm.NgramLanguageModel
NgramLanguageModel.StaticMethods
 
Method Summary
 float getLogProb(long contextOffset, int contextOrder, int word, ContextEncodedNgramLanguageModel.LmContextInfo outputContext)
          Get the score for an n-gram, and also get the context offset of the n-gram's suffix.
 int[] getNgramForOffset(long contextOffset, int contextOrder, int word)
          Gets the n-gram referred to by a context-encoding.
 ContextEncodedNgramLanguageModel.LmContextInfo getOffsetForNgram(int[] ngram, int startPos, int endPos)
          Gets the offset which refers to an n-gram.
 
Methods inherited from interface edu.berkeley.nlp.lm.NgramLanguageModel
getLmOrder, getLogProb, getWordIndexer, scoreSentence, setOovWordLogProb
 

Method Detail

getLogProb

float getLogProb(long contextOffset,
                 int contextOrder,
                 int word,
                 ContextEncodedNgramLanguageModel.LmContextInfo outputContext)
Get the score for an n-gram, and also get the context offset of the n-gram's suffix.

Parameters:
contextOffset - Offset of context (prefix) of an n-gram
contextOrder - The (0-based) length of the context (i.e. contextOrder == 0 iff the context refers to a unigram).
word - Last word of the n-gram
outputContext - Receives the context offset of the suffix of the input n-gram, which can be passed to future queries for efficient access. If this parameter is null, it is ignored.
Returns:
The score (log probability) of the n-gram.
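
The intended calling pattern, in which the output context of one query becomes the input context of the next, might look like the following sketch. It is self-contained and does not use the library: ToyLmContext, ToyContextEncodedLm, ToyBigramLm and SentenceScorer are hypothetical stand-ins that copy only the getLogProb signature shown above. The use of -1 for "no context" and the use of the word id as the offset of a unigram context are assumptions of this toy model.

```java
/** Hypothetical stand-in for LmContextInfo: a context offset plus its 0-based order. */
final class ToyLmContext {
    long offset = -1; // assumption: -1 marks "no context yet"
    int order = -1;   // assumption: -1 marks the empty context
}

/** Hypothetical interface that copies only the getLogProb signature documented above. */
interface ToyContextEncodedLm {
    float getLogProb(long contextOffset, int contextOrder, int word, ToyLmContext outputContext);
}

/** Toy bigram model: the offset of a unigram context is simply the word id (an assumption). */
final class ToyBigramLm implements ToyContextEncodedLm {
    private final float[] unigramLogProbs;
    private final float[][] bigramLogProbs; // indexed as [previousWord][word]

    ToyBigramLm(float[] unigramLogProbs, float[][] bigramLogProbs) {
        this.unigramLogProbs = unigramLogProbs;
        this.bigramLogProbs = bigramLogProbs;
    }

    @Override
    public float getLogProb(long contextOffset, int contextOrder, int word, ToyLmContext outputContext) {
        // contextOrder == 0 means the context is a unigram, so a bigram score applies.
        float logProb = contextOrder == 0
                ? bigramLogProbs[(int) contextOffset][word]
                : unigramLogProbs[word];
        if (outputContext != null) {
            // The suffix usable as the next context is the word just scored.
            outputContext.offset = word;
            outputContext.order = 0;
        }
        return logProb;
    }
}

final class SentenceScorer {
    /** Sums log probabilities left to right, feeding each output context into the next query. */
    static float score(ToyContextEncodedLm lm, int[] sentence) {
        ToyLmContext context = new ToyLmContext();
        float total = 0f;
        for (int word : sentence) {
            // Arguments are read before the call, so one object can serve as input and output.
            total += lm.getLogProb(context.offset, context.order, word, context);
        }
        return total;
    }
}
```

Because each call hands back the context for the next one, the caller never has to look up the prefix words again, which is the efficiency gain the interface is meant to expose.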

getOffsetForNgram

ContextEncodedNgramLanguageModel.LmContextInfo getOffsetForNgram(int[] ngram,
                                                                 int startPos,
                                                                 int endPos)
Gets the offset which refers to an n-gram. If the n-gram is not in the model, then it returns the shortest suffix of the n-gram which is. This operation is not necessarily fast.


getNgramForOffset

int[] getNgramForOffset(long contextOffset,
                        int contextOrder,
                        int word)
Gets the n-gram referred to by a context-encoding. This operation is not necessarily fast.
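
The relationship between getOffsetForNgram and getNgramForOffset can be sketched as a round trip. The classes below (ToyContextInfo and ToyNgramStore) are hypothetical and store whole n-grams for simplicity, so they only mimic the two method signatures. Two further assumptions: endPos is treated as exclusive, and an n-gram that is not stored yields order -1 instead of falling back to a suffix as described for getOffsetForNgram.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for LmContextInfo: an offset plus a 0-based order. */
final class ToyContextInfo {
    long offset = -1;
    int order = -1;
}

/** Toy n-gram store; the layout is an assumption made for illustration only. */
final class ToyNgramStore {
    // ngramsByOrder.get(k) lists the stored n-grams of k + 1 words; the list index is the offset.
    private final List<List<int[]>> ngramsByOrder = new ArrayList<>();
    private final Map<String, Long> offsets = new HashMap<>();

    /** Stores an n-gram of word ids and assigns it the next free offset for its order. */
    void add(int... ngram) {
        int order = ngram.length - 1;
        while (ngramsByOrder.size() <= order) {
            ngramsByOrder.add(new ArrayList<>());
        }
        List<int[]> table = ngramsByOrder.get(order);
        offsets.put(Arrays.toString(ngram), (long) table.size());
        table.add(ngram.clone());
    }

    /** Looks up ngram[startPos, endPos); endPos is treated as exclusive (an assumption). */
    ToyContextInfo getOffsetForNgram(int[] ngram, int startPos, int endPos) {
        ToyContextInfo info = new ToyContextInfo();
        Long offset = offsets.get(Arrays.toString(Arrays.copyOfRange(ngram, startPos, endPos)));
        if (offset != null) {
            info.offset = offset;
            info.order = endPos - startPos - 1;
        }
        return info; // order stays -1 when the n-gram is not stored (no suffix fallback here)
    }

    /** Rebuilds the n-gram made of the context at (contextOffset, contextOrder) followed by word. */
    int[] getNgramForOffset(long contextOffset, int contextOrder, int word) {
        int[] context = contextOrder < 0
                ? new int[0]
                : ngramsByOrder.get(contextOrder).get((int) contextOffset);
        int[] ngram = Arrays.copyOf(context, context.length + 1);
        ngram[context.length] = word;
        return ngram;
    }
}
```

After storing the unigrams 0 and 1 and the bigram (0, 1), looking up positions 0 to 2 of the array {0, 1, 2} returns the bigram's offset with order 1, and passing that offset and order together with word 2 to getNgramForOffset rebuilds {0, 1, 2}.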