edu.berkeley.nlp.lm.io
Class MakeLmBinaryFromGoogle

java.lang.Object
  extended by edu.berkeley.nlp.lm.io.MakeLmBinaryFromGoogle

public class MakeLmBinaryFromGoogle
extends Object

Given a directory in Google n-grams format, builds a binary representation of a stupid-backoff language model and writes it to disk. Language model binaries are significantly smaller and faster to load. Note: running this code on the full Google n-grams corpus can be very slow and memory intensive; on our machines, it takes about 32GB of memory and 15 hours.
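
For readers unfamiliar with the term, the following is a minimal sketch of stupid-backoff scoring as commonly described (Brants et al., 2007): use the relative frequency of the n-gram if it was observed, otherwise multiply by a fixed penalty and retry with a shorter context. It is not this library's implementation. The class StupidBackoffSketch, its methods, and the in-memory count table are hypothetical, and the penalty of 0.4 is the value usually quoted for the technique, assumed here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of edu.berkeley.nlp.lm.
class StupidBackoffSketch {
    // Back-off penalty; 0.4 is the commonly quoted value (an assumption here).
    static final double ALPHA = 0.4;

    // Raw n-gram counts keyed by token sequence. Lists compare by content,
    // so sublists can be used directly as lookup keys.
    private final Map<List<String>, Long> counts = new HashMap<>();
    private long totalTokens = 0;

    // Registers a count for an n-gram; call at most once per distinct n-gram.
    void addCount(long count, String... ngram) {
        counts.put(Arrays.asList(ngram), count);
        if (ngram.length == 1) {
            totalTokens += count; // unigram counts define the corpus size
        }
    }

    // Scores the last word of ngram given the preceding words.
    // The result is a score, not a normalized probability.
    double score(List<String> ngram) {
        double penalty = 1.0;
        for (int start = 0; start < ngram.size(); start++) {
            List<String> current = ngram.subList(start, ngram.size());
            long numerator = counts.getOrDefault(current, 0L);
            if (numerator > 0) {
                if (current.size() == 1) {
                    return penalty * numerator / totalTokens;
                }
                List<String> context = current.subList(0, current.size() - 1);
                long denominator = counts.getOrDefault(context, 0L);
                if (denominator > 0) {
                    return penalty * numerator / denominator;
                }
            }
            penalty *= ALPHA; // drop the leftmost context word and retry
        }
        return 0.0; // the word itself was never observed
    }

    public static void main(String[] args) {
        StupidBackoffSketch lm = new StupidBackoffSketch();
        lm.addCount(4, "the");
        lm.addCount(2, "cat");
        lm.addCount(2, "sat");
        lm.addCount(2, "the", "cat");
        System.out.println(lm.score(Arrays.asList("the", "cat")));
        System.out.println(lm.score(Arrays.asList("sat", "cat")));
    }
}
```

With the toy counts in main, the observed bigram "the cat" scores 2/4, while the unseen bigram "sat cat" backs off to the unigram "cat" and scores 0.4 times 2/8. Because no normalization is needed, such a model can be built from raw counts alone, which is what makes it practical for very large count collections.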

Note that if an input or output file has a .gz suffix, it is decompressed or compressed with gzip as necessary.
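
Suffix-based compression handling of this kind can be sketched with the standard java.util.zip classes. This is an illustration of the general pattern, not the code this class actually uses. The class GzSuffixIo and its methods are hypothetical names, and UTF-8 text is an assumption.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; not part of edu.berkeley.nlp.lm.
class GzSuffixIo {
    // Decides on compression purely from the file name.
    static boolean isGzipPath(String path) {
        return path.endsWith(".gz");
    }

    // Opens a text reader, decompressing transparently for .gz paths.
    static BufferedReader openReader(String path) throws IOException {
        InputStream in = Files.newInputStream(Paths.get(path));
        if (isGzipPath(path)) {
            in = new GZIPInputStream(in);
        }
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    // Opens a text writer, compressing transparently for .gz paths.
    static BufferedWriter openWriter(String path) throws IOException {
        OutputStream out = Files.newOutputStream(Paths.get(path));
        if (isGzipPath(path)) {
            out = new GZIPOutputStream(out);
        }
        return new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
    }
}
```

Callers would wrap both methods in try-with-resources so that the gzip trailer is written when the writer is closed.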

Author:
adampauls

Constructor Summary
MakeLmBinaryFromGoogle()
           
 
Method Summary
static void main(String[] argv)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MakeLmBinaryFromGoogle

public MakeLmBinaryFromGoogle()

Method Detail

main

public static void main(String[] argv)