Package edu.berkeley.nlp.lm.map
Class HashNgramMap<T>
java.lang.Object
edu.berkeley.nlp.lm.map.AbstractNgramMap<T>
edu.berkeley.nlp.lm.map.HashNgramMap<T>
- Type Parameters:
T-
- All Implemented Interfaces:
ContextEncodedNgramMap<T>, NgramMap<T>, Serializable
- Author:
- adampauls
- See Also:
-
Nested Class Summary
Nested classes/interfaces inherited from interface edu.berkeley.nlp.lm.map.NgramMap
NgramMap.Entry<T> -
Field Summary
Fields inherited from class edu.berkeley.nlp.lm.map.AbstractNgramMap
NUM_BITS_PER_BYTE, NUM_SUFFIX_BITS, NUM_WORD_BITS, opts, SUFFIX_BIT_MASK, values, WORD_BIT_MASK -
Method Summary
Modifier and Type, Method, Description
void clearStorage()
boolean contains(int[] ngram, int startPos, int endPos)
static <T> HashNgramMap<T> createExplicitWordHashNgramMap(ValueContainer<T> values, ConfigOptions opts, int maxNgramOrder, boolean reversed)
    Note: Explicit HashNgramMap can grow beyond maxNgramOrder
static <T> HashNgramMap<T> createImplicitWordHashNgramMap(ValueContainer<T> values, ConfigOptions opts, LongArray[] numNgramsForEachWord, boolean reversed)
get(int[] ngram, int startPos, int endPos)
int getFirstWordForOffset(long offset, int ngramOrder)
int getLastWordForOffset(long offset, int ngramOrder)
int getMaxNgramOrder()
long getNextContextOffset(long offset, int ngramOrder)
int getNextWord(long offset, int ngramOrder)
int[] getNgramForOffset(long offset, int ngramOrder)
int[] getNgramForOffset(long offset, int ngramOrder, int[] ret)
int[] getNgramFromContextEncoding(long contextOffset, int contextOrder, int word)
getNgramOffsetsForOrder(int ngramOrder)
getNgramsForOrder(int ngramOrder)
long getNumNgrams(int ngramOrder)
long getOffset(long contextOffset, int contextOrder, int word)
ContextEncodedNgramLanguageModel.LmContextInfo getOffsetForNgram(int[] ngram, int startPos, int endPos)
long getOffsetForNgramInModel(int[] ngram, int startPos, int endPos)
    Like getOffsetForNgram(int[], int, int), but assumes that the full n-gram is in the map (i.e. does not back off to the largest suffix which is in the model).
long getPrefixOffset(long offset, int ngramOrder)
    Gets the offset of the context for an n-gram (represented by offset)
long getTotalSize()
long getValueAndOffset(long contextOffset, int contextOrder, int word, T outputVal)
getValueStoringArray(int ngramOrder)
void handleNgramsFinished(int justFinishedOrder)
void initWithLengths(List<Long> numNGrams)
boolean isReversed()
long put
long putWithOffset(int[] ngram, int startPos, int endPos, long contextOffset, T val)
    Warning: does not rehash if load factor is exceeded, must call rehashIfNecessary explicitly.
long putWithOffsetAndSuffix(int[] ngram, int startPos, int endPos, long contextOffset, long suffixOffset, T val)
    Warning: does not rehash if load factor is exceeded, must call rehashIfNecessary explicitly.
void rehashIfNecessary(int num)
void trim()
boolean wordHasBigrams(int word)
Methods inherited from class edu.berkeley.nlp.lm.map.AbstractNgramMap
combineToKey, containsOutOfVocab, contextOffsetOf, equals, getSubArray, getValues, wordOf
-
Method Details
-
createImplicitWordHashNgramMap
public static <T> HashNgramMap<T> createImplicitWordHashNgramMap(ValueContainer<T> values, ConfigOptions opts, LongArray[] numNgramsForEachWord, boolean reversed) -
createExplicitWordHashNgramMap
public static <T> HashNgramMap<T> createExplicitWordHashNgramMap(ValueContainer<T> values, ConfigOptions opts, int maxNgramOrder, boolean reversed)
Note: Explicit HashNgramMap can grow beyond maxNgramOrder.
- Type Parameters:
T-
- Parameters:
values-
opts-
maxNgramOrder-
reversed-
- Returns:
-
put
-
putWithOffset
public long putWithOffset(int[] ngram, int startPos, int endPos, long contextOffset, T val)
Warning: does not rehash if the load factor is exceeded; you must call rehashIfNecessary explicitly. This is so that the offsets returned remain valid. Basically, you should not use this function unless you really know what you're doing.
- Parameters:
ngram-
startPos-
endPos-
contextOffset-
val-
- Returns:
-
putWithOffsetAndSuffix
public long putWithOffsetAndSuffix(int[] ngram, int startPos, int endPos, long contextOffset, long suffixOffset, T val)
Warning: does not rehash if the load factor is exceeded; you must call rehashIfNecessary explicitly. This is so that the offsets returned remain valid. Basically, you should not use this function unless you really know what you're doing.
- Parameters:
ngram-
startPos-
endPos-
contextOffset-
val-
- Returns:
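The warning on putWithOffset and putWithOffsetAndSuffix can be illustrated with a toy open-addressing table. This is a minimal sketch and not the HashNgramMap implementation: the class ToyOffsetTable, its linear probing, the 0.7 load factor, and the use of the slot index as the "offset" are all assumptions made for illustration. It shows why an insert that returns an offset cannot also rehash: growing the table moves keys to new slots, so any offset handed out earlier stops being valid.

```java
import java.util.Arrays;

/** Toy open-addressing table (hypothetical; not the berkeleylm code). */
final class ToyOffsetTable {
    private static final long EMPTY = -1L;       // sentinel marking a free slot
    private static final double MAX_LOAD = 0.7;  // assumed load factor

    private long[] keys;
    private int size;

    /** capacity must be positive. */
    ToyOffsetTable(int capacity) {
        keys = new long[capacity];
        Arrays.fill(keys, EMPTY);
    }

    /** Inserts a non-negative key and returns its slot index ("offset"). Never rehashes. */
    long putWithOffset(long key) {
        int i = probe(keys, key);
        if (i < 0) throw new IllegalStateException("table full; call rehashIfNecessary first");
        if (keys[i] == EMPTY) {
            keys[i] = key;
            size++;
        }
        return i;
    }

    /** Returns the current offset of key, or -1 if it is absent. */
    long getOffset(long key) {
        int i = probe(keys, key);
        return (i >= 0 && keys[i] == key) ? i : -1L;
    }

    /** Grows the table if adding num more keys would exceed the load factor. */
    void rehashIfNecessary(int num) {
        if (size + num <= MAX_LOAD * keys.length) return;
        int newCapacity = keys.length;
        while (size + num > MAX_LOAD * newCapacity) newCapacity *= 2;
        long[] bigger = new long[newCapacity];
        Arrays.fill(bigger, EMPTY);
        for (long k : keys) {
            if (k != EMPTY) bigger[probe(bigger, k)] = k;  // keys land in new slots
        }
        keys = bigger;
    }

    /** Linear probing: slot holding key, else the first free slot, else -1 if full. */
    private static int probe(long[] table, long key) {
        int start = (int) (key % table.length);
        for (int n = 0; n < table.length; n++) {
            int i = (start + n) % table.length;
            if (table[i] == key || table[i] == EMPTY) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        ToyOffsetTable t = new ToyOffsetTable(4);
        long before = t.putWithOffset(7L);  // 7 % 4 = slot 3
        t.rehashIfNecessary(4);             // grows to 8 slots
        long after = t.getOffset(7L);       // 7 % 8 = slot 7
        System.out.println(before + " -> " + after);  // prints "3 -> 7"
    }
}
```

In this sketch a caller that kept the value 3 across the rehash would be pointing at the wrong slot, which is the hazard the warning describes. Reserving capacity with rehashIfNecessary before a batch of inserts keeps the offsets returned within that batch stable.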
-
rehashIfNecessary
public void rehashIfNecessary(int num) -
getValueAndOffset
public long getValueAndOffset(long contextOffset, int contextOrder, int word, T outputVal)
- Specified by:
getValueAndOffset in interface NgramMap<T>
-
getOffset
public long getOffset(long contextOffset, int contextOrder, int word)
- Specified by:
getOffset in interface ContextEncodedNgramMap<T>
-
getNgramFromContextEncoding
public int[] getNgramFromContextEncoding(long contextOffset, int contextOrder, int word)
- Specified by:
getNgramFromContextEncoding in interface ContextEncodedNgramMap<T>
-
getNextWord
public int getNextWord(long offset, int ngramOrder) -
getNextContextOffset
public long getNextContextOffset(long offset, int ngramOrder) -
getFirstWordForOffset
public int getFirstWordForOffset(long offset, int ngramOrder) -
getLastWordForOffset
public int getLastWordForOffset(long offset, int ngramOrder) -
getNgramForOffset
public int[] getNgramForOffset(long offset, int ngramOrder) -
getNgramForOffset
public int[] getNgramForOffset(long offset, int ngramOrder, int[] ret) -
getOffsetForNgram
public ContextEncodedNgramLanguageModel.LmContextInfo getOffsetForNgram(int[] ngram, int startPos, int endPos)
- Specified by:
getOffsetForNgram in interface ContextEncodedNgramMap<T>
-
getOffsetForNgramInModel
public long getOffsetForNgramInModel(int[] ngram, int startPos, int endPos)
Like getOffsetForNgram(int[], int, int), but assumes that the full n-gram is in the map (i.e. does not back off to the largest suffix which is in the model).
- Parameters:
ngram-
startPos-
endPos-
- Returns:
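The difference between an exact lookup and a lookup that backs off to the largest suffix can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the class SuffixBackoffSketch and its methods are hypothetical, n-grams are stored in a plain HashMap, endPos is assumed to be exclusive, and a long[] pair of {offset, order} stands in for whatever LmContextInfo actually carries.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch contrasting exact lookup with suffix back-off. */
final class SuffixBackoffSketch {
    private final Map<List<Integer>, Long> offsets = new HashMap<>();

    /** Stores an n-gram (given as word ids) under the supplied offset. */
    void put(long offset, int... ngram) {
        offsets.put(key(ngram, 0, ngram.length), offset);
    }

    /** Exact lookup of ngram[startPos, endPos): its offset, or -1 if not stored. */
    long offsetInModel(int[] ngram, int startPos, int endPos) {
        return offsets.getOrDefault(key(ngram, startPos, endPos), -1L);
    }

    /**
     * Back-off lookup: drops words from the front until a stored suffix is found.
     * Returns {offset, order of the suffix found}, or {-1, 0} if nothing matches.
     */
    long[] offsetOfLongestSuffix(int[] ngram, int startPos, int endPos) {
        for (int s = startPos; s < endPos; s++) {
            long off = offsetInModel(ngram, s, endPos);
            if (off >= 0) return new long[] {off, endPos - s};
        }
        return new long[] {-1L, 0L};
    }

    private static List<Integer> key(int[] a, int from, int to) {
        return Arrays.stream(a, from, to).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        SuffixBackoffSketch m = new SuffixBackoffSketch();
        m.put(10L, 3);     // unigram (3)
        m.put(20L, 2, 3);  // bigram (2, 3)
        int[] query = {1, 2, 3};
        System.out.println(m.offsetInModel(query, 0, 3));                           // prints -1
        System.out.println(Arrays.toString(m.offsetOfLongestSuffix(query, 0, 3)));  // prints [20, 2]
    }
}
```

The exact variant does a single lookup and is only meaningful when the caller already knows the full n-gram is present; the back-off variant may probe once per suffix length.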
-
handleNgramsFinished
public void handleNgramsFinished(int justFinishedOrder)
- Specified by:
handleNgramsFinished in interface NgramMap<T>
-
initWithLengths
public void initWithLengths(List<Long> numNGrams)
- Specified by:
initWithLengths in interface NgramMap<T>
-
trim
public void trim() -
getPrefixOffset
public long getPrefixOffset(long offset, int ngramOrder)
Gets the offset of the context for an n-gram (represented by offset).
- Parameters:
offset-
- Returns:
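How an offset can stand for an n-gram, and how a context offset can be recovered from it, is easier to see in a small model. The sketch below is an assumption-laden illustration rather than the HashNgramMap layout: the class ContextEncodedSketch is hypothetical, orders are assumed to be 0-based (0 = unigram) with contextOrder -1 meaning an empty context, the context is assumed to be the n-gram without its last word, and each entry is stored as a {contextOffset, word} pair in a per-order list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical context-encoded store: an n-gram is (offset of its context, last word). */
final class ContextEncodedSketch {
    // entries.get(order).get(offset) = {contextOffset, word}
    private final List<List<long[]>> entries = new ArrayList<>();
    // index.get(order) maps "contextOffset:word" to the entry's offset
    private final List<Map<String, Long>> index = new ArrayList<>();

    ContextEncodedSketch(int numOrders) {
        for (int i = 0; i < numOrders; i++) {
            entries.add(new ArrayList<>());
            index.add(new HashMap<>());
        }
    }

    /** Adds the n-gram (context + word) and returns its offset within its order. */
    long add(long contextOffset, int contextOrder, int word) {
        long existing = getOffset(contextOffset, contextOrder, word);
        if (existing >= 0) return existing;
        List<long[]> table = entries.get(contextOrder + 1);
        table.add(new long[] {contextOffset, word});
        long offset = table.size() - 1;
        index.get(contextOrder + 1).put(contextOffset + ":" + word, offset);
        return offset;
    }

    /** Offset of the n-gram formed by the context followed by word, or -1 if absent. */
    long getOffset(long contextOffset, int contextOrder, int word) {
        Long off = index.get(contextOrder + 1).get(contextOffset + ":" + word);
        return off == null ? -1L : off;
    }

    /** Offset (in the next lower order) of the context of the given n-gram. */
    long getPrefixOffset(long offset, int ngramOrder) {
        return entries.get(ngramOrder).get((int) offset)[0];
    }

    /** Rebuilds the word ids by walking context offsets down to the unigram. */
    int[] getNgramForOffset(long offset, int ngramOrder) {
        int[] ret = new int[ngramOrder + 1];
        for (int order = ngramOrder; order >= 0; order--) {
            long[] entry = entries.get(order).get((int) offset);
            ret[order] = (int) entry[1];
            offset = entry[0];
        }
        return ret;
    }

    public static void main(String[] args) {
        ContextEncodedSketch m = new ContextEncodedSketch(3);
        m.add(0L, -1, 9);                      // unigram (9) at offset 0
        long u5 = m.add(0L, -1, 5);            // unigram (5) at offset 1
        long bigram = m.add(u5, 0, 9);         // bigram (5, 9)
        long trigram = m.add(bigram, 1, 4);    // trigram (5, 9, 4)
        System.out.println(m.getPrefixOffset(bigram, 1));                       // prints 1
        System.out.println(Arrays.toString(m.getNgramForOffset(trigram, 2)));   // prints [5, 9, 4]
    }
}
```

Under these assumptions a lookup for an order-k n-gram costs one probe once the offset of its context is known, which is the usual motivation for carrying context offsets around instead of full word arrays.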
-
getMaxNgramOrder
public int getMaxNgramOrder()
- Specified by:
getMaxNgramOrder in interface NgramMap<T>
-
getNumNgrams
public long getNumNgrams(int ngramOrder)
- Specified by:
getNumNgrams in interface NgramMap<T>
-
getNgramsForOrder
- Specified by:
getNgramsForOrder in interface NgramMap<T>
-
getNgramOffsetsForOrder
-
isReversed
public boolean isReversed() -
wordHasBigrams
public boolean wordHasBigrams(int word)
- Specified by:
wordHasBigrams in interface ContextEncodedNgramMap<T>
-
contains
public boolean contains(int[] ngram, int startPos, int endPos) -
get
-
getTotalSize
public long getTotalSize() -
getValueStoringArray
- Specified by:
getValueStoringArray in interface NgramMap<T>
-
clearStorage
public void clearStorage()
- Specified by:
clearStorage in interface NgramMap<T>
-