Package edu.berkeley.nlp.lm
Class StringWordIndexer
java.lang.Object
edu.berkeley.nlp.lm.StringWordIndexer
- All Implemented Interfaces:
WordIndexer<String>, Serializable
Implementation of a WordIndexer in which words are represented as strings.
- Author:
- adampauls
Nested Class Summary
Nested classes/interfaces inherited from interface edu.berkeley.nlp.lm.WordIndexer
WordIndexer.StaticMethods
Constructor Summary
Constructors
StringWordIndexer()
Method Summary
Modifier and Type    Method    Description
String    getEndSymbol()    Returns the end symbol (usually something like </s>).
int    getIndexPossiblyUnk(String word)    Should never add to the vocabulary, and should return the index of the unk symbol (see getUnkSymbol()) if the word is not in the vocabulary.
int    getOrAddIndex(String word)    Gets the index for a word, adding it if necessary.
int    getOrAddIndexFromString(String word)
String    getStartSymbol()    Returns the start symbol (usually something like <s>).
String    getUnkSymbol()    Returns the unk symbol (usually something like <unk>).
String    getWord(int index)    Gets the word object for an index.
int    numWords()    Number of words that have been added so far.
void    setEndSymbol(String sym)
void    setStartSymbol(String sym)
void    setUnkSymbol(String sym)
void    trimAndLock()    Informs the implementation that no more words can be added to the vocabulary.
-
Constructor Details
-
StringWordIndexer
public StringWordIndexer()
-
-
Method Details
-
getOrAddIndex
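public int getOrAddIndex(String word)
Description copied from interface: WordIndexer
Gets the index for a word, adding it to the vocabulary if necessary.
- Specified by:
getOrAddIndex in interface WordIndexer<String>
- Parameters:
word - the word to look up or add
- Returns:
the index of the word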
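The get-or-add contract described above can be illustrated with a minimal sketch. SketchIndexer is a hypothetical stand-in written for this example, not the library class, and it assumes indices are handed out consecutively starting at 0, which this page does not state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, NOT edu.berkeley.nlp.lm.StringWordIndexer:
// it only illustrates the get-or-add contract.
final class SketchIndexer {
    private final Map<String, Integer> wordToIndex = new HashMap<>();
    private final List<String> indexToWord = new ArrayList<>();

    // Returns the existing index for a word, or assigns the next free one.
    // Assumption: indices are dense and start at 0.
    int getOrAddIndex(String word) {
        Integer existing = wordToIndex.get(word);
        if (existing != null) {
            return existing;
        }
        int index = indexToWord.size();
        wordToIndex.put(word, index);
        indexToWord.add(word);
        return index;
    }

    // Inverse lookup: the word stored at a given index.
    String getWord(int index) {
        return indexToWord.get(index);
    }

    // Number of distinct words added so far.
    int numWords() {
        return indexToWord.size();
    }
}
```

In this sketch, asking for the same word twice returns the same index and leaves numWords() unchanged; only a previously unseen word grows the vocabulary.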
-
getWord
public String getWord(int index)
Description copied from interface: WordIndexer
Gets the word object for an index.
- Specified by:
getWord in interface WordIndexer<String>
- Parameters:
index - the index to look up
- Returns:
the word stored at that index
-
numWords
public int numWords()
Description copied from interface: WordIndexer
Number of words that have been added so far.
- Specified by:
numWords in interface WordIndexer<String>
- Returns:
the number of words added so far
-
getStartSymbol
public String getStartSymbol()
Description copied from interface: WordIndexer
Returns the start symbol (usually something like <s>).
- Specified by:
getStartSymbol in interface WordIndexer<String>
- Returns:
the start symbol
-
getEndSymbol
public String getEndSymbol()
Description copied from interface: WordIndexer
Returns the end symbol (usually something like </s>).
- Specified by:
getEndSymbol in interface WordIndexer<String>
- Returns:
the end symbol
-
getUnkSymbol
public String getUnkSymbol()
Description copied from interface: WordIndexer
Returns the unk (unknown word) symbol (usually something like <unk>).
- Specified by:
getUnkSymbol in interface WordIndexer<String>
- Returns:
the unk symbol
-
getOrAddIndexFromString
- Specified by:
getOrAddIndexFromString in interface WordIndexer<String>
-
setStartSymbol
public void setStartSymbol(String sym)
- Specified by:
setStartSymbol in interface WordIndexer<String>
-
setEndSymbol
public void setEndSymbol(String sym)
- Specified by:
setEndSymbol in interface WordIndexer<String>
-
setUnkSymbol
public void setUnkSymbol(String sym)
- Specified by:
setUnkSymbol in interface WordIndexer<String>
-
trimAndLock
public void trimAndLock()
Description copied from interface: WordIndexer
Informs the implementation that no more words can be added to the vocabulary. Implementations may perform some space optimization, and should trigger an error if an attempt is made to add a word after this point.
- Specified by:
trimAndLock in interface WordIndexer<String>
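The locking behaviour described above can be sketched as follows. LockableSketchIndexer is a hypothetical stand-in, not the library class. Signalling the error with IllegalStateException is an assumption of this sketch, since the page only says an error should be triggered. The optional space optimization is not shown.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, NOT edu.berkeley.nlp.lm.StringWordIndexer:
// it only illustrates the "no additions after trimAndLock()" contract.
final class LockableSketchIndexer {
    private final Map<String, Integer> wordToIndex = new HashMap<>();
    private boolean locked = false;

    int getOrAddIndex(String word) {
        Integer existing = wordToIndex.get(word);
        if (existing != null) {
            return existing; // known words stay retrievable after locking
        }
        if (locked) {
            // Assumption: an unchecked exception is how this sketch reports the error.
            throw new IllegalStateException("Vocabulary is locked; cannot add: " + word);
        }
        int index = wordToIndex.size();
        wordToIndex.put(word, index);
        return index;
    }

    // After this call, attempts to add a new word fail.
    void trimAndLock() {
        locked = true;
    }
}
```

A typical pattern under these assumptions is to add every training word first, call trimAndLock() once, and use only read-style lookups afterwards.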
-
getIndexPossiblyUnk
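public int getIndexPossiblyUnk(String word)
Description copied from interface: WordIndexer
Should never add to the vocabulary, and should return the index of the unk symbol (see getUnkSymbol()) if the word is not in the vocabulary.
- Specified by:
getIndexPossiblyUnk in interface WordIndexer<String>
- Parameters:
word - the word to look up
- Returns:
the index of the word, or the index of the unk symbol if the word is not in the vocabulary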
-
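The difference between a read-only lookup and getOrAddIndex can be sketched as below. UnkSketchIndexer is a hypothetical stand-in, not the library class. Two details are assumptions of this sketch that the page does not specify: setUnkSymbol registers the symbol in the vocabulary, and -1 is returned while no unk symbol has been set.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, NOT edu.berkeley.nlp.lm.StringWordIndexer:
// it only illustrates the unk fallback for out-of-vocabulary words.
final class UnkSketchIndexer {
    private final Map<String, Integer> wordToIndex = new HashMap<>();
    private int unkIndex = -1; // assumption: -1 until an unk symbol is set

    int getOrAddIndex(String word) {
        Integer existing = wordToIndex.get(word);
        if (existing != null) {
            return existing;
        }
        int index = wordToIndex.size();
        wordToIndex.put(word, index);
        return index;
    }

    // Assumption: setting the unk symbol also adds it to the vocabulary.
    void setUnkSymbol(String sym) {
        unkIndex = getOrAddIndex(sym);
    }

    // Read-only lookup: never grows the vocabulary.
    int getIndexPossiblyUnk(String word) {
        Integer existing = wordToIndex.get(word);
        return existing != null ? existing : unkIndex;
    }

    int numWords() {
        return wordToIndex.size();
    }
}
```

In this sketch, looking up an unseen word returns the same index as the unk symbol and leaves numWords() unchanged, which is why a lookup of this kind stays usable once the vocabulary is locked.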