org.jasen.core.token
Class SimpleWordTokenizer
java.lang.Object
org.jasen.core.token.SimpleWordTokenizer
- public class SimpleWordTokenizer
- extends Object
Used to parse text which has already been semi-formatted.
This class is used to prepare the linguistic analysis engine.
- Author:
- Jason Polites
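The call sequence implied by the constructors and methods on this page is: construct the tokenizer from a File or InputStream, call tokenize(), then read the result with getTokens(). The sketch below illustrates that sequence with a hypothetical stand-in class, WhitespaceTokenizerSketch, so that it runs without the jASEN library on the classpath. The stand-in is not the jASEN implementation. Its whitespace-splitting rule, its UTF-8 decoding, and its empty result before tokenize() is called are all assumptions made for illustration, and TokenizerUsage and tokensOf are likewise invented names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mirrors the documented API shape
// (InputStream constructor, tokenize(), getTokens()).
// It is NOT the jASEN SimpleWordTokenizer; the splitting rule is assumed.
class WhitespaceTokenizerSketch {
    private final InputStream in;
    // Assumption: no tokens are available until tokenize() has run.
    private String[] tokens = new String[0];

    WhitespaceTokenizerSketch(InputStream in) {
        this.in = in;
    }

    // Reads the stream line by line and splits each line on whitespace.
    void tokenize() throws IOException {
        List<String> found = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {   // skip the empty token from blank lines
                        found.add(word);
                    }
                }
            }
        }
        tokens = found.toArray(new String[0]);
    }

    String[] getTokens() {
        return tokens;
    }
}

class TokenizerUsage {
    // Runs the construct -> tokenize() -> getTokens() sequence on a string.
    static String[] tokensOf(String text) throws IOException {
        InputStream in = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
        WhitespaceTokenizerSketch tokenizer = new WhitespaceTokenizerSketch(in);
        tokenizer.tokenize();          // must be called before reading tokens
        return tokenizer.getTokens();
    }

    public static void main(String[] args) throws IOException {
        for (String token : tokensOf("Buy cheap  meds\nnow")) {
            System.out.println(token);
        }
    }
}
```

With the real class, the same three steps would apply to org.jasen.core.token.SimpleWordTokenizer, with the caller handling the IOException declared by tokenize() and, for the File constructor, the FileNotFoundException. The tokens the real class produces may differ from this sketch.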
Method Summary
String[] | getTokens() | Gets the tokens returned from the tokenization process
void | tokenize() | Tokenizes (splits) the text
SimpleWordTokenizer
public SimpleWordTokenizer(File file)
throws FileNotFoundException
SimpleWordTokenizer
public SimpleWordTokenizer(InputStream in)
tokenize
public void tokenize()
throws IOException
- Tokenizes (splits) the text
- Throws:
IOException
getTokens
public String[] getTokens()
- Gets the tokens returned from the tokenization process
- Returns: