LexicalTreeAnalyzer


org.jasen.core.linguistics
Class LexicalTreeAnalyzer

java.lang.Object
  org.jasen.core.linguistics.LexicalTreeAnalyzer

public class LexicalTreeAnalyzer
extends Object

Employs a lexical tree approach to word recognition.

Based on a sample corpus, the analyzer builds a tree of characters in which each character of a word is a node in the tree.

When a word with a similar character sequence is found, the path to the next character is strengthened.
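A minimal sketch of such a tree, assuming one node per character and an integer counter as the "strength" of a path. The class `LexicalTreeSketch` and its methods `addWord` and `strengthOf` are hypothetical names used for illustration only; they are not part of the `LexicalTreeAnalyzer` API, and the counting scheme is an assumption about what "strengthened" means.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a lexical (character) tree; not the jASEN source.
class LexicalTreeSketch {

    // Each node is one character position. "strength" counts how many corpus
    // words have passed through the node (assumed meaning of "strengthened").
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int strength;
    }

    private final Node root = new Node();

    // Adds one corpus word, strengthening every node along its path by 1.
    void addWord(String word) {
        Node current = root;
        for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
            current = current.children.computeIfAbsent(c, k -> new Node());
            current.strength++;
        }
    }

    // Returns the strength of the path spelled by prefix, or 0 if no such path exists.
    int strengthOf(String prefix) {
        Node current = root;
        for (char c : prefix.toLowerCase(Locale.ROOT).toCharArray()) {
            current = current.children.get(c);
            if (current == null) {
                return 0;
            }
        }
        return current.strength;
    }
}
```

After adding the words the, then, they and tree, `strengthOf("t")` returns 4, `strengthOf("th")` returns 3 and `strengthOf("tx")` returns 0. Storing children in a `HashMap` keeps the tree sparse, since most characters never follow a given prefix.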

Author: Jason Polites

Constructor Summary
`LexicalTreeAnalyzer()`

Method Summary
`double`	`computeWordValue(String word)` Computes the probability that the given sequence of characters is an English word.
`void`	`initialize()` Creates and initializes the analyzer.

Methods inherited from class java.lang.Object

equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

LexicalTreeAnalyzer

public LexicalTreeAnalyzer()

Method Detail

initialize

public void initialize()
                throws IOException

Creates and initializes the analyzer.

Throws: IOException

computeWordValue

public double computeWordValue(String word)

Computes the probability that the given sequence of characters is an English word.

This works on the premise that most English words exhibit similar character sequence patterns in their prefix, body and suffix.

The value of the word is determined by analyzing the characters in the word against the values in both the forward and backward lexical trees.

The maximum possible value a word can have is 1 (100%). For each character in the word that is correctly positioned according to the rules in the tree, the computed value is increased by 1/W, where W is the length of the word. A word that perfectly matches a branch of the tree therefore scores W × 1/W = 1.
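As a worked illustration of the running total described above (before any mismatch penalty is applied), let k be the number of characters matched so far in a word of length W; k is a symbol introduced here for illustration:

```latex
V_k = \sum_{i=1}^{k} \frac{1}{W} = \frac{k}{W},
\qquad
V_W = W \cdot \frac{1}{W} = 1
```

For example, with W = 5, three matched characters give a running total of 3/5 = 0.6, and a perfect match of all five characters gives 1.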

Where a word fails to match a forward branch perfectly, two things are done:

1. For each remaining character in the token, the current total is reduced by the same fraction (1/W) that was used to build it up.
2. The token is given a "second chance" by repeating the initial calculation process with the reverse tree.
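The forward pass, penalty and reverse-tree second chance can be sketched in Java as follows. This is a hypothetical illustration, not the jASEN implementation, and `WordScorerSketch`, `addWord` and `pass` are invented names. Several details are assumptions: the backward tree is built from reversed corpus words, a character counts as "correctly positioned" when the tree has a path for it, the penalty covers the mismatched character and everything after it, the result is clamped at 0, the reverse pass uses the same scoring, and the better of the two passes is returned.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the forward/backward scoring rule; not the jASEN source.
class WordScorerSketch {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
    }

    private final Node forward = new Node();
    private final Node backward = new Node();

    private static void insert(Node root, String word) {
        Node current = root;
        for (int i = 0; i < word.length(); i++) {
            current = current.children.computeIfAbsent(word.charAt(i), k -> new Node());
        }
    }

    // Assumption: the backward tree is the same kind of tree built from reversed words.
    void addWord(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        insert(forward, w);
        insert(backward, new StringBuilder(w).reverse().toString());
    }

    // One pass: +1/W for each character that continues an existing path, then
    // -1/W for each character left once the path breaks (assumed to include the
    // mismatched character). Clamped at 0 so the result stays in [0, 1].
    private static double pass(Node root, String word) {
        int length = word.length();
        int matched = 0;
        Node current = root;
        while (matched < length) {
            Node next = current.children.get(word.charAt(matched));
            if (next == null) {
                break;
            }
            current = next;
            matched++;
        }
        int remaining = length - matched;
        return Math.max(0.0, (double) (matched - remaining) / length);
    }

    double computeWordValue(String word) {
        if (word == null || word.isEmpty()) {
            return 0.0;
        }
        String w = word.toLowerCase(Locale.ROOT);
        double forwardValue = pass(forward, w);
        if (forwardValue == 1.0) {
            return forwardValue; // perfect forward match, no second chance needed
        }
        // "Second chance": score the reversed word against the backward tree.
        // Assumption: the better of the two passes is returned.
        double backwardValue = pass(backward, new StringBuilder(w).reverse().toString());
        return Math.max(forwardValue, backwardValue);
    }
}
```

With the corpus words the, then and tree added, "then" scores 1.0. "thex" matches three forward characters and loses one, scoring 0.5. "xree" fails at once in the forward tree, but its reverse "eerx" follows "tree" backwards for three characters, so the second chance also yields 0.5. Counting matched characters as an integer and dividing once avoids floating-point drift from adding 1/W repeatedly.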

Parameters: word - the word to be tested
Returns: a value between 0.0 and 1.0 indicating the probability that the String is an English word.
