The pure java Anti Spam ENgine

Configuration

Almost all aspects of jASEN can be configured. Configurations can be classified into four categories:
  1. Engine configuration
  2. Parser configuration
  3. Auto-Update configuration
  4. Plugin configuration

Engine configuration

The core jASEN engine uses a jasen-config XML file to hold the primary configuration for the engine.

The default configuration file is located at jasen-conf/default-jasen-config.xml under the root path of the distribution.

This configuration file is passed to the engine during initialization and has the following structure:

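<jasen-config>

	<scanner .../>        (1)

	<engine .../>         (1)

	<parser .../>         (1)

	<auto-update .../>    (1)

	<plugin .../>         (1...n)

</jasen-config>

The number in parentheses shows how many times each element appears. The file has exactly one of each element, except plugin, which may appear one or more times.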

<scanner>

The scanner element has the following attributes (FQCN refers to Fully Qualified Class Name):

calculator (FQCN, sample org.jasen.core.calculators.CompoundCalculator): The calculator used to compute the final probability from the results of all plugins. Must be an instance of org.jasen.interfaces.ProbabilityCalculator.
mimeParser (FQCN, sample org.jasen.core.parsers.StandardMimeMessageParser): The class used to parse a MIME message and generate a JasenMessage. Must be an instance of org.jasen.interfaces.MimeMessageParser.
headerParser (FQCN, sample org.jasen.core.parsers.GenericReceivedHeaderParser): The class used to parse Received headers and extract sender information. Must be an instance of org.jasen.interfaces.ReceivedHeaderParser.
tokenizer (FQCN, sample org.jasen.core.token.EmailTokenizer): The class used to reduce the content of an email to simple word tokens. Must be an instance of org.jasen.interfaces.MimeMessageTokenizer.
result (FQCN, sample org.jasen.core.StandardScanResult): The class used to hold the results of a scan. Must be an instance of org.jasen.interfaces.JasenScanResult.
dnsResolver (FQCN, sample org.jasen.net.JasenDNSResolver): The class used to perform forward and reverse DNS lookups. Must be an instance of org.jasen.interfaces.DNSResolver.
inetResolver (FQCN, sample org.jasen.net.JasenInetAddressResolver): The class used to resolve hostnames and IP addresses. Must be an instance of org.jasen.interfaces.InetAddressResolver.
tokenLimit (int, sample 30): The limit applied when tokenizing an email. The tokenizer returns only the first tokenLimit tokens from an email.
errorHandler (FQCN, sample org.jasen.error.SystemErrorHandler): The class used to handle unresolvable system errors.
boundary (float, sample 0.01): The boundary applied to all scan results. A result below boundary or above (1 - boundary) is normalized to the relevant boundary value. For example, if boundary is 0.01, a result of 0.9999 is normalized to 0.99 and a result of 0.00001 is normalized to 0.01.
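In effect, the boundary rule clamps every result into the interval [boundary, 1 - boundary]. The minimal Java sketch below shows that arithmetic. The class and method names are illustrative and are not part of the jASEN API.

```java
// Illustrative helper; not part of the jASEN API.
final class BoundaryNormalizer {

    /** Clamps a scan result into the closed interval [boundary, 1 - boundary]. */
    static double normalize(double result, double boundary) {
        double upper = 1.0 - boundary;
        if (result < boundary) {
            return boundary;   // e.g. 0.00001 becomes 0.01 when boundary is 0.01
        }
        if (result > upper) {
            return upper;      // e.g. 0.9999 becomes 0.99 when boundary is 0.01
        }
        return result;         // already inside the interval
    }

    public static void main(String[] args) {
        double boundary = 0.01;
        for (double r : new double[] {0.00001, 0.42, 0.9999}) {
            System.out.println(r + " -> " + normalize(r, boundary));
        }
    }
}
```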


<engine>

The engine element has the following attributes:

IT IS STRONGLY RECOMMENDED THAT THESE CONFIGURATIONS ARE NOT CHANGED

confidence (float, default 0.9): The strength given to background information in a chi-square context.
guess (float, default 0.5): The value used for a token in the absence of definitive information about it.
esf (float, default 0.5): Effective Size Factor. A computational variable used internally by the engine.
ftt (float, default 25): Few Token Threshold. The engine uses one of two different computations, depending on how the number of tokens returned by the tokenizer compares with this threshold.
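The descriptions of confidence and guess match the two constants in Gary Robinson's degree-of-belief formula, f(w) = (s·x + n·p(w)) / (s + n). In that formula, s is the strength given to background information, x is the probability assumed for a token with no data, n is the number of messages containing the token, and p(w) is the token's raw spam probability. This page does not spell out the formula, so the sketch below assumes that confidence plays the role of s and guess plays the role of x. It is not taken from jASEN's source.

```java
// Illustrative only; assumes confidence = s and guess = x in Robinson's formula.
final class DegreeOfBelief {

    /**
     * Robinson's adjusted token probability f(w) = (s*x + n*p(w)) / (s + n).
     *
     * @param s  strength given to background information (assumed: the confidence attribute)
     * @param x  probability assumed for a token with no data (assumed: the guess attribute)
     * @param n  number of training messages that contained the token
     * @param pw raw spam probability p(w) computed from those messages
     */
    static double adjust(double s, double x, int n, double pw) {
        return (s * x + n * pw) / (s + n);
    }

    public static void main(String[] args) {
        double confidence = 0.9, guess = 0.5; // defaults from the table above
        // An unseen token (n = 0) falls back to the guess value.
        System.out.println(adjust(confidence, guess, 0, 0.0));
        // A token seen in 10 messages with p(w) = 0.9 moves most of the way toward 0.9.
        System.out.println(adjust(confidence, guess, 10, 0.9));
    }
}
```

The larger s is, the more evidence (larger n) a token needs before its adjusted value moves away from x.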


<parser>

The parser element has the following attributes:

contrastThreshold (float, default 0.2): The contrast between foreground text and background color below which text content is deemed to be "concealed". This setting is designed to detect HTML noise words that have been deliberately obscured so that they are not apparent to the human eye. Software scanners would typically still pick up such words, so they may affect probability results. The simplest way to think of this setting is as a scale from 0.0 to 1.0, where 0.0 is white-on-white and 1.0 is black-on-white. As the contrast approaches 0.0, the text becomes harder for the human eye to read.
microElementSize (int, default 5): The size in pixels below which any HTML element is deemed to be a deliberate attempt to conceal content.
microFontSize (int, default 1): The size (in pixels or points) below which any HTML text is deemed to be concealed.
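The three parser thresholds can be read as independent tests, and failing any one of them marks content as concealed. The sketch below assumes exactly that. It also assumes a contrast measure: the absolute difference in approximate luminance between two colors, using Rec. 709 weights on raw sRGB values. This measure is 0.0 for white-on-white and 1.0 for black-on-white, which matches the scale described above. jASEN's HTML parser may compute contrast and element size differently, and all names here are hypothetical.

```java
// Hypothetical sketch; not jASEN's HTML parser.
final class ConcealmentCheck {

    /** Approximate luminance (0.0 to 1.0) of a 0xRRGGBB color, Rec. 709 weights on raw sRGB values. */
    static double luminance(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (0.2126 * r + 0.7152 * g + 0.0722 * b) / 255.0;
    }

    /** Assumed contrast measure: 0.0 for identical colors, about 1.0 for black on white. */
    static double contrast(int foregroundRgb, int backgroundRgb) {
        return Math.abs(luminance(foregroundRgb) - luminance(backgroundRgb));
    }

    /** True if any one of the three parser thresholds is undercut. */
    static boolean isConcealed(int foregroundRgb, int backgroundRgb,
                               int elementWidthPx, int elementHeightPx, int fontSize,
                               double contrastThreshold, int microElementSize, int microFontSize) {
        boolean lowContrast = contrast(foregroundRgb, backgroundRgb) < contrastThreshold;
        // Assumption: an element counts as "micro" if either dimension is below the limit.
        boolean microElement = Math.min(elementWidthPx, elementHeightPx) < microElementSize;
        boolean microFont = fontSize < microFontSize;
        return lowContrast || microElement || microFont;
    }

    public static void main(String[] args) {
        // Near-white text on a white background, normal size, default thresholds 0.2 / 5 / 1.
        System.out.println(isConcealed(0xFEFEFE, 0xFFFFFF, 200, 20, 12, 0.2, 5, 1));
        // Black text on white, normal size.
        System.out.println(isConcealed(0x000000, 0xFFFFFF, 200, 20, 12, 0.2, 5, 1));
    }
}
```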


<auto-update>

The auto-update element has the following attributes:

url (String, default http://jasen.sourceforge.net/updates/): The location in which to look for update parcels.
parcel (String, default jasen-update.xml): The meta parcel containing further information about the update.
frequency (int, default 15): The time in minutes between update checks.
enabled (Boolean, default false): If true, the auto-update thread is started.
checkOnStartup (Boolean, default false): If true, an update check is run when the engine starts. If false, the first check is run frequency minutes after start-up.
readBuffer (int, default 1024): The size in bytes of the buffer used when reading update information.
readTimeout (int, default 5000): The time in milliseconds to wait for update information from the update server before aborting the update.
errorHandler (FQCN, default org.jasen.error.SystemErrorHandler): The class used to handle unresolvable system errors.
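The way enabled, checkOnStartup and frequency interact can be sketched with the JDK's ScheduledExecutorService. This illustration is not jASEN's update thread. It rests on two assumptions: the update check itself is a placeholder Runnable, and the interval is measured from the end of one check to the start of the next.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Illustrative only: shows how the auto-update attributes could drive a periodic task.
final class UpdateScheduler {

    /** Minutes before the first check: 0 if checkOnStartup, otherwise one full interval. */
    static long initialDelayMinutes(boolean checkOnStartup, int frequencyMinutes) {
        return checkOnStartup ? 0L : frequencyMinutes;
    }

    /**
     * Schedules updateCheck every frequencyMinutes (which must be positive).
     * Returns null when enabled is false, i.e. no update thread is started.
     */
    static ScheduledFuture<?> start(ScheduledExecutorService executor, Runnable updateCheck,
                                    boolean enabled, boolean checkOnStartup, int frequencyMinutes) {
        if (!enabled) {
            return null;
        }
        return executor.scheduleWithFixedDelay(
                updateCheck,
                initialDelayMinutes(checkOnStartup, frequencyMinutes),
                frequencyMinutes,
                TimeUnit.MINUTES);
    }

    public static void main(String[] args) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        // Hypothetical check; a real one would fetch the parcel from the configured url.
        Runnable check = () -> System.out.println("checking for updates");
        ScheduledFuture<?> future = start(executor, check, true, false, 15);
        System.out.println("scheduled: " + (future != null));
        executor.shutdownNow(); // stop the demo without waiting 15 minutes
    }
}
```

With the defaults (enabled=false), nothing is scheduled. With enabled=true and checkOnStartup=false, the first check runs 15 minutes after start-up.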


<plugin>

The engine may have one or more plugins. Each plugin element has the following attributes (FQCN refers to Fully Qualified Class Name):

name (String, sample "RobinsonScanner"): The name given to the plugin.
priority (int, sample 1): The order in which this plugin is executed.
type (FQCN): The plugin class. Must be an instance of org.jasen.interfaces.JasenPlugin.
properties (String, sample "default/RobinsonScanner.properties"): The path to the properties (configuration) file for the plugin. If the path is relative, its root must be on the classpath.
displayName (String, sample "AI Scanner"): The user-friendly name to associate with the plugin.
description (String, sample "Uses probability based intelligence together with a database of spam characteristics to analyze and detect spam"): A user-friendly description of the core purpose of the plugin.
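Because each plugin has a priority, the engine must put the plugins in order before running them. The sketch below reads the plugin elements from a jasen-config document using the JDK's DOM parser and returns the plugin names in execution order. It assumes that lower priority values run first and that plugin elements may appear anywhere in the document. This page states neither point.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper; not jASEN's configuration loader.
final class PluginOrder {

    /** Returns plugin names sorted by ascending priority (assumed: lower values run first). */
    static List<String> executionOrder(String jasenConfigXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(jasenConfigXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse jasen-config XML", e);
        }
        NodeList nodes = doc.getElementsByTagName("plugin");
        List<Element> plugins = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            plugins.add((Element) nodes.item(i));
        }
        // List.sort is stable, so plugins with equal priority keep their document order.
        plugins.sort(Comparator.comparingInt(
                (Element p) -> Integer.parseInt(p.getAttribute("priority").trim())));
        List<String> names = new ArrayList<>();
        for (Element p : plugins) {
            names.add(p.getAttribute("name"));
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<jasen-config>"
                + "<plugin name=\"KeywordScanner\" priority=\"2\"/>"
                + "<plugin name=\"RobinsonScanner\" priority=\"1\"/>"
                + "</jasen-config>";
        System.out.println(executionOrder(xml)); // [RobinsonScanner, KeywordScanner]
    }
}
```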

Plugin configuration

Each plugin may optionally specify a configuration file to be loaded when the plugin is initialized. Each of the bundled plugins has its own configuration file:

AnomalousCharacterScanner

Looks for unusual characters which, when found in excess, often indicate spam

Located in: jasen-conf/default/AnomalousCharacterScanner.properties

min (float, default 0.5): The minimum probability given to any scan.
max (float, default 0.9): The maximum probability given to any scan.
thresholds (comma separated ints, default 10,50): The thresholds for the characters defined in chars (below), one per character and in the same order. When a character occurs more often than its threshold, the max probability is used for that character.
chars (comma separated Strings, default |,!): The list of characters which often indicate spam when found in excess.
calculator (FQCN, default org.jasen.core.calculators.CompoundCalculator): The calculator used to compute the final probability. MUST be an instance of org.jasen.interfaces.ProbabilityCalculator.
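The thresholds and chars lists appear to be positional: the first threshold belongs to the first character, and so on. The sketch below loads the properties with java.util.Properties, pairs the two lists, and then applies a per-character rule. That rule (max once a character occurs more often than its threshold, otherwise min) is an assumption based on the description above. The sketch does not show how the per-character results are combined through the configured calculator.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch; not jASEN's AnomalousCharacterScanner.
final class AnomalousCharConfig {

    /** Pairs each entry of "chars" with the entry of "thresholds" at the same position. */
    static Map<Character, Integer> thresholdsByChar(Properties props) {
        String charsValue = props.getProperty("chars");
        String thresholdsValue = props.getProperty("thresholds");
        if (charsValue == null || thresholdsValue == null) {
            throw new IllegalArgumentException("chars and thresholds are required");
        }
        String[] chars = charsValue.split(",");
        String[] thresholds = thresholdsValue.split(",");
        if (chars.length != thresholds.length) {
            throw new IllegalArgumentException("chars and thresholds must have the same number of entries");
        }
        Map<Character, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < chars.length; i++) {
            result.put(chars[i].trim().charAt(0), Integer.parseInt(thresholds[i].trim()));
        }
        return result;
    }

    /** Assumed per-character rule: max once the count exceeds the threshold, otherwise min. */
    static double probabilityFor(String text, char c, int threshold, double min, double max) {
        long count = text.chars().filter(ch -> ch == c).count();
        return count > threshold ? max : min;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader("min=0.5\nmax=0.9\nthresholds=10,50\nchars=|,!\n"));
        double min = Double.parseDouble(props.getProperty("min"));
        double max = Double.parseDouble(props.getProperty("max"));
        String body = "BUY NOW!!! | CHEAP | PILLS |";
        for (Map.Entry<Character, Integer> e : thresholdsByChar(props).entrySet()) {
            System.out.println(e.getKey() + " -> " + probabilityFor(body, e.getKey(), e.getValue(), min, max));
        }
    }
}
```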

AttachmentScanner

Looks for attachments with unusual or dangerous file extensions

Located in: jasen-conf/default/AttachmentScanner.properties

min (float, default 0.5): The minimum probability given to any scan.
max (float, default 0.9): The maximum probability given to any scan.
threshold (int, default 1): The number of occurrences required to return max. The probability returned is:
	probability = min + num_occurrences × ((max - min) / threshold)   if num_occurrences < threshold
	probability = max                                                  if num_occurrences >= threshold
extensions (comma separated Strings, default ...): The list of illegal file extensions. MUST be a comma separated sequence without spaces.
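AttachmentScanner and several other plugins on this page (HeuristicScanner, HTMLConcealmentScanner, ObfuscatedCharacterScanner, RecipientScanner, TagFalseAnchorScanner, TagSourceCgiScanner and TagSourcePortScanner) describe the same threshold rule. The minimal Java sketch below treats it as a linear ramp from min at zero occurrences to max at threshold occurrences. Starting the ramp at min is an assumption, chosen because min is described as the minimum probability given to any scan. The names are illustrative.

```java
// Illustrative sketch of the shared threshold rule; not taken from jASEN's source.
final class ThresholdProbability {

    /**
     * Returns min for zero occurrences and rises in equal steps of (max - min) / threshold.
     * Once numOccurrences reaches threshold, it returns max.
     */
    static double probability(int numOccurrences, int threshold, double min, double max) {
        if (numOccurrences >= threshold) {
            return max;
        }
        return min + numOccurrences * ((max - min) / threshold);
    }

    public static void main(String[] args) {
        // HeuristicScanner defaults: min 0.5, max 0.8, threshold 20.
        for (int n : new int[] {0, 10, 20, 30}) {
            System.out.println(n + " occurrences -> " + probability(n, 20, 0.5, 0.8));
        }
    }
}
```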

FromAddressValidationScanner

Checks the From header against the envelope sender and/or the return path

Located in: jasen-conf/default/FromAddressValidationScanner.properties

min (float, default 0.5): The minimum probability given to any scan.
max (float, default 0.85): The maximum probability given to any scan.
medium (float, default 0.5): The default probability given to any scan.

HeuristicScanner

Performs a heuristic scan of the message contents looking for defined spam patterns

Located in: jasen-conf/default/HeuristicScanner.properties

analyzer-class (FQCN, default org.jasen.core.parsers.StandardHeuristicAnalyzer): The class used to perform the analysis. Must be an instance of org.jasen.interfaces.HeuristicAnalyzer.
def-class (FQCN, default org.jasen.core.parsers.StandardHeuristicDefinitionSet): The class used to load the heuristic definitions. Must be an instance of org.jasen.interfaces.HeuristicDefinitionSet.
min (float, default 0.5): The minimum probability given to any scan.
max (float, default 0.8): The maximum probability given to any scan.
threshold (int, default 20): The number of occurrences required to return max. The probability returned is:
	probability = min + num_occurrences × ((max - min) / threshold)   if num_occurrences < threshold
	probability = max                                                  if num_occurrences >= threshold

HTMLConcealmentScanner

Detects the presence of HTML concealment. This plugin simply reads the concealment data already recorded by the HTML parser used by the engine

Located in: jasen-conf/default/HTMLConcealmentScanner.properties

min (float, default 0.5): The minimum probability given to any scan.
max (float, default 0.85): The maximum probability given to any scan.
threshold (int, default 5): The number of occurrences required to return max. The probability returned is:
	probability = min + num_occurrences × ((max - min) / threshold)   if num_occurrences < threshold
	probability = max                                                  if num_occurrences >= threshold

ImageDominanceScanner

Looks for unusual ratios of images to text. This is intended to catch spam emails which rely solely on images to convey their message

Located in: jasen-conf/default/ImageDominanceScanner.properties

min (float, default 0.5): The minimum probability given to any scan.
max (float, default 0.8): The maximum probability given to any scan.
ratio (float, default 0.5): The ratio of images to words which implies a high probability of spam. For example, if ratio is set to 0.5, a high probability is returned when there is 1 image for every 2 words.
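The following sketch of the ratio test rests on three assumptions: the "high probability" mentioned above is the configured max, any ratio below the setting returns min, and a message with images but no words counts as image-dominated. The class is hypothetical.

```java
// Hypothetical sketch; not jASEN's ImageDominanceScanner.
final class ImageDominance {

    /** Assumed rule: max when images per word reaches ratio, otherwise min. */
    static double probability(int imageCount, int wordCount, double ratio, double min, double max) {
        if (imageCount == 0) {
            return min;  // no images, nothing to dominate
        }
        if (wordCount == 0) {
            return max;  // images with no words at all (assumption: fully image-dominated)
        }
        double imagesPerWord = (double) imageCount / wordCount;
        return imagesPerWord >= ratio ? max : min;
    }

    public static void main(String[] args) {
        // Defaults: ratio 0.5, min 0.5, max 0.8.
        System.out.println(probability(1, 2, 0.5, 0.5, 0.8));   // 1 image per 2 words
        System.out.println(probability(1, 40, 0.5, 0.5, 0.8));  // mostly text
    }
}
```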

InvisiMailScanner

Simply looks for messages without content or subject

Located in: jasen-conf/default/InvisiMailScanner.properties

min (float, default 0.5): The minimum probability given to any scan.
max (float, default 0.9): The maximum probability given to any scan.

KeywordScanner

Performs a simple keyword match against the tokens returned by the tokenizer

Located in: jasen-conf/default/KeywordScanner.properties

min (float, default 0.49): The minimum probability given to any scan.
max (float, default 0.9): The maximum probability given to any scan.
keywords (comma separated Strings, default ...): The list of illegal keywords. MUST be a comma separated sequence without spaces.

ObfuscatedCharacterScanner

Looks for characters which have been misused to represent legitimate characters, such as '@' in place of 'A'

Located in: jasen-conf/default/ObfuscatedCharacterScanner.properties

min (float, default 0.5): The minimum probability given to any scan.
max (float, default 0.9): The maximum probability given to any scan.
threshold (int, default 5): The number of occurrences required to return max. The probability returned is:
	probability = min + num_occurrences × ((max - min) / threshold)   if num_occurrences < threshold
	probability = max                                                  if num_occurrences >= threshold

RBLScanner

The Realtime Blackhole List (RBL) plugin. Performs DNS lookups of the sending mail server against one or more RBL servers

Located in: jasen-conf/default/RBLScanner.properties

med-prob (float, default 0.5): The probability returned when no information could be obtained.
max-rbls (int, default 20): The maximum number of RBL servers to be used.
rbl.n (String, no default): The nth RBL server. For example:

	rbl.1=sbl-xbl.spamhaus.org
	rbl.2=relays.ordb.org

	Any number of RBL servers may be defined, such that 0 < n <= max-rbls.
OPEN_RELAY (float, default 0.7): The value given to the result if the specified server is identified as an open relay.
DIALUP_SPAM (float, default 0.7): The value given to the result if the specified server is identified as a dialup spam source.
SPAM_SOURCE (float, default 0.9): The value given to the result if the specified server is identified as a confirmed spam source.
SMART_HOST (float, default 0.5): The value given to the result if the specified server is identified as a smart host.
SPAM_WARE (float, default 0.9): The value given to the result if the specified server is identified as a spamware source.
LIST_SERVER (float, default 0.9): The value given to the result if the specified server is identified as a list server.
FORM_MAIL (float, default 0.9): The value given to the result if the specified server is identified as the host of an open web mail form.
OPEN_PROXY (float, default 0.9): The value given to the result if the specified server is identified as an open mail proxy.
UNKNOWN (float, default 0.5): The value given to the result if the specified server could not be identified (no response found or timeout).
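By convention, an RBL lookup reverses the octets of the sending server's IPv4 address and queries the result as a hostname under the list's DNS zone. Any answer means the address is listed. The sketch below shows that convention, together with reading the rbl.n entries up to max-rbls. It assumes that jASEN follows the standard DNSBL query format and skips undefined indices. It does not show how the engine maps answers onto OPEN_RELAY, SPAM_SOURCE and the other categories, and the class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper; not jASEN's RBLScanner.
final class RblQueries {

    /** Collects rbl.1 .. rbl.<max-rbls>, skipping indices that are not defined. */
    static List<String> servers(Properties props) {
        int maxRbls = Integer.parseInt(props.getProperty("max-rbls", "20").trim());
        List<String> servers = new ArrayList<>();
        for (int n = 1; n <= maxRbls; n++) {
            String server = props.getProperty("rbl." + n);
            if (server != null && !server.trim().isEmpty()) {
                servers.add(server.trim());
            }
        }
        return servers;
    }

    /** DNSBL query name: the IPv4 octets in reverse order followed by the list's zone. */
    static String queryName(String ipv4, String rblZone) {
        String[] octets = ipv4.trim().split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("Not a dotted IPv4 address: " + ipv4);
        }
        return octets[3] + "." + octets[2] + "." + octets[1] + "." + octets[0] + "." + rblZone;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("max-rbls", "20");
        props.setProperty("rbl.1", "sbl-xbl.spamhaus.org");
        props.setProperty("rbl.2", "relays.ordb.org");
        for (String zone : servers(props)) {
            // 192.0.2.10 is a documentation address standing in for the sending server.
            System.out.println(queryName("192.0.2.10", zone));
        }
    }
}
```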

RecipientScanner

Looks for excessive numbers of recipients

Located in: jasen-conf/default/RecipientScanner.properties

min (float, default 0.5): The minimum probability given to any scan.
max (float, default 0.8): The maximum probability given to any scan.
threshold (int, default 20): The number of occurrences required to return max. The probability returned is:
	probability = min + num_occurrences × ((max - min) / threshold)   if num_occurrences < threshold
	probability = max                                                  if num_occurrences >= threshold

RobinsonScanner

The main chi-square probability scanner, based on the ideas of Gary Robinson

Located in: jasen-conf/default/RobinsonScanner.properties

map-path (String, default default/jasen.dat): The classpath-relative path to the token map to be used.
map-store-class (FQCN, default org.jasen.core.engine.DiskMapStore): The class used to load the token map. Must be an implementation of org.jasen.core.engine.JasenMapStore.
min-tokens (int, default 5): The minimum number of tokens required to perform an evaluation.
default-prob (float, default 0.5): The value given to the result if a scan could not complete.
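Gary Robinson's published chi-square combining method computes H = C⁻¹(-2 ln ∏ f(w), 2n) and S = C⁻¹(-2 ln ∏ (1 - f(w)), 2n). Here C⁻¹(x, 2n) is the probability that a chi-square variable with 2n degrees of freedom is at least x. The method then reports I = (1 + H - S) / 2. The Java sketch below implements that published method and applies min-tokens and default-prob as this page describes them. It is not taken from jASEN's source, and it does not model how jASEN uses esf and ftt.

```java
// Illustrative implementation of Robinson's chi-square combining; not jASEN's RobinsonScanner.
final class ChiSquareCombiner {

    /** P(X >= chi2) for a chi-square variable X with 2 * n degrees of freedom. */
    static double chi2Q(double chi2, int n) {
        double m = chi2 / 2.0;
        double term = Math.exp(-m);
        double sum = term;
        for (int i = 1; i < n; i++) {
            term *= m / i;
            sum += term;
        }
        return Math.min(sum, 1.0);
    }

    /**
     * Combines per-token probabilities f(w) into one spam indicator I in [0, 1].
     * Returns defaultProb when fewer than minTokens probabilities are supplied.
     */
    static double combine(double[] tokenProbs, int minTokens, double defaultProb) {
        int n = tokenProbs.length;
        if (n == 0 || n < minTokens) {
            return defaultProb;
        }
        double lnProdF = 0.0;          // ln of the product of f(w)
        double lnProdOneMinusF = 0.0;  // ln of the product of (1 - f(w))
        for (double f : tokenProbs) {
            double p = Math.min(Math.max(f, 1e-9), 1.0 - 1e-9); // keep logs finite
            lnProdF += Math.log(p);
            lnProdOneMinusF += Math.log(1.0 - p);
        }
        double h = chi2Q(-2.0 * lnProdF, n);
        double s = chi2Q(-2.0 * lnProdOneMinusF, n);
        return (1.0 + h - s) / 2.0;
    }

    public static void main(String[] args) {
        double[] spammy = {0.99, 0.95, 0.97, 0.9, 0.99};
        double[] mixed = {0.99, 0.01, 0.5, 0.6, 0.4};
        System.out.println(combine(spammy, 5, 0.5));
        System.out.println(combine(mixed, 5, 0.5));
        System.out.println(combine(new double[] {0.99, 0.99}, 5, 0.5)); // below min-tokens: default-prob
    }
}
```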

SenderAddressValidationScanner

Determines whether the sender has been forged.

A sender is deemed to have been forged if and only if the sending MTA announced a name (HELO) which either does not exist in the DNS or does not match the IP address recorded for that sending MTA.

Located in: jasen-conf/default/SenderAddressValidationScanner.properties

min (float, default 0.5): The minimum probability given to any scan.
max (float, default 0.85): The maximum probability given to any scan.
medium (float, default 0.5): The default probability given to any scan.
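The forgery rule above can be written as a small predicate. In this sketch, the DNS lookup is passed in as a function so that the rule can be exercised without network access. In practice, that function could be backed by java.net.InetAddress.getAllByName or by the configured dnsResolver. The sketch makes two simplifying assumptions: it compares textual IPv4 addresses, and it treats the connecting address from the Received header as the "recorded IP address". The class is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the HELO forgery rule; not jASEN's implementation.
final class HeloCheck {

    /**
     * Forged if the HELO name has no addresses in DNS, or none of its addresses
     * equals the IP address recorded for the sending MTA.
     */
    static boolean isForged(String heloName, String recordedIp,
                            Function<String, List<String>> resolveAll) {
        List<String> addresses = resolveAll.apply(heloName);
        return addresses == null || !addresses.contains(recordedIp);
    }

    public static void main(String[] args) {
        // Stub DNS using documentation addresses; a real resolver would query DNS.
        Map<String, List<String>> dns = Map.of("mail.example.com", List.of("192.0.2.25"));
        Function<String, List<String>> resolver = name -> dns.getOrDefault(name, List.of());
        System.out.println(isForged("mail.example.com", "192.0.2.25", resolver));     // false: matches
        System.out.println(isForged("mail.example.com", "198.51.100.7", resolver));   // true: mismatch
        System.out.println(isForged("no-such-host.example", "192.0.2.25", resolver)); // true: unknown name
    }
}
```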

TagFalseAnchorScanner

Looks for anchor tags whose href attribute does not match the displayable text inside the anchor. This pattern is common in phishing emails.

Located in: jasen-conf/default/TagFalseAnchorScanner.properties

min (float, default 0.49): The minimum probability given to any scan.
max (float, default 0.9): The maximum probability given to any scan.
threshold (int, default 1): The number of occurrences required to return max. The probability returned is:
	probability = min + num_occurrences × ((max - min) / threshold)   if num_occurrences < threshold
	probability = max                                                  if num_occurrences >= threshold
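This page does not define when an href "does not match" the anchor text. The sketch below assumes one common reading: compare hostnames only when the visible text itself looks like a URL, because text such as "click here" cannot contradict the link. The class is a hypothetical sketch built on java.net.URI.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical sketch; not jASEN's TagFalseAnchorScanner.
final class FalseAnchorCheck {

    /** True if the visible text looks like a URL whose host differs from the href's host. */
    static boolean isFalseAnchor(String href, String visibleText) {
        String text = visibleText.trim();
        if (!text.matches("(?i)^(https?://)?[a-z0-9.-]+\\.[a-z]{2,}(/.*)?$")) {
            return false; // text is not URL-like, so there is nothing to contradict
        }
        String textUrl = text.matches("(?i)^https?://.*") ? text : "http://" + text;
        String hrefHost = host(href);
        return hrefHost == null || !hrefHost.equals(host(textUrl));
    }

    /** Lower-cased host of a URL, or null if it has none or cannot be parsed. */
    private static String host(String url) {
        try {
            String h = URI.create(url.trim()).getHost();
            return h == null ? null : h.toLowerCase(Locale.ROOT);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(isFalseAnchor("http://203.0.113.5/login", "www.mybank.example"));
        System.out.println(isFalseAnchor("http://bank.example/account", "http://bank.example/"));
        System.out.println(isFalseAnchor("http://bank.example/", "click here"));
    }
}
```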

TagSourceCgiScanner

Looks for mail bugs: images which reference a CGI script (or other server-side script) to render the image

Located in: jasen-conf/default/TagSourceCgiScanner.properties

min (float, default 0.5): The minimum probability given to any scan.
max (float, default 0.8): The maximum probability given to any scan.
threshold (int, default 2): The number of occurrences required to return max. The probability returned is:
	probability = min + num_occurrences × ((max - min) / threshold)   if num_occurrences < threshold
	probability = max                                                  if num_occurrences >= threshold

TagSourcePortScanner

Looks for unusual port references in URLs. This is intended to identify suspicious emails which contain references to unusual or vulnerable TCP ports

Located in: jasen-conf/default/TagSourcePortScanner.properties

min (float, default 0.5): The minimum probability given to any scan.
max (float, default 0.8): The maximum probability given to any scan.
threshold (int, default 2): The number of occurrences required to return max. The probability returned is:
	probability = min + num_occurrences × ((max - min) / threshold)   if num_occurrences < threshold
	probability = max                                                  if num_occurrences >= threshold
ports (comma separated ints, default ...): The allowed ports. These ports are considered innocuous and are therefore allowed. MUST be a comma separated sequence without spaces.
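The following sketches the port test. It parses the ports property into a set and flags any URL that names an explicit port outside that set. The shipped default list is elided above, so the value 80,443 used here is hypothetical. The sketch assumes that URLs without an explicit port are not flagged. Counting the flagged references and feeding that count into the threshold rule is left out.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch; not jASEN's TagSourcePortScanner.
final class PortCheck {

    /** Parses a "ports" value such as "80,443" into a set of allowed ports. */
    static Set<Integer> parseAllowed(String ports) {
        return Arrays.stream(ports.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(Integer::valueOf)
                .collect(Collectors.toSet());
    }

    /** True if the URL names an explicit port that is not in the allowed set. */
    static boolean hasUnusualPort(String url, Set<Integer> allowed) {
        try {
            int port = URI.create(url.trim()).getPort(); // -1 when no port is given
            return port != -1 && !allowed.contains(port);
        } catch (IllegalArgumentException e) {
            return false; // unparseable URL: not counted by this check
        }
    }

    public static void main(String[] args) {
        Set<Integer> allowed = parseAllowed("80,443"); // hypothetical list
        for (String url : new String[] {"http://example.com/", "http://example.com:8080/", "https://example.com:443/"}) {
            System.out.println(url + " unusual=" + hasUnusualPort(url, allowed));
        }
    }
}
```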