How does jASEN work?
jASEN combines probability-based analysis with intelligent message tokenization and an extensive
database of known spam heuristics to identify spam effectively whilst minimizing the occurrence of
false positives.
In addition, jASEN provides a mechanism to incorporate custom and/or third-party plugins, which can be
engineered to perform almost any additional filtering technique required, including sender verification
systems like SPF and SenderID.
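To make the idea of probability-based analysis over tokens concrete, here is a minimal, generic sketch. It is not jASEN's actual algorithm: the class name, the token probabilities and the combining rule (a simple product rule of the kind used by many Bayesian-style filters) are all assumptions chosen for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class TokenScoreSketch {

    // Hypothetical per-token spam probabilities. A real engine would learn
    // these from a training corpus; the values here are invented.
    static final Map<String, Double> TOKEN_SPAM_PROBABILITY = new HashMap<>();
    static {
        TOKEN_SPAM_PROBABILITY.put("free", 0.90);
        TOKEN_SPAM_PROBABILITY.put("offer", 0.85);
        TOKEN_SPAM_PROBABILITY.put("meeting", 0.10);
        TOKEN_SPAM_PROBABILITY.put("agenda", 0.15);
    }

    // Tokens that were never seen in training count as neutral evidence.
    static final double UNKNOWN_TOKEN_PROBABILITY = 0.5;

    /** Splits a message into lower-case alphanumeric tokens. */
    static String[] tokenize(String message) {
        return message.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
    }

    /** Combines per-token probabilities into one spam probability between 0 and 1. */
    static double spamProbability(String message) {
        double logSpam = 0.0; // log of the product of p
        double logHam = 0.0;  // log of the product of (1 - p)
        for (String token : tokenize(message)) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            double p = TOKEN_SPAM_PROBABILITY.getOrDefault(token, UNKNOWN_TOKEN_PROBABILITY);
            logSpam += Math.log(p);
            logHam += Math.log(1.0 - p);
        }
        // prod(p) / (prod(p) + prod(1 - p)), computed in log space for stability.
        return 1.0 / (1.0 + Math.exp(logHam - logSpam));
    }

    public static void main(String[] args) {
        System.out.println(spamProbability("FREE offer, act now!"));
        System.out.println(spamProbability("Agenda for the meeting"));
    }
}
```

Messages made up mostly of spam-leaning tokens score close to 1, messages made up mostly of ham-leaning tokens score close to 0, and a message containing only unknown tokens scores exactly 0.5.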
Why does jASEN sometimes fail to detect spam?
This may be caused by many things. jASEN looks for certain patterns commonly found in spam email, but also
looks for patterns commonly found in non-spam email so as to avoid false positives. If a spam email does not
contain any major spam "markers" then it may be classified as ham. Fortunately this does not happen very often.
Even the most innocuous spam email will usually contain some spammy words or markers. In these cases, jASEN
may identify that the email is not completely clean, but may not have enough evidence to make a definitive claim either way.
So how do you improve the situation? There are three main ways to make jASEN more accurate for your situation: training, tuning and plugins.
jASEN is distributed with a reference engine configuration based on a training corpus of approximately
10,000 emails (spam and ham). Whilst this configuration works well, the engine is designed to be regularly updated
and "re-trained" to maximize its detection capabilities. As new types of spam emerge, the engine must
be updated to recognize these new varieties. See the Training section for more information.
Almost all the features of jASEN used to detect spam are completely configurable. This means you have fine-grained
control over how jASEN ranks and scores email messages. Be aware, however, that it is also possible
to render jASEN useless if nonsensical configurations are set. See the Tuning section for more information.
If the native features of jASEN still won't pick up the offending message(s) then you can always create your own plugin.
This may be as simple as manually checking for certain spam markers not identified by jASEN (special keywords, lookups against
your own database, etc.), or may involve more complex features like incorporating AI scanning systems. The choice is
up to you. Please note, however, that the licence agreement under which jASEN is distributed requires that any work done to
the engine itself must be fed back into the project.
How do I install jASEN into my mail server?
jASEN is currently an engine only and does not provide any out-of-the-box integration with major mail servers.
It is our intention to provide these in the future; however, at present this is left up to you.
If you have developed an integration component for a major mail server, let us know and we will add it to the project!
Can I use jASEN in Outlook?
Yes and no. jASEN is simply an anti-spam engine; it is not an anti-spam application. Thus jASEN can be used in almost any Java-based
context, but does not provide any significant reference implementations for doing this.
If you want to integrate jASEN into Outlook you will have to create your own Outlook or Exchange add-in. We are, however, currently working on
a desktop anti-spam product based on jASEN, but it is unclear at this point whether this will be part of the open source project or a stand-alone commercial application.
jASEN seems to take a long time to scan a message. How do I make it faster?
jASEN performs several string- and character-based tokenization and manipulation operations; however, these have a negligible effect on the total time taken for a scan.
The single most expensive operations are DNS and reverse DNS lookups, in particular lookups of hosts which do not exist. Both the RBLScanner (Realtime Blackhole List) and the
SenderVerificationScanner perform DNS lookups of domains and/or IP addresses. Unfortunately, in the case of spam, these addresses are often false and do not correspond
to any valid DNS record. Thus there will be a DNS "timeout" (usually in the order of 1-2 seconds) if the requested host cannot be found.
Most DNS servers will cache successful DNS lookup results, but many will not cache DNS lookup failures. This means that every time an unknown host is requested, the
DNS server will attempt to resolve the host again, because the earlier failure is not in its cache.
To address this problem, jASEN uses two interfaces, DNSResolver and InetAddressResolver, to resolve DNS records and domains or IP addresses. jASEN also provides an implementation
of each of these interfaces; however, the default implementations do not provide any caching. Thus they suffer from the same caching problem exhibited by the DNS itself.
We recommend you implement your own DNSResolver and InetAddressResolver if you are finding the performance of DNS lookups to be a problem. By adding a simple cache system such as
OSCache to these resolvers, you gain the low-level control over the caching of DNS lookups needed to overcome these performance problems.
Future versions of jASEN may include cached resolvers using similar caching products; however, at present this is left up to you.
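As a rough illustration of the caching idea, the sketch below wraps a lookup in a small in-memory cache that remembers failures as well as successes. The HostLookup interface and CachingHostLookup class are hypothetical stand-ins: the real DNSResolver and InetAddressResolver signatures in jASEN may differ, and a plain ConcurrentHashMap is used here in place of a caching product such as OSCache.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CachingLookupSketch {

    /** Hypothetical resolver interface; jASEN's own interfaces may look different. */
    interface HostLookup {
        InetAddress lookup(String host) throws UnknownHostException;
    }

    /** Wraps another lookup and remembers successes and failures for a fixed time. */
    static class CachingHostLookup implements HostLookup {

        private static final class Entry {
            final InetAddress address;   // null means the lookup failed
            final long expiresAtMillis;

            Entry(InetAddress address, long expiresAtMillis) {
                this.address = address;
                this.expiresAtMillis = expiresAtMillis;
            }
        }

        private final HostLookup delegate;
        private final long ttlMillis;
        private final Map<String, Entry> cache = new ConcurrentHashMap<>();

        CachingHostLookup(HostLookup delegate, long ttlMillis) {
            this.delegate = delegate;
            this.ttlMillis = ttlMillis;
        }

        @Override
        public InetAddress lookup(String host) throws UnknownHostException {
            long now = System.currentTimeMillis();
            Entry entry = cache.get(host);
            if (entry == null || entry.expiresAtMillis <= now) {
                InetAddress resolved;
                try {
                    resolved = delegate.lookup(host);
                } catch (UnknownHostException e) {
                    resolved = null; // cache the failure too (negative caching)
                }
                entry = new Entry(resolved, now + ttlMillis);
                cache.put(host, entry);
            }
            if (entry.address == null) {
                throw new UnknownHostException(host);
            }
            return entry.address;
        }
    }

    public static void main(String[] args) {
        // Stub delegate that always fails, standing in for a slow DNS timeout.
        final int[] delegateCalls = {0};
        HostLookup alwaysFails = host -> {
            delegateCalls[0]++;
            throw new UnknownHostException(host);
        };
        HostLookup cached = new CachingHostLookup(alwaysFails, 60_000L);
        for (int i = 0; i < 3; i++) {
            try {
                cached.lookup("no-such-host.invalid");
            } catch (UnknownHostException expected) {
                // Repeated failures are answered from the cache.
            }
        }
        System.out.println("delegate calls: " + delegateCalls[0]);
    }
}
```

In real use the delegate could be a method reference such as InetAddress::getByName. The time-to-live is a trade-off: a longer value avoids more timeouts, but delays noticing a host that has since become resolvable.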
How do I train jASEN with my own email database?
Refer to the training section for detailed instructions on how to train jASEN.
Can I incrementally train the engine with single emails?
At present, no. The engine data file used by jASEN is loaded at start-up and is not referenced during normal operation.
We recognise that this is a desirable feature and are currently working towards an incremental training system; however, at present this is not available.
Can I do a "live update" of the spam data files without stopping the engine?
jASEN comes with an internal auto-update system. This works by downloading a small update parcel file
from an update site and, on the basis of the information therein, downloading and installing the relevant updates.
This update system can even update code changes to plugins; however, at present it is not able to dynamically install changes to the core engine.
Refer to the configuration section for more information on the auto-update engine.
I am getting false positives. How do I stop this happening?
We recognise that the worst aspect of any anti-spam product is falsely identifying ham email as spam, and have made every effort to ensure this does not happen.
However, whilst the likelihood of jASEN generating false positive scores is low, it does happen. There are two key situations where this may occur:
- Email newsletters
- Spammy ham
Email newsletters will often exhibit many spam characteristics. Features such as HTML-only content (no TEXT part), spammy words like FREE and OFFER, and mail bugs are common in email newsletters
and will often make them indistinguishable from spam. In the future, email newsletter providers may (we hope) begin to comply with systems like SPF,
which will help to prevent false identification; however, at present there is no elegant solution to the problem with these types of email.
Spammy ham messages are messages sent from a legitimate (usually human) sender but which contain spammy words and/or markers. If a legitimate sender sends an email containing a
high proportion of words like "free" or "mortgage", it may be identified as spam.
There are two solutions to this problem:
- White/black lists
- Sender verification
In the case of email newsletters, which are often sent by software rather than a real person, the simplest solution is to provide a
whitelist of approved sender addresses or (even better) sending mail servers. jASEN does not provide a whitelist plugin by default;
however, it is a simple plugin to create.
In the case of spammy ham, the best solution is a combination of white/black lists and sender verification. In almost all cases where jASEN falsely identifies a ham email
as spam, it will return a "borderline" result indicating that it can't be sure of the legitimacy of the email. In these cases a separate verification process could be undertaken
to determine definitively whether the sender or their mail server should be whitelisted.
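The combination of a whitelist with a "borderline" band can be sketched as follows. This is an illustration only and does not use jASEN's plugin API: the class name, the Verdict enum, the 0 to 1 score scale and both thresholds are assumptions.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class WhitelistTriageSketch {

    enum Verdict { HAM, BORDERLINE, SPAM }

    // Hypothetical thresholds on a 0..1 spam score; tune these for your own mail.
    static final double HAM_BELOW = 0.4;
    static final double SPAM_AT_OR_ABOVE = 0.9;

    private final Set<String> whitelistedSenders = new HashSet<>();

    WhitelistTriageSketch(String... approvedSenders) {
        for (String sender : approvedSenders) {
            whitelistedSenders.add(normalize(sender));
        }
    }

    /** Trims and lower-cases an address so lookups ignore case and stray spaces. */
    static String normalize(String address) {
        return address.trim().toLowerCase(Locale.ROOT);
    }

    /** Whitelisted senders are always ham; everyone else is judged by score. */
    Verdict classify(String sender, double spamScore) {
        if (whitelistedSenders.contains(normalize(sender))) {
            return Verdict.HAM;
        }
        if (spamScore >= SPAM_AT_OR_ABOVE) {
            return Verdict.SPAM;
        }
        if (spamScore < HAM_BELOW) {
            return Verdict.HAM;
        }
        // Not clearly spam or ham: a candidate for separate sender verification.
        return Verdict.BORDERLINE;
    }

    public static void main(String[] args) {
        WhitelistTriageSketch triage = new WhitelistTriageSketch("news@example.com");
        System.out.println(triage.classify("news@example.com", 0.99));
        System.out.println(triage.classify("stranger@example.org", 0.60));
    }
}
```

Messages that land in the BORDERLINE band are the ones to pass to a verification step, whose outcome can then add the sender to the whitelist or a blacklist.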
Our tests indicate that less than 2% of all email (excluding email newsletters) is falsely identified as spam, and over 95% of these are identified as borderline cases. This means that
jASEN in combination with a white/black list approach becomes 99.999% effective at not falsely identifying ham as spam.
I am getting a java.net.MalformedURLException when I try to run the sample programs
This is most likely a classpath issue. jASEN requires the following folders be in the classpath:
These folders are found in the root path of the distributable.
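One way to check what the JVM actually sees is to print each classpath entry and whether it exists on disk. The small helper below is a generic diagnostic, not part of jASEN; the class name is an assumption. It relies only on the standard java.class.path system property.

```java
import java.io.File;
import java.util.regex.Pattern;

public class ClasspathCheck {

    /** Returns the entries of the JVM classpath as reported by java.class.path. */
    static String[] classpathEntries() {
        String classpath = System.getProperty("java.class.path", "");
        // Quote the separator so it is treated literally by the regex split.
        return classpath.split(Pattern.quote(File.pathSeparator));
    }

    public static void main(String[] args) {
        for (String entry : classpathEntries()) {
            File file = new File(entry);
            String status = file.exists() ? "found  " : "MISSING";
            System.out.println(status + " " + file.getAbsolutePath());
        }
    }
}
```

Run it with the same classpath arguments you use for the sample programs. Any required folder that is reported as MISSING, or is absent from the output, is a likely cause of the exception.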
How do I add my own filtering systems to jASEN?
jASEN provides a simple but effective mechanism to incorporate your own scanning logic into the engine by creating plugins.
See the plugins section for detailed instructions on how to create your own plugins.
Will jASEN work with multilingual spam emails?
No. jASEN uses several word-based and linguistic techniques to identify spam, and these rely on the premise that the spam is in English.
Whilst the broad techniques adopted by jASEN are transferable to non-English (single-byte) languages, we simply do not have an extensive enough database of
non-English ham/spam with which to train the engine. It is unclear whether this assumption holds true for double-byte languages like Chinese, and the likelihood
is that such languages are not amenable to the techniques used in jASEN.
Fortunately almost all spam is in English, so this limitation does not currently present a serious problem. The increase in the use of languages like Chinese in
an online context, however, does indicate that this may not always be the case.
At present there are no plans to provide support for multilingual scanning; however, we would welcome any thoughts or contributions on the matter.
Can I use jASEN in a client application?
Yes and no. See the Outlook FAQ topic for more information.