Bayesian Email Filter

Written by Jessica Duquette

A Bayesian email filter is a statistical spam filter that has been refined over many years and is still being improved. It learns from the spam and non-spam messages a person has already received, so it becomes tailored to that person's email over time. Different Bayesian filters are available depending on the operating system and the email platform in use.
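To make the idea of "learning from previous spam and non-spam emails" concrete, here is a minimal sketch of a word-based naive Bayes classifier in Python. It is an illustration only, not the method used by any particular product: the class name `TinyBayesFilter`, the tokenizer, the add-one smoothing, and the sample messages are all assumptions made for this example.

```python
import math
import re
from collections import Counter


class TinyBayesFilter:
    """Toy naive Bayes spam filter (hypothetical, for illustration only)."""

    LABELS = ("spam", "ham")  # "ham" = legitimate, non-spam email

    def __init__(self):
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = {label: 0 for label in self.LABELS}

    @staticmethod
    def tokenize(text):
        # Assumed tokenizer: lowercase runs of letters, digits, $, ' and -
        return re.findall(r"[a-z0-9$'-]+", text.lower())

    def train(self, text, label):
        """Record one message the user has marked as 'spam' or 'ham'."""
        self.message_counts[label] += 1
        self.token_counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        """Estimate the probability that a message is spam."""
        total_messages = sum(self.message_counts.values())
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        # Tokens never seen in training carry no evidence, so skip them.
        tokens = [t for t in self.tokenize(text) if t in vocab]

        log_scores = {}
        for label in self.LABELS:
            # Smoothed prior: how common this kind of mail has been so far.
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            # Add-one (Laplace) smoothing over the known vocabulary.
            denom = sum(self.token_counts[label].values()) + len(vocab)
            for token in tokens:
                score += math.log((self.token_counts[label][token] + 1) / denom)
            log_scores[label] = score

        # Convert log scores back to a normalized probability.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


# Usage with made-up training messages.
mail_filter = TinyBayesFilter()
mail_filter.train("win cash prize now", "spam")
mail_filter.train("cheap pills win money", "spam")
mail_filter.train("meeting agenda for monday", "ham")
mail_filter.train("lunch on monday with the team", "ham")

print(mail_filter.spam_probability("win cash now"))           # above 0.5
print(mail_filter.spam_probability("monday meeting agenda"))  # below 0.5
```

Because the counts come only from the messages one user has labeled, two people training the same code on different mailboxes end up with different filters, which is the customization described above.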

The Bayesian Email Filter Guru

Although it is difficult to name a single founder of this filter, one man has written many papers and done a great deal of work on the subject. Paul Graham has worked with many software developers and also worked at a large internet e-commerce site. Mr. Graham continues to test Bayesian filters and works to improve their capabilities.

The problem with early Bayesian email filters was that their results were not good enough to encourage further investigation. In 1998, when the subject was first broached, two researchers presented results that were far from favorable. Their filter caught 92% of junk email but also produced over 1% false positives, that is, legitimate email wrongly identified as spam.
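The two figures quoted for a filter, the share of spam caught and the share of false positives, can be computed as in the short Python sketch below. The function name `filter_rates` and the message counts are hypothetical, chosen only so the arithmetic is easy to follow, and the sketch assumes the false positive rate is measured against the total number of legitimate messages.

```python
def filter_rates(spam_caught, spam_total, legit_flagged, legit_total):
    """Return (catch rate, false positive rate) as fractions between 0 and 1."""
    catch_rate = spam_caught / spam_total
    # Assumption: false positives are counted against all legitimate mail.
    false_positive_rate = legit_flagged / legit_total
    return catch_rate, false_positive_rate


# Made-up counts: 920 of 1,000 spam messages caught,
# 12 of 1,000 legitimate messages wrongly flagged.
catch, false_pos = filter_rates(920, 1000, 12, 1000)
print(f"caught {catch:.1%} of spam, {false_pos:.1%} false positives")
```

A false positive is usually considered the more costly error, since a wanted message that lands in the junk folder may never be read.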

Learning From Early Work

When Paul Graham began testing on his own, he found that he could filter out 99.5% of spam with less than 0.03% false positives. He concluded that the earlier tests contained several flaws that he could minimize. By examining email headers and training on a much larger sample, he overcame those shortcomings and produced a better filter.
