A few days ago I did some maintenance on the software installed on my small server: among other things, its packages were outdated and I wanted to get the Libtool changes in (something that happened in pkgsrc...). So I seized this opportunity to give Bogofilter a try, because SpamAssassin had brought the machine to its knees.

I configured Bogofilter to parse all my incoming mail, fed to it by Procmail, following the examples given in the manual page; a painless process. The filter adds the X-Bogosity header to every mail, indicating whether it is spam or not (non-spam is called ham, for those who don't know), so that you can later classify the mails with a simple Procmail rule.
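To make the classification step concrete, here is a minimal Python sketch of the decision that such a Procmail rule makes: read the X-Bogosity header and pick a folder. It is not my actual setup. The function names and the "Inbox" folder are made up for illustration, and I am assuming the header value starts with a verdict word such as "Spam" or "Ham" (some versions write "Yes" or "No" instead), followed by comma-separated details.

```python
from email import message_from_string


def bogosity_verdict(raw_message: str) -> str:
    """Return the leading word of the X-Bogosity header, lowercased.

    Returns "unknown" when the header is missing or empty.
    """
    msg = message_from_string(raw_message)
    header = msg.get("X-Bogosity")
    if header is None:
        return "unknown"
    # Assumed layout: "<verdict>, key=value, key=value, ..."
    verdict = header.split(",", 1)[0].strip().lower()
    return verdict or "unknown"


def target_folder(raw_message: str) -> str:
    """Pick a folder name: "Spam" for spam verdicts, a hypothetical "Inbox" otherwise."""
    # Accept both the "Spam" and the "Yes" spelling of a positive verdict.
    if bogosity_verdict(raw_message) in {"spam", "yes"}:
        return "Spam"
    return "Inbox"
```

Anything without a clear spam verdict, including mail that never went through the filter, stays in the inbox, which is the safer default while the filter is still learning.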

After this little setup, the result was frustrating: it caught no spam... obviously, because the word database was empty. So I started classifying all new mails by hand into an "Archive" folder (i.e., "Trash") and a "Spam" folder, and set up a cron job to scan all mails in those folders periodically to make Bogofilter learn about my spam.
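The training job boils down to something like the sketch below, written in Python rather than as my real cron script. It assumes each message lives in its own file (as in a maildir or MH folder) and pipes it to `bogofilter -s` to register spam or `bogofilter -n` to register ham. The `train_folder` name and the `run` parameter are made up for this example; `run` only exists so the function can be tried without touching the real word database.

```python
import subprocess
from pathlib import Path


def train_folder(folder, flag, run=subprocess.run):
    """Feed every regular file in `folder` to `bogofilter <flag>` on stdin.

    `flag` is "-s" to register messages as spam or "-n" to register them
    as ham. Returns the number of messages that were fed.
    """
    count = 0
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        # One message per file is an assumption about the mailbox layout.
        run(["bogofilter", flag], input=path.read_bytes(), check=True)
        count += 1
    return count
```

A cron job would then call `train_folder` with `"-s"` on the "Spam" folder and with `"-n"` on the "Archive" folder. Note that this sketch does not remember which messages it has already registered, so running it repeatedly over the same files would count them more than once.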

Up until now, I've fed it around 150 spams and more than 1600 hams... which is starting to have some effects: it is able to detect some spam, although there are still a lot of false negatives. I'll keep manually classifying them for some days, hoping that the situation improves (I have almost no doubts about this).

Even so, SpamAssassin caught spam out of the box, without having learned anything. And after learning from more than 15,000 mails, it produced very, very, very few false negatives. I know, this program does a lot more checks than Bogofilter (which is just a Bayesian filter), so it can detect spam without training. But... as my server does not swap any more, I'll try to get the best out of Bogofilter. Do you use either of these two? If so, which one, and what has your experience been?

Comments from the original Blogger-hosted post: