bmf: Bayesian Mail Filter

March 29, 2008

I’ve been using bmf (Bayesian Mail Filter) lately to filter spam. Compared with SpamAssassin, which I used previously, bmf is a much smaller and more focused application. It’s written in C, so it’s very fast, and it’s also very easy to operate.

To give bmf some initial training, take two mbox files, one containing spam and one containing ham, and run the following:

cat spam.mbox | bmf -S
cat ham.mbox | bmf -N
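
If you train from scripts, the two commands above can be wrapped in a small helper that refuses to run when either mbox is missing or empty. This is only a sketch: the function name train_bmf and the BMF variable are made up for illustration, and it assumes bmf is on your PATH and reads an mbox from standard input, as in the commands above.

```shell
# Hypothetical helper: train bmf from a spam mbox and a ham mbox.
# BMF is an illustrative variable; set it to the full path of bmf if needed.
BMF=${BMF:-bmf}

train_bmf() {
    spam_mbox=$1
    ham_mbox=$2

    # Refuse to train on a missing or empty file.
    for f in "$spam_mbox" "$ham_mbox"; do
        if [ ! -s "$f" ]; then
            echo "train_bmf: $f is missing or empty" >&2
            return 1
        fi
    done

    # Same flags as the commands above: -S for spam, -N for ham.
    "$BMF" -S < "$spam_mbox" || return 1
    "$BMF" -N < "$ham_mbox"
}
```

With that defined, train_bmf spam.mbox ham.mbox does the same thing as the two commands above.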

If bmf later misclassifies a message, just pipe that message to bmf -N (if good mail was flagged as spam) or bmf -S (if spam got through).
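
If you find the two flags hard to remember, a tiny wrapper can map a label to the right one. Again a sketch rather than anything bmf provides: reclassify_bmf and BMF are names I made up, and the message is expected on standard input.

```shell
# Hypothetical helper: correct bmf after a misclassification.
# Usage: reclassify_bmf spam|ham < message
BMF=${BMF:-bmf}

reclassify_bmf() {
    case $1 in
        spam) "$BMF" -S ;;   # the message was really spam
        ham)  "$BMF" -N ;;   # the message was really ham
        *)    echo "usage: reclassify_bmf spam|ham < message" >&2
              return 2 ;;
    esac
}
```

For example, reclassify_bmf ham < message.txt corrects a legitimate message that was flagged as spam.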

Next, bmf needs to process each message you receive. This is usually done using a .forward file to call procmail and a .procmailrc file which calls bmf. For example, here is my .forward file at SDF:

"|IFS=' '&&exec /usr/pkg/bin/procmail -f-||exit 75 #jrblevin"

If you have an SDF account, running nospam -e should set this up for you (but it will also set up other procmail rules which you might not want if you’re going to use bmf).

Now, once procmail gets the message, it should send it to bmf. Here’s my .procmailrc:

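# Filter every message through bmf, which adds an X-Spam-Status header
:0fw
| /usr/pkg/bin/bmf -p

# Save anything bmf marked as spam to the spam mailbox
:0:
* ^X-Spam-Status: Yes
/arpa/gm/j/jrblevin/mail/spam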

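This says to pipe each message through bmf as a filter (the f flag) and to wait for bmf to finish (the w flag). If bmf declares that the message is spam (by setting the X-Spam-Status header), the second recipe saves it to ~/mail/spam; the trailing colon in :0: tells procmail to lock the mailbox while writing. Adjust the bmf and mailbox paths accordingly.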
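
Before trusting the recipe with real mail, it can help to reproduce the same check by hand. The sketch below pipes a message through bmf -p and looks for the header in the header section only, which is also where procmail conditions look by default. The function name bmf_flags_spam and the BMF variable are my own inventions. Note that it invokes bmf exactly as the recipe does, with whatever effect that has on bmf’s word lists.

```shell
# Hypothetical check: does bmf flag the message on stdin as spam?
# Mirrors the procmail recipe: filter through "bmf -p", then look for
# "X-Spam-Status: Yes" among the headers (everything before the first
# blank line). Unlike procmail, this match is case-sensitive.
BMF=${BMF:-bmf}

bmf_flags_spam() {
    "$BMF" -p | awk '
        /^$/ { in_body = 1 }                          # headers end here
        !in_body && /^X-Spam-Status: Yes/ { found = 1 }
        END { exit !found }                           # 0 if flagged, 1 if not
    '
}
```

For example, bmf_flags_spam < message.txt && echo "flagged as spam" prints the text only when bmf marks the message.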

If you use mutt, here are the keyboard shortcuts I use:

# Classify mail as spam or ham
macro index S "| bmf -S\n<save-message>=spam\n" "SPAM"
macro pager S "| bmf -S\n<save-message>=spam\n" "SPAM"
macro index H "| bmf -N\n" "HAM"
macro pager H "| bmf -N\n" "HAM"

Pressing S tells bmf that the selected message is spam and moves it to the spam folder. Pressing H classifies the message as ham.