$Id: spamass-howto.txt,v 1.8 2007/12/18 19:00:08 malin Exp $

 $Author: malin $

 $Date: 2007/12/18 19:00:08 $

=============================================================================
                         How to block Spam at the BIC
=============================================================================

This file can be found at:
http://www.bic.mni.mcgill.ca/~malin/bicsystems/spamass-howto.txt

=============================================================================


The email servers at the BIC are configured to automatically filter all
incoming and outgoing messages, looking for possible spam using a mail filter
called SpamAssassin (see <http://wiki.apache.org/spamassassin>). The
instructions and recipes are given below. You should read this document
carefully before enabling them to filter your incoming email. Be aware that
filtering email is a dangerous business (you might end up discarding valid
messages!), but if you follow the rules given below all should be good.

We also have configured the BIC email servers to use a feature called 'Grey
Listing'. Read further below to learn about it as it is a really effective
way of getting rid of the pesky spam.

This document contains the following sections:

* Introduction
* Basic Filter Recipe
* More Aggressive Filter Recipe
* How to Whitelist/Blacklist an Email Address
* Headers Added by SpamAssassin
* Training SpamAssassin
* Grey Listing: How it Works

=============================================================================

                                 ************
                                 Introduction
                                 ************

SpamAssassin applies a suite of tests to the headers and body of each email
and adds up a score. When this score reaches a (configurable) threshold, a
special 'tag' is inserted in the email headers. With a filter you can then
detect the message as spam and decide what to do with it: trash it, move it
to a 'trash spam' mailbox, etc.

SpamAssassin inserts other tags as well (they are described below), but to
detect spam you, as a user, only have to look for that one tag: see the
*Basic Filter Recipe* below.

A very good source of info is hosted at the SpamAssassin site
<http://wiki.apache.org/spamassassin>. 
Check the FAQ and the wiki frequently.

Note that the BIC email servers only tag emails as possible spam; they don't
discard them. They are configured very conservatively (X-Spam-Status will be
set to 'Yes' only if SpamAssassin calculated a score >= 5) and it's up to you,
as a user, to decide if you want to have a more aggressive filtering. If you
do then have a look at the *More Aggressive Filter Recipe* below.
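
The scoring idea described above can be sketched in a few lines of Python.
This is only an illustration, not SpamAssassin's actual code: the function
name and the per-test scores are made up for the example, and only the
threshold of 5 comes from the BIC configuration.

```python
# Hypothetical per-test scores, for illustration only; the real defaults
# live in the installed SpamAssassin rules files.
EXAMPLE_SCORES = {
    "HTML_MESSAGE": 0.1,
    "RCVD_IN_SORBS_WEB": 1.5,
    "STOCK_ALERT": 3.6,
}


def classify(tests_hit, scores, threshold=5.0):
    """Return (total_score, is_spam) for the tests that fired on one message."""
    total = sum(scores.get(name, 0.0) for name in tests_hit)
    # A message is tagged as spam once the total reaches the threshold.
    return total, total >= threshold


total, is_spam = classify(["HTML_MESSAGE", "STOCK_ALERT"], EXAMPLE_SCORES)
print(f"score={total:.1f} spam={is_spam}")
```

Lowering the threshold, or raising the score of a single test, is what makes
the filtering more aggressive.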

Once you use SpamAssassin we suggest that you 'train' it so that it gets
better at differentiating spam from ham, and also because spammers are nasty
creatures and they devolve in curious ways. The Bayesian training is explained
at <http://wiki.apache.org/spamassassin/BayesInSpamAssassin> and excerpts are
given below under the section *Training SpamAssassin*.

 ****************************************************************************
 * VERY IMPORTANT NOTE: WE WILL NOT BE HELD RESPONSIBLE FOR ANY VALID EMAIL *
 * THAT _YOU_ DISCARDED AS SPAM -- A FALSE-POSITIVE -- SHOULD YOU DECIDE TO *
 * USE SPAMASSASSIN.                                                        *
 ****************************************************************************

=============================================================================

                             *******************
                             Basic Filter Recipe
                             *******************

If you don't have a file called .procmailrc, create it at the top of your home
directory with your preferred editor (say 'emacs ~/.procmailrc') and stick the
following lines into it:


#----------------Do Not Copy This Line!----------------------------
SPAMBOX=$HOME/.spambox

:0:
* ^X-Spam-Flag: YES
$SPAMBOX
#----------------Do Not Copy This Line!----------------------------


If you already have a ~/.procmailrc make sure you insert the above recipe
*before* any other existing recipes. 

This will redirect all spam to a file called '.spambox' at the top of your
home directory. If you don't even want to see the nice juicy spam, replace
'$SPAMBOX' on the last line of the recipe by '/dev/null'.
If you do redirect the tagged spam to a file MAKE SURE TO CLEAN THE CRAP once
in a while because that file will grow fast and you might go beyond the quotas
set for users' home directories. A cron job is perfect for that, i.e. log in
to yorick and type the following in an xterm/shell: 'crontab -e'. This will
start an editor; just stick the following in it
(very important: all on one line!):

----------------Do Not Copy This Line!----------------------------
6 0 * * 1 umask 077; cd $HOME; if test -s .spambox -a "`wc -c .spambox |
awk '{print $1}'`" -ge 10240; then mv -f .spambox OLD.spambox; touch .spambox; fi
----------------Do Not Copy This Line!----------------------------

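This will clean up your spambox file (~/.spambox) every Monday at 00:06, i.e.
just after midnight on Sunday night: if the file is 10 KB or larger it is
moved to ~/OLD.spambox and a fresh empty one is created. I suggest reviewing
these files once in a while for any false positives that might have been
redirected there.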
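
To make that review less painful, something like the following Python sketch
can list what is sitting in the spambox. It assumes the spambox is a regular
mbox file (which is what the procmail recipe above writes) and uses only the
standard 'mailbox' module; the function name is made up for this example.

```python
import mailbox
import os


def summarize_spambox(path):
    """Return (sender, subject, spam_status) for every message in an mbox file."""
    box = mailbox.mbox(path, create=False)
    try:
        rows = []
        for msg in box:
            # Folded headers contain newlines; collapse them to single spaces.
            status = " ".join(str(msg.get("X-Spam-Status", "")).split())
            rows.append((str(msg.get("From", "")), str(msg.get("Subject", "")), status))
        return rows
    finally:
        box.close()


if __name__ == "__main__":
    spambox = os.path.expanduser("~/.spambox")
    if os.path.exists(spambox):
        for sender, subject, status in summarize_spambox(spambox):
            print(f"{sender} | {subject} | {status}")
```

Anything in that list that comes from a person you know is a candidate for
whitelisting.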

Make sure that your ~/.procmailrc belongs to you and that no other group
or user can write to it (read permissions don't matter), as otherwise the
mail server will refuse to use it for obvious security reasons:

chmod 600 ~/.procmailrc
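
If you want to double-check the result, the rule above (owned by you, not
writable by group or others) can be tested with a short Python sketch. The
function name is made up for this example and the check is only an
approximation of what the mail server itself verifies.

```python
import os
import stat


def is_private(path):
    """True if the file is owned by the current user and is not
    writable by group or others."""
    info = os.stat(path)
    owned_by_me = info.st_uid == os.getuid()
    foreign_write = info.st_mode & (stat.S_IWGRP | stat.S_IWOTH)
    return owned_by_me and not foreign_write


if __name__ == "__main__":
    rc = os.path.expanduser("~/.procmailrc")
    if os.path.exists(rc):
        print(rc, "looks fine" if is_private(rc) else "needs 'chmod 600'")
```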

*****************************************************************************
* WARNING: If you do use SpamAssassin, you're on your own: don't come       *
* complaining to us that someone sent you an email and you didn't receive   *
* it because it was detected as spam and you threw it away!                 *
*****************************************************************************

=============================================================================

                        *****************************
                        More Aggressive Filter Recipe
                        *****************************

If you want to modify SpamAssassin's behaviour, you can do your own filtering
by using the following procmail recipes (do as above and stick the following in
your ~/.procmailrc ***before*** any other existing recipes!):

----------------Do Not Copy This Line!----------------------------
SPAMBOX=$HOME/.spambox
:0fw: spamassassin.lock 
| /usr/bin/spamassassin

:0:
* ^X-Spam-Flag: YES
$SPAMBOX
----------------Do Not Copy This Line!----------------------------

You do the same thing as in the previous recipe except that before looking
for possibly tagged spam you do your own filtering. The nice thing about
this is that as a user you can then modify SpamAssassin's behaviour by
editing ~/.spamassassin/user_prefs and customizing the score needed for
a message to be considered spam or modifying the score that SpamAssassin
associates with a particular test. For instance, in my case, I've bumped
the scores associated with email originating from well-known spam sites
along with HTML email (used a lot by phishers) and I lowered the threshold
for spam from 5 (default value) to 4.5. With that I have literally no more
spam in my mailbox, with very, very few false positives. I also regularly
train SpamAssassin (see the section 'Training SpamAssassin' below).

My ~/.spamassassin/user_prefs looks like this:


###########################################################################
# SpamAssassin user preferences file.  See 'man Mail::SpamAssassin::Conf' 
# for details of what can be tweaked.
###########################################################################

# How many hits before a mail is considered spam.
required_hits	4.5	

# Whitelist and blacklist addresses are now file-glob-style patterns, so
# "friend@somewhere.com", "*@isp.com", or "*.domain.net" will all work.
# whitelist_from	someone@somewhere.com

# Add your own customised scores for some tests below.  The default scores are
# read from the installed spamassassin rules files, but you can override them
# here.  To see the list of tests and their default scores, go to
# http://spamassassin.org/tests.html .
#
# score SYMBOLIC_TEST_NAME n.nn
score AMATEUR_PORN                   4.5
score BAYES_99                       4.5
score BIZ_TLD                        4.5
score DNS_FROM_AHBL_RHSBL            4.5
score DRUGS_MUSCLE                   4.5
score DRUGS_DIET                     4.5
score EARN_PER_WEEK                  4.5
score HTML_MESSAGE                   4.5
score HTML_80_90                     4.5
score MICROSOFT_EXECUTABLE           4.5
score HARDCORE_PORN                  4.5
score HG_HORMONE                     4.5
score HOT_NASTY                      4.5
score INCREASE_SEX                   4.5
score IMPOTENCE                      4.5
score MILLION_USD                    4.5
score MORTGAGE_PITCH                 4.5
score NIGERIAN_BODY1                 4.5
score NIGERIAN_BODY2                 4.5
score NIGERIAN_BODY3                 4.5
score OFFSHORE_SCAM                  4.5
score PENIS_ENLARGE                  4.5
score PENIS_ENLARGE2                 4.5
score RCVD_IN_BL_SPAMCOP_NET         4.5
score RCVD_IN_DSBL                   4.5
score RCVD_IN_NJABL_PROXY            4.5
score RCVD_IN_NJABL_DUL              4.5
score RCVD_IN_SBL                    4.5
score RCVD_IN_SORBS_HTTP             4.5
score RCVD_IN_SORBS_WEB              4.5
score RCVD_IN_SORBS_DUL              4.5
score SUBJECT_SEXUAL                 4.5
score STOCK_ALERT                    4.5
score US_DOLLARS_3                   4.5
score URIBL_AB_SURBL                 4.5
score URIBL_SBL                      4.5
score URIBL_OB_SURBL                 4.5
score URIBL_WS_SURBL                 4.5

###########################################################################


See http://spamassassin.apache.org/tests.html for the list of tests that
SpamAssassin uses; you can override the score of any of them to your liking.
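
Note what the preferences above imply: with the threshold at 4.5, any test
whose score is also 4.5 is enough, on its own, to get a message tagged. The
Python sketch below makes that visible for a user_prefs file. It is a
simplified reader written for this document (it only understands
'required_hits'/'required_score' and single-value 'score' lines), not
SpamAssassin's real parser, and the function names are made up.

```python
def parse_user_prefs(text, default_threshold=5.0):
    """Return (threshold, {test_name: score}) from user_prefs-style text."""
    threshold = default_threshold
    scores = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if fields[0] in ("required_hits", "required_score") and len(fields) >= 2:
            threshold = float(fields[1])
        elif fields[0] == "score" and len(fields) >= 3:
            scores[fields[1]] = float(fields[2])
    return threshold, scores


def decisive_tests(threshold, scores):
    """Tests that reach the spam threshold all by themselves."""
    return sorted(name for name, value in scores.items() if value >= threshold)


if __name__ == "__main__":
    import os
    prefs = os.path.expanduser("~/.spamassassin/user_prefs")
    if os.path.exists(prefs):
        with open(prefs) as handle:
            limit, table = parse_user_prefs(handle.read())
        print("threshold:", limit)
        print("decisive on their own:", ", ".join(decisive_tests(limit, table)))
```

This is a deliberate choice in my setup, but it is also why you should keep
an eye on your spambox: a single HTML message from a friend is enough.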

=============================================================================

                 *******************************************
                 How to Whitelist/Blacklist an Email Address
                 *******************************************

Often you have emails that you know are not spam but that SpamAssassin
nevertheless tagged as spam (false positives). All you have to do is
'whitelist' the sender by adding a line to your ~/.spamassassin/user_prefs:

whitelist_from joe@hotmail.com

Messages from <joe@hotmail.com> will then no longer end up in your spambox,
even if their content would otherwise have been scored as spam. Multiple
addresses per line, separated by spaces, are OK. Multiple "whitelist_from"
lines are also OK.

The same principle applies to 'blacklisting' an email address:

blacklist_from add@ress.com

Use this for addresses whose mail is often (incorrectly) tagged as non-spam
but which you don't want. Same format as "whitelist_from".
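
Both directives accept file-glob-style patterns such as "*@isp.com" or
"*.domain.net". If you are unsure what a pattern will catch, the following
Python sketch approximates the matching with the standard 'fnmatch' module.
It is an approximation only: fnmatch also understands [...] character sets,
which is an assumption that may not hold for SpamAssassin's own matcher, and
the function name is made up for this example.

```python
from fnmatch import fnmatchcase


def matches_any(address, patterns):
    """True if the address matches at least one glob-style pattern
    (compared case-insensitively)."""
    addr = address.lower()
    return any(fnmatchcase(addr, pattern.lower()) for pattern in patterns)


patterns = ["friend@somewhere.com", "*@isp.com", "*.domain.net"]
for candidate in ("Bob@ISP.com", "user@mail.domain.net", "stranger@example.org"):
    print(candidate, matches_any(candidate, patterns))
```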


=============================================================================

                        *****************************
                        Headers Added by SpamAssassin
                        *****************************

SpamAssassin will add a few headers to an email after it has processed it.
These are usually hidden but can be easily shown if you don't use a
brain-dead mailer like PutLook or other utter pieces of crapware.

Here is an example of a spam I received:

X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.0.1 (2004-10-22) on 
        yorick.bic.mni.mcgill.ca
X-Spam-Level: **********
X-Spam-Status: Yes, score=10.9 required=4.5 tests=BAYES_00,HTML_MESSAGE,
        RCVD_IN_SORBS_WEB,STOCK_ALERT autolearn=no version=3.0.1

Four extra headers have been inserted: the first one tells you that
SpamAssassin has computed a score higher than its configured threshold,
the second one tells you which version of SpamAssassin was used,
and the last two tell you about the spam level of the message and
which tests contributed to the final score. You can edit your user
preference file in ~/.spamassassin/user_prefs to override the default score
values of all these tests; the defaults are located in
/usr/local/unstable/share/spamassassin on the BIC systems.
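
If you want to pick the X-Spam-Status header apart in a script, here is a
minimal Python sketch. It assumes the header has exactly the layout shown in
the example above (verdict, score=, required=, tests=, then optional
autolearn= and version=); other SpamAssassin versions may format it
differently. The function name is made up for this example.

```python
import re

STATUS_RE = re.compile(
    r"(?P<verdict>\w+), score=(?P<score>-?\d+(?:\.\d+)?) "
    r"required=(?P<required>-?\d+(?:\.\d+)?) tests=(?P<tests>.*?)"
    r"(?: autolearn=\S+)?(?: version=\S+)?$"
)


def parse_spam_status(value):
    """Split an X-Spam-Status value into its parts, or return None
    if it does not have the expected layout."""
    flat = " ".join(value.split())  # undo header folding
    match = STATUS_RE.match(flat)
    if match is None:
        return None
    tests = [name.strip() for name in match.group("tests").split(",") if name.strip()]
    return {
        "is_spam": match.group("verdict").lower() == "yes",
        "score": float(match.group("score")),
        "required": float(match.group("required")),
        "tests": tests,
    }


example = (
    "Yes, score=10.9 required=4.5 tests=BAYES_00,HTML_MESSAGE,\n"
    "        RCVD_IN_SORBS_WEB,STOCK_ALERT autolearn=no version=3.0.1"
)
print(parse_spam_status(example))
```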

=============================================================================

                            *********************
                            Training SpamAssassin
                            *********************

To train SpamAssassin, you get a mailbox full of messages that you know are
spam and use the sa-learn program to pull out the tokens and remember them for
later:

    * sa-learn --showdots --mbox --spam spam-file

Then you get a mailbox full of messages you're sure are ham and teach Bayes
about those:

    * sa-learn --showdots --mbox --ham ham-file

It is important to do both.
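
If you train regularly you may want to wrap the two commands in a script so
that you never forget one of them. The Python sketch below does just that; it
uses exactly the sa-learn options shown above, assumes sa-learn is in your
PATH, and the function names and mailbox file names are made up for this
example.

```python
import subprocess


def sa_learn_command(mbox_path, kind):
    """Build the sa-learn command line for a 'spam' or 'ham' mbox file."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--showdots", "--mbox", f"--{kind}", mbox_path]


def train(spam_mbox, ham_mbox, run=subprocess.run):
    """Feed both mailboxes to sa-learn, spam first and then ham."""
    for path, kind in ((spam_mbox, "spam"), (ham_mbox, "ham")):
        run(sa_learn_command(path, kind), check=True)


if __name__ == "__main__":
    # Print the commands instead of running them; call train() to run them.
    for path, kind in (("spam-file", "spam"), ("ham-file", "ham")):
        print(" ".join(sa_learn_command(path, kind)))
```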


=============================================================================

                          **************************
                          Grey Listing: How it Works
                          **************************

Grey listing works by assuming that, unlike a legitimate MTA (Mail
Transport Agent), spam engines will not retry sending their junk mail after a
temporary error. The filter will always temporarily reject mail on a first
attempt, and accept it after some time has elapsed.

If spammers ever try to resend rejected messages, we can assume they will not
stay idle between the two sends. Odds are good that the spammer will send
mail to a honeypot address and get blacklisted in an Internet-distributed
blacklist before the second attempt.

Grey listing can be enabled per user, per domain, or per IP range.
Essentially it delays delivery to a greylisted email address by a small amount
of time. I've tested it with a (small) delay of 10 minutes and it seems to be
really effective: used in conjunction with SpamAssassin it catches all the
250+ spams I receive a day :)

Greylist allows whitelisting (do nothing, the default), greylisting (delay)
and blacklisting (plonk! you're not welcome). If you are harassed by too much
spam just get in touch with bicadmin@bic and ask to have your BIC email
address added to the mail server's greylist. The only drawback from a user's
perspective is the small delay incurred before final delivery to the mailbox.

I have whitelisted mcgill.ca so emails originating from there are not affected
by greylist, i.e. no delay is incurred for the final delivery. Broken MTAs
like gmail.com (that's for Andrew:) are also whitelisted, along with others
like yahoo, amazon, aol and a bunch of sites with weird if not broken SMTP
setups.

Obviously the whole BIC domain is whitelisted too :)
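
For the curious, the decision a greylisting filter makes can be sketched in
Python as follows. This is a toy model, not the code running on the BIC mail
servers: keying on the (client IP, sender, recipient) triplet and
whitelisting by sender domain are assumptions made for the example, as are
the class and method names. Only the 10 minute delay comes from the setup
described above.

```python
import time


class Greylist:
    """Toy greylist: temporarily reject the first attempt, accept retries
    made after the delay has elapsed."""

    def __init__(self, delay_seconds=600, whitelisted_domains=()):
        self.delay = delay_seconds
        self.whitelist = {domain.lower() for domain in whitelisted_domains}
        self.first_seen = {}  # (client_ip, sender, recipient) -> first attempt time

    def is_whitelisted(self, sender):
        domain = sender.rsplit("@", 1)[-1].lower()
        return any(domain == d or domain.endswith("." + d) for d in self.whitelist)

    def check(self, client_ip, sender, recipient, now=None):
        """Return 'accept' or 'tempfail' for one delivery attempt."""
        now = time.time() if now is None else now
        if self.is_whitelisted(sender):
            return "accept"
        key = (client_ip, sender.lower(), recipient.lower())
        first = self.first_seen.setdefault(key, now)
        return "accept" if now - first >= self.delay else "tempfail"


greylist = Greylist(whitelisted_domains=["mcgill.ca"])
print(greylist.check("192.0.2.1", "someone@example.com", "user@example.org", now=0))
print(greylist.check("192.0.2.1", "someone@example.com", "user@example.org", now=900))
```

A legitimate MTA retries on its own after the temporary failure, so the
second attempt goes through; a spam engine that never retries is never
accepted.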

You want to be greylisted? 
Make the request by sending an email to <bicadmin @ bic . mni . mcgill . ca>


Have fun.
jf
--
"The Zen nature of a spammer resembles a cockroach, except that the
cockroach is higher up on the evolutionary chain." 
    --Peter Olson, Delphi Postmaster