Saturday, 3 April 2010

Why Greylisting is WAY better than Spamassassin

Having run my own ISP/hosting company for a while now, I have been struck yet again by how much more effective greylisting is than Spamassassin. I get a system log every night and check how accurate the spam filters were, so I can say this with confidence.

The reason I say greylisting is more effective is:

a) It gets fewer false positives (i.e. it doesn't often discard an email from a legitimate person): typically around 0.3% to 1.0%, or an accuracy of about 99% to 99.7%.
b) It uses less CPU time. Generally only one process is running, and the contents of the email don't have to be downloaded, uploaded or accepted for inspection at all before the filter makes its decision.

By contrast, at any time on my server there can be as many as 20 copies of Spamassassin processing incoming messages. Furthermore, its accuracy is dreadful: it typically has two or more false positives every day, especially since many of my subscribers run small companies and send out mail that may look like spam.

I had to explicitly tell Spamassassin not to score my local users' mail! Surely that should be a given? Surely, by default, Spamassassin should do that, or at least have a clearly signposted option? It was rejecting my own users' mail and letting 419 scams through! I had to add a whole bagful of extra filters to it, to catch spam whose text is embedded in an image, 419s with different From and Reply-To addresses, and dialup-relayed mails.

Furthermore, Spamassassin got a LOT of false positives from MS Outlook. It seems that because Outlook does not (surprise surprise!!) adhere to SMTP standards, Spamassassin by default scores any mail from Outlook very high, i.e. as spam. I had to explicitly lower the scores of all the Outlook-related tests because my legitimate users were having mail rejected merely for using Outlook. Of course, this is Microsoft's fault, but the guys who wrote Spamassassin should just accept that Microsoft and Outlook are not going away anytime soon, so they'd best not presume that a Microsoft email is, by default, spam.
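To give an idea of the kind of extra filter I mean, here is a minimal Python sketch of a From/Reply-To mismatch check of the sort that catches many 419s. It is not my actual Spamassassin rule: the function name `reply_to_differs` is made up for illustration, and comparing only the domains (rather than the full addresses) is my own assumption, chosen so that a company replying from a different mailbox on the same domain is not flagged.

```python
from email import message_from_string
from email.utils import parseaddr


def reply_to_differs(raw_message):
    """Return True if Reply-To points at a different domain than From.

    Hypothetical helper for illustration only; comparing domains rather
    than whole addresses is an assumption, not a Spamassassin behaviour.
    """
    msg = message_from_string(raw_message)
    # parseaddr() splits "Name <user@host>" into (name, address).
    from_addr = parseaddr(msg.get("From", ""))[1].lower()
    reply_addr = parseaddr(msg.get("Reply-To", ""))[1].lower()
    # With no Reply-To (or no From) there is nothing to compare.
    if not from_addr or not reply_addr:
        return False
    from_domain = from_addr.rpartition("@")[2]
    reply_domain = reply_addr.rpartition("@")[2]
    return from_domain != reply_domain


if __name__ == "__main__":
    sample = (
        "From: Some Bank <info@bank.example>\n"
        "Reply-To: claims@freemail.example\n"
        "Subject: Your inheritance\n"
        "\n"
        "Dear friend...\n"
    )
    print(reply_to_differs(sample))
```

On its own a mismatch like this proves nothing (mailing lists do it all the time), so it only makes sense as one score among several.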

Another thing that baffles me is that there seems to be no way, by default, to recognise spammy-looking email addresses. You have to tinker with from-address-contains-numbers rules to get this to work at all. Fortunately, milter-greylist catches most of them before they even get to Spamassassin!
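The idea behind a from-address-contains-numbers rule is roughly the following, sketched here in Python rather than as a real Spamassassin rule. The function name `looks_spammy` and the threshold of five digits are assumptions picked purely for illustration; any real threshold would need tuning against your own mail.

```python
import re


def looks_spammy(address, min_digits=5):
    """Flag addresses whose local part (before the @) is full of digits.

    Hypothetical heuristic: min_digits=5 is an arbitrary example value.
    """
    local_part = address.partition("@")[0]
    digit_count = len(re.findall(r"\d", local_part))
    return digit_count >= min_digits


if __name__ == "__main__":
    for addr in ("xk29384712@example.com", "bob1975@example.com"):
        print(addr, looks_spammy(addr))
```

With that threshold an address like bob1975@example.com (four digits) passes, while a random-looking string of digits is flagged.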

Spamassassin, as you're probably aware, is a content filter. That means that the mail has to actually arrive on the server before it is scanned for spam-like features. This uses up not only your bandwidth but also your CPU time.

Greylisting, on the other hand, just looks at who the mail is from, who it is to, and what its source IP address is. If the source IP has been blacklisted, the mail is rejected. If the from and to addresses have not been involved in a previous exchange, the mail is temporarily rejected. If the source IP address is not a known server, the mail is also temporarily rejected. If the sending server then re-sends the mail, it is accepted. If it never re-sends, the mail has effectively been dropped. As I mentioned above, this simple test is about 99.7% accurate. Most days I have at most one false positive out of thousands of emails. Greylisting is also particularly effective at stopping spam from botnets (automated spam delivery networks), because they don't use legitimate SMTP servers and so typically never retry.
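The greylisting decision described above can be sketched in a few lines of Python. This is only an illustration of the logic, not how milter-greylist is actually implemented: the class name `Greylist`, the 300-second minimum retry delay and the example IP addresses are all assumptions of mine, and a real daemon would also expire old entries and store them on disk.

```python
import time


class Greylist:
    """Toy greylist keyed on (source IP, sender, recipient).

    Hypothetical sketch: the names and the default delay are assumptions.
    """

    def __init__(self, delay=300, blacklist=()):
        self.delay = delay              # seconds a new sender must wait
        self.blacklist = set(blacklist)
        self.first_seen = {}            # triplet -> time of first attempt
        self.known = set()              # triplets that have retried properly

    def check(self, ip, sender, recipient, now=None):
        """Return an SMTP-style reply for one delivery attempt."""
        now = time.time() if now is None else now
        if ip in self.blacklist:
            return "550 rejected"
        key = (ip, sender.lower(), recipient.lower())
        if key in self.known:
            return "250 accepted"
        # Record the first attempt, or fetch the time it was recorded.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.delay:
            # The sender came back after the delay: treat it as a real server.
            self.known.add(key)
            return "250 accepted"
        # New (or too-eager) sender: temporary failure, please retry later.
        return "451 try again later"


if __name__ == "__main__":
    g = Greylist(delay=300, blacklist={"203.0.113.9"})
    print(g.check("198.51.100.7", "a@x.example", "b@y.example", now=0))
    print(g.check("198.51.100.7", "a@x.example", "b@y.example", now=400))
    print(g.check("203.0.113.9", "a@x.example", "b@y.example", now=0))
```

Note that nothing here ever looks at the message body, which is why the whole thing is so cheap compared with a content filter.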

The trouble with greylisting is that many well-known ISPs or email sources do not handle the retry request (the temporary rejection) that greylisting sends. A surprising number of large companies use subcontracted ISPs to relay their email for them. Facebook, Hotmail, Gmail and Apple are all guilty. This means that you have to explicitly whitelist these relay servers to allow mail through. This is obviously a problem, because then anyone in those companies, or anyone on the network that those companies relay through, can send spam that gets through greylisting. I'd like to appeal to all companies to refrain from doing this. Please do not outsource your mail sending/relaying. Please keep it in-house and deploy SMTP-compliant servers, because currently your ISPs and downstream relays are NOT SMTP-compliant. Yes, Gmail, Apple, Facebook - I'm talking to you. You ought to know better. I expect it of Hotmail; after all, Hotmail is a chief source of spam. But the rest of you?? What gives?

Greylisting is the future of spam eradication. Please will EVERYONE use a proper email server program - i.e. Sendmail - so we can get rid of spam once and for all?
