Ever since my last post about the subject, I have managed to reduce
comment spam on this blog by a significant margin. There were a number
of methods that I applied and I will attempt to explain them here,
though I won't go into so much detail that it can be used to circumvent
them.

First of all, I disabled links back to the commenter's website, unless
it is a registered site that I have already approved. A number of
spammers and abusers were using these links to point back to their own
sites, which were either spam or phishing sites or contained malicious
code.
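
To give a rough idea of the approved-site check, here is a minimal
sketch in present-day Python 3. It is not the code running on this
blog: APPROVED_HOSTS, render_author and the example host names are all
made up for illustration.

```python
from html import escape
from urllib.parse import urlparse

# Hypothetical whitelist of hosts that have been approved by hand.
APPROVED_HOSTS = {"example.org"}


def render_author(name, website):
    """Return HTML for the commenter's name, linked only for approved sites."""
    safe_name = escape(name)
    parsed = urlparse(website or "")
    host = (parsed.hostname or "").lower()
    # Only plain web links to an approved host become active links.
    if parsed.scheme in ("http", "https") and host in APPROVED_HOSTS:
        return '<a href="%s">%s</a>' % (escape(website, quote=True), safe_name)
    # Everyone else gets their name shown as plain text.
    return safe_name
```

With this, render_author("Ann", "http://example.org/blog") comes back as
a link, while any other site just gets the name as plain text.
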
Secondly, I removed the motivation for posting comment spam by
stripping the link code from all posts (except those by registered
users). This can easily be done in Python through either the sgmllib
module or the re module's "sub" method. Once the link code is removed,
all that gets posted is the URL itself rather than an active link.
Since a bare URL is not what comment spammers are after, this can act
as an effective deterrent.
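
As an illustration of the re approach, here is a minimal sketch. It is
written for Python 3 (where sgmllib no longer exists), it is not the
exact code used here, and the names ANCHOR_RE and strip_links are my own
for this example. A regular expression like this handles ordinary
anchor tags only; it is not a full HTML parser.

```python
import re

# An <a ...> tag with a quoted href, its link text, and the closing </a>.
ANCHOR_RE = re.compile(
    r'<a\b[^>]*?\bhref\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)</a\s*>',
    re.IGNORECASE | re.DOTALL,
)


def strip_links(comment_html):
    """Replace active links with their text and the bare URL."""

    def replace(match):
        url, text = match.group(2), match.group(3)
        # If the link text is empty or is the URL itself, show the URL once.
        if text.strip() in ("", url):
            return url
        return "%s (%s)" % (text, url)

    without_anchors = ANCHOR_RE.sub(replace, comment_html)
    # Remove any leftover anchor tags, e.g. ones with an unquoted href.
    return re.sub(r"</?a\b[^>]*>", "", without_anchors, flags=re.IGNORECASE)
```

For example, strip_links('Buy <a href="http://spam.example/">pills</a> now')
gives 'Buy pills (http://spam.example/) now', which is of no use to a
spammer looking for an active link.
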
Another method that I implemented was to count the number of links
in the post and discard the whole post if it contains a high proportion
of links. This check runs before the link code is removed, as described
above. I still get a notification of these posts, but so far there has
not been a single false positive. This has made the spam much simpler
to handle: I can just discard all comments marked as comment spam.
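
The link count can be sketched roughly as follows. This is only an
illustration under my own assumptions: the 0.2 threshold, the way words
are counted and the function names are invented for the example and are
not the values used on this blog.

```python
import re

# An opening anchor tag, or a bare URL that is not inside a tag.
LINK_RE = re.compile(r"<a\b[^>]*>|https?://\S+", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def link_ratio(comment):
    """Return the number of links divided by the number of words."""
    links = len(LINK_RE.findall(comment))
    # Strip the tags first so that markup does not inflate the word count.
    words = len(TAG_RE.sub(" ", comment).split())
    return links / max(words, 1)


def looks_like_spam(comment, max_ratio=0.2):
    """Flag a comment whose proportion of links exceeds max_ratio (assumed)."""
    return link_ratio(comment) > max_ratio
```

A comment made up almost entirely of links is flagged, while a normal
sentence containing a single URL passes. The check has to see the raw
comment, before any link code is stripped.
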
Something that would be relatively difficult to implement with most
other blogging software is changing the form input "names" on the comment
form on a periodic basis. This stops spammers from "learning" how your
comment system works and using automated tools to directly POST comments.
This method helped me cut down on a huge amount of comment spam the last
time I changed the values.
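
I am deliberately not describing how the names are changed here, but one
possible way to automate the idea is to derive the names from a secret
and the current time period. Everything in this sketch (SECRET_KEY, the
weekly ROTATION_SECONDS, field_name, read_field) is a hypothetical
example rather than my actual setup.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"          # hypothetical secret, keep it private
ROTATION_SECONDS = 7 * 24 * 3600   # hypothetical: rotate names weekly


def current_period(now=None):
    """Return the index of the rotation period that contains `now`."""
    now = time.time() if now is None else now
    return int(now // ROTATION_SECONDS)


def field_name(logical_name, period):
    """Derive the form input name used for a logical field in a period."""
    message = ("%s:%d" % (logical_name, period)).encode("utf-8")
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    # Prefix with a letter so the name never starts with a digit.
    return "f" + digest[:12]


def read_field(form, logical_name, now=None):
    """Fetch a field, accepting the current and the previous period's name."""
    period = current_period(now)
    for candidate in (period, period - 1):
        key = field_name(logical_name, candidate)
        if key in form:
            return form[key]
    return None
```

The form is rendered with field_name("body", current_period()) as the
input name, and read_field picks the value back out of the submitted
data. Accepting the previous period as well means a visitor who loaded
the page just before a rotation can still post, while a bot replaying
old field names gets nothing.
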
I must admit that all this hasn't stopped spam completely, but it has
helped a lot. Other methods that could be used include picture
verification codes, Bayesian algorithms to identify spam, and an
approval system (using Mailman maybe?). Anyway, I'll leave those for
another day, when I have enough time to work on them.

On 10th March 2007, at 23:12pm PKT, Farhan said:
How about somehow incorporating Akismet? I use it on my blog with WordPress. I think it could work with Py.