Fighting Spam
This post originally appeared on the Software Carpentry website.
We recently experienced a spate of spam and fake user accounts on the course discussion forums. In this post I'll briefly explain what we're doing to stop that from happening again, and how the technology works.
The course forums are powered by bbPress, the forum software written by the WordPress folks. When I first installed it I kept the default configuration settings and did nothing in particular to lock it down against spammers. Bad idea. When I checked this morning we had over fifty fake user accounts and two spam postings.
I had to remove the fake user accounts by hand. Luckily this is easy to do, since either the username (e.g. "google directions plus") or the user's website is obviously spammy. I then activated two plugins: Akismet, to filter out any future spam posts, and reCAPTCHA, to stop automated computer programs (i.e. bots) from registering fake user accounts.
Akismet inspects posts and comments for known spam-like features and flags those that match as spam. It is offered as a web service, so all we needed to install was a simple plugin for bbPress that sends each post and comment to the Akismet server and receives back an answer saying whether or not it is suspected to be spam.
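To make that round trip concrete, here is a minimal sketch in Python of what such a plugin does for each post. It is not the bbPress plugin's code (that plugin is written in PHP). The endpoint shape and field names follow my understanding of Akismet's public comment-check API, and the function names (`build_payload`, `http_post`, `is_spam`) are made up for illustration.

```python
from urllib import parse, request

# Assumption: Akismet's comment-check endpoint, with the API key as a subdomain.
AKISMET_URL = "https://{key}.rest.akismet.com/1.1/comment-check"


def build_payload(site_url, user_ip, user_agent, author, content):
    """Encode the fields describing one forum post as a form body."""
    fields = {
        "blog": site_url,            # the site the post was submitted to
        "user_ip": user_ip,          # poster's IP address
        "user_agent": user_agent,    # poster's browser user agent
        "comment_type": "forum-post",
        "comment_author": author,
        "comment_content": content,
    }
    return parse.urlencode(fields).encode("utf-8")


def http_post(url, body):
    """Default transport: POST the form body and return the response text."""
    req = request.Request(url, data=body, method="POST")
    with request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8")


def is_spam(api_key, payload, post=http_post):
    """Return True if the service answers 'true' (spam), False for 'false'."""
    answer = post(AKISMET_URL.format(key=api_key), payload).strip()
    if answer not in ("true", "false"):
        raise ValueError(f"unexpected Akismet response: {answer!r}")
    return answer == "true"
```

The `post` argument is there only so the network call can be swapped out, for example with a stub in a test. A real plugin would then hold any post for which `is_spam` returns True instead of publishing it.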
The reCAPTCHA plugin adds a visual or audio test to the registration form on the course forums. The test involves typing in distorted words presented in an image or an audio clip, and it is designed to be easy for humans to answer correctly but hard for computer programs. The reCAPTCHA system is unique in that by solving the tests you are also helping to digitize books, newspapers and radio recordings. That's because words for the reCAPTCHA tests come from one of those sources when they have been flagged as unreadable during an automated digitization process. The words are therefore known to be unreadable by current computer programs, which makes it very unlikely that a bot will pass the test. Each test pairs such a word with a control word whose answer is already known, so when a human gets the control word right, their reading of the unknown word is sent back to the folks digitizing the material as a probable transcription. Nifty. You can learn more about exactly how the reCAPTCHA system works here.
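How can the system tell that an answer is correct when the scanned word is unreadable to begin with? As I understand the design, each unknown word is shown next to a control word whose answer is already known, and the unknown word is accepted once enough people agree on it. The toy sketch below models just that idea in Python. The class name, method names and the `votes_needed` threshold are all hypothetical, and this is in no way reCAPTCHA's actual code.

```python
from collections import Counter


class WordChallenge:
    """One two-word test: a control word with a known answer, plus an unknown word."""

    def __init__(self, control_answer, votes_needed=3):
        self.control_answer = control_answer
        self.votes_needed = votes_needed  # hypothetical agreement threshold
        self.votes = Counter()            # readings collected for the unknown word

    def submit(self, control_guess, unknown_guess):
        """Return True if the user passes; record their reading of the unknown word."""
        if control_guess.strip().lower() != self.control_answer.lower():
            return False  # failed the checkable word, so ignore the other guess
        self.votes[unknown_guess.strip().lower()] += 1
        return True

    def transcription(self):
        """Return the agreed reading of the unknown word, or None if undecided."""
        if not self.votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count >= self.votes_needed else None
```

Only people who get the control word right contribute a reading, so a bot that guesses blindly is rejected and never pollutes the transcription.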