[11/14/2004] Update: Adam Kalsey has a piece from Sep 2003 that includes more or less what I call Secret Tags. Since it’s from Sep 2003, the credit goes to him, even though I discovered his piece only today. Adam, too, says one should alter field names.
Captchas are quite useful for telling real users from bots. A real user is likely to be able to read and understand the captcha and enter the correct characters into a form field. A bot cannot, at least not without logic to read the captcha, understand what it says and enter it into the right field (that is, set the correct URL parameter). By providing a means to lock out bots, captchas help to decrease the amount of spam and other unsolicited requests to web applications.
But captchas have a few drawbacks. They are rather expensive when it comes to server load, as they need to be created on every page view. They can be difficult to read, which makes them unusable for some people. They are intrusive and degrade the user experience.
In social terms, a website that uses captchas discriminates against the legitimate user, making her justify herself again and again.
Using challenge-response pairs, captchas try to tell humans from machines. The challenge is usually an image showing a machine-generated series of characters that the user has to enter as the response. The captcha system then compares the response to the expected answer for the challenge. If they match, the Turing test is passed.
Machine-generated challenges are harder to guess and easier to maintain on the one hand, but easier to crack on the other. A programmer who knows how the captcha is made can easily write a tool to read it and thus answer the challenges the system might throw at him. A few misses are not important since spammers go for quantity, not quality. And since all web applications of a certain kind (all phpBB forums, or all Movable Type or WordPress blogs, for example) use the same captcha system, cracking one captcha system gives access to many websites.
Of course you can write a more creative challenge-response generator that asks real questions the user has to answer (C: Where is the city of New York, USA or Canada? R: USA). This can solve the cracking issue, but it increases intrusiveness even more.
To make a long story short: captchas can help fight spam, but they annoy legitimate users, are more or less easy to crack, and are no longer useful once cracked.
For a website where users can enter content, register and comment, I needed a system to hold bots at bay during registration and commenting. I did not want to use captchas, for the reasons given above.
I came up with the following concept, called Secret Tags. On every page view of the registration or comment form, a secret tag is generated. To generate the tag I use a function that returns an eight-character string of lowercase letters (26^8, or about 2*10^11 variations).
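As a minimal sketch of such a generator (the post does not show the original function, so the name `new_secret_tag` and the use of Python's `secrets` module are assumptions), the tag could be produced like this. The last line also shows how many different tags exist.

```python
import secrets
import string

TAG_LENGTH = 8                      # eight characters, as described in the text
ALPHABET = string.ascii_lowercase   # the 26 lowercase letters a-z


def new_secret_tag(length: int = TAG_LENGTH) -> str:
    """Return a random tag of lowercase letters (hypothetical helper)."""
    # secrets.choice draws from the operating system's secure random source.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(new_secret_tag())               # a random eight-letter string
    print(len(ALPHABET) ** TAG_LENGTH)    # 208827064576 possible tags
```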
The tag is written to a database table, in my case a MySQL heap table, together with a timestamp. Heap tables have the advantage of living completely in RAM, which makes them noticeably faster than the usual table types.
When the form is output, a hidden input field is inserted. Its value is the Secret Tag just created.
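The two steps above, storing the tag with a timestamp and emitting it as a hidden field, might look roughly like the following. This is a sketch under stated assumptions: an in-memory SQLite database stands in for the MySQL heap table, and the table name `secret_tags`, the field name `secret_tag` and the helper names are all made up for illustration.

```python
import html
import secrets
import sqlite3
import string
import time


def create_table(conn: sqlite3.Connection) -> None:
    """Create the tag table (hypothetical schema: the tag plus a timestamp)."""
    conn.execute(
        "CREATE TABLE secret_tags (tag TEXT PRIMARY KEY, created REAL NOT NULL)"
    )


def issue_tag(conn: sqlite3.Connection) -> str:
    """Generate a tag and store it together with the current time."""
    tag = "".join(secrets.choice(string.ascii_lowercase) for _ in range(8))
    conn.execute(
        "INSERT INTO secret_tags (tag, created) VALUES (?, ?)", (tag, time.time())
    )
    conn.commit()
    return tag


def hidden_field(tag: str, name: str = "secret_tag") -> str:
    """Render the hidden input that carries the tag back to the server."""
    return '<input type="hidden" name="{}" value="{}">'.format(
        html.escape(name, quote=True), html.escape(tag, quote=True)
    )


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")  # stand-in for the RAM-based heap table
    create_table(db)
    print(hidden_field(issue_tag(db)))
```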
The form is sent to the client and is returned after some time, including the Secret Tag.
When the returned form data is analysed, the Secret Tag it contains is compared to the Secret Tags in the table. If there is a match, the form data is taken to come from a legitimate user. The tag is deleted from the table, making it usable only once, and the form data gets processed as originally intended.
If the tag returned with the form data is not in the database, or if there is no tag at all, the form data is ignored. Either an error message is shown or the script silently exits.
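The check described above can be sketched as a single function. The names here (`consume_tag`, the `secret_tags` table) are hypothetical, and SQLite again stands in for MySQL. Note one small variation on the text: instead of comparing first and deleting afterwards, the sketch deletes the tag and checks whether a row was actually removed, which does both steps in one statement.

```python
import sqlite3
import time
from typing import Optional


def consume_tag(conn: sqlite3.Connection, tag: Optional[str]) -> bool:
    """Return True if the tag was known, deleting it so it works only once."""
    if not tag:
        return False  # no tag at all: ignore the form data
    cur = conn.execute("DELETE FROM secret_tags WHERE tag = ?", (tag,))
    conn.commit()
    # rowcount is 1 if the tag existed and was removed, 0 otherwise.
    return cur.rowcount == 1


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE secret_tags (tag TEXT PRIMARY KEY, created REAL NOT NULL)"
    )
    db.execute("INSERT INTO secret_tags VALUES (?, ?)", ("abcdefgh", time.time()))

    print(consume_tag(db, "abcdefgh"))  # True: first use is accepted
    print(consume_tag(db, "abcdefgh"))  # False: the tag is already used up
    print(consume_tag(db, None))        # False: no tag was sent
```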
Secret Tags are time-bound: they are valid only for a certain period, after which they get deleted from the table, rendering all later requests that use them useless.
Secret Tags give users no headache; they don’t even notice them, because web browsers return hidden input fields silently and without bugging the user. A bot can easily access ST-protected forms, read the tag and use it in its request. Granted, this is easier than downloading a picture and running OCR over it. But basically it’s just the same: download the form and read the challenge.
Absolute protection is nearly impossible, and the more security you want, the less freedom you can provide. Secret Tags are perhaps less secure, but they give legitimate users more freedom while blocking most spam bots, just as captchas do.
Regardless of which system you use, a few tweaks can make it even harder for the bots.
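Expiry can be handled by a periodic cleanup that removes every tag older than the allowed lifetime. The sketch below is an illustration only: the post does not say how long a tag stays valid, so the 15-minute `MAX_AGE_SECONDS` is an assumed value, and the table and function names are hypothetical.

```python
import sqlite3
import time
from typing import Optional

MAX_AGE_SECONDS = 15 * 60  # assumed lifetime; the text gives no concrete value


def purge_expired(
    conn: sqlite3.Connection,
    now: Optional[float] = None,
    max_age: float = MAX_AGE_SECONDS,
) -> int:
    """Delete all tags older than max_age seconds; return how many were removed."""
    now = time.time() if now is None else now
    cur = conn.execute("DELETE FROM secret_tags WHERE created < ?", (now - max_age,))
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE secret_tags (tag TEXT PRIMARY KEY, created REAL NOT NULL)"
    )
    now = time.time()
    db.execute("INSERT INTO secret_tags VALUES (?, ?)", ("oldoldol", now - 3600))
    db.execute("INSERT INTO secret_tags VALUES (?, ?)", ("freshtag", now))

    print(purge_expired(db, now=now))  # 1: only the hour-old tag is removed
```

Running the cleanup right before each tag check keeps stale tags from ever matching, without the need for a separate scheduled job.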
- Alter field names. Maybe even alter them every once in a while, maybe even automatically.
- Alter script names. Maybe even alter them every once in a while.
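One possible way to automate the renaming of field names, offered purely as an illustration and not as anything the original setup did, is to derive each name from a server-side key and the current time window. Everything in this sketch is an assumption: the key `SERVER_KEY`, the daily `ROTATION_SECONDS` and the helper names.

```python
import hashlib
import hmac
import time
from typing import Optional, Set

SERVER_KEY = b"replace-with-a-private-key"  # hypothetical secret, kept on the server
ROTATION_SECONDS = 24 * 60 * 60             # assumed rotation period: once a day


def field_name(base: str, now: Optional[float] = None) -> str:
    """Return the name to use for the logical field `base` in the current window."""
    now = time.time() if now is None else now
    window = int(now // ROTATION_SECONDS)
    message = "{}:{}".format(base, window).encode()
    digest = hmac.new(SERVER_KEY, message, hashlib.sha256).hexdigest()
    return "f" + digest[:10]  # leading letter keeps the name tidy


def accepted_names(base: str, now: Optional[float] = None) -> Set[str]:
    """Names to accept on submit: the current window and the previous one."""
    now = time.time() if now is None else now
    return {field_name(base, now), field_name(base, now - ROTATION_SECONDS)}


if __name__ == "__main__":
    print(field_name("comment"))      # changes once per rotation period
    print(accepted_names("comment"))  # also covers forms served before rotation
```

Accepting the previous window's name as well means a form that was served shortly before a rotation can still be submitted afterwards.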
Quite simple, ain’t it? As mentioned earlier, spam bots go for quantity, not quality. If they can hit some thousand standard blogs/forums, the few they cannot hit are not important. So even with the less secure, less intrusive Secret Tags you lock out 99% of spam.