[11/14/2004] Update: Adam Kalsey has a piece from Sep 2003 that includes more or less what I call Secret Tags. Since it’s from Sep 2003, the credit goes to him, even though I discovered his piece only today. Adam, too, says one should alter field names.
And while you are there, pay Yoz and Shelley a visit.
Captchas are quite useful for telling real users from bots. While a real user is likely to be able to read and understand the captcha and enter the correct characters into a form field, a bot cannot, at least not without logic to read the captcha, understand what it says and enter it into the right field (that is, set the correct URL parameter). By providing a means to lock out bots, captchas help to decrease the amount of spam and other unsolicited requests to web applications.
But captchas have a few drawbacks. They are rather expensive when it comes to server load, as they need to be created on every page view. They can be difficult to read, which makes them unusable for some people. They are intrusive and degrade the user experience.
In social terms, by using captchas a website discriminates against the legitimate user, making her justify herself again and again.
Using challenge-response pairs, captchas try to tell humans from machines. The challenge usually is an image showing a machine-generated series of characters that the user has to enter as the response. The captcha system then compares the response to the expected answer. If they match, the Turing test is passed.
Using machine-generated challenges makes them harder to guess and easier to maintain on the one hand, and easier to crack on the other. If a programmer knows how the captcha is made, he can easily write a tool to read it and is thus able to answer nearly all challenges the system might throw at him. A few misses are not important since spammers go for quantity, not quality. And since all web applications of a certain kind (all phpBB forums, all Movable Type or WordPress blogs, …) use the same captcha system, cracking one captcha system gives access to many websites.
Of course you can write a more creative challenge-response generator, using real questions the user has to answer (C: Is the city of New York in the USA or in Canada? R: USA). This can solve the cracking issue, but increases intrusiveness even more.
To make a long story short: captchas can help fight spam, but they annoy legitimate users and are more or less easy to crack, and once cracked they are no longer useful.
Secret Tags
For a website where users can enter content, register and comment, I needed a system to keep bots at bay for registration and commenting. I did not want to use captchas, for the reasons given above.
I came up with the following concept, called Secret Tags. On every page view of the registration or comment form, a secret tag is generated. To generate the tag I use a function that returns an eight-character string consisting of lowercase letters (that is 26^8, or about 2*10^11, variations).
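A minimal sketch of such a tag generator, in Python rather than whatever the site itself runs on. The function name `make_secret_tag` and the use of the `secrets` module are my assumptions for illustration; the post only says the function returns eight lowercase letters.

```python
import secrets
import string

TAG_LENGTH = 8  # eight characters, as described in the post


def make_secret_tag(length: int = TAG_LENGTH) -> str:
    """Return a random string of lowercase ASCII letters.

    Hypothetical helper: `secrets.choice` draws from the operating
    system's CSPRNG, so tags are not predictable from earlier ones.
    """
    return "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))


# Size of the tag space: 26 choices for each of the 8 positions.
TAG_SPACE = 26 ** TAG_LENGTH  # 208_827_064_576, roughly 2 * 10^11

print(make_secret_tag())
```

Any source of randomness would do for the concept; an unpredictable one simply keeps a bot from guessing the next tag.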
The tag is written to a database table, in my case a MySQL heap table, together with a timestamp. Heap tables have the advantage of living completely in RAM, which makes them noticeably faster than the usual table types.
When outputting the form, a hidden input field is inserted. Its value is the Secret Tag just created.
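The hidden field itself is plain HTML. A small sketch of how it might be rendered; the field name `secret_tag` and the helper `hidden_tag_field` are hypothetical.

```python
import html


def hidden_tag_field(tag: str, field_name: str = "secret_tag") -> str:
    """Render a hidden <input> carrying the Secret Tag.

    Both values are escaped so they cannot break out of the attribute.
    """
    return '<input type="hidden" name="{}" value="{}">'.format(
        html.escape(field_name, quote=True), html.escape(tag, quote=True)
    )


print(hidden_tag_field("abcdefgh"))
# <input type="hidden" name="secret_tag" value="abcdefgh">
```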
The form is sent to the client and is returned after some time, including the Secret Tag.
When analysing the returned form data, the returned Secret Tag is compared to the Secret Tags in the table. If there is a match, the form data is treated as coming from a legitimate user. The tag is deleted from the table, making it usable only once, and the form data is processed as originally intended.
If the tag returned with the form data is not in the database, or if there is no tag at all, the form data is ignored. Either an error message is shown or the script silently exits.
Secret Tags are time-bound: they are valid only for a certain period, after which they are deleted from the table, rendering all later requests that use these tags useless.
Using Secret Tags gives users no headache; they don’t even notice them, because web browsers return hidden input fields silently and without bugging the user. A bot can easily access ST-protected forms, read the tag and use it in its request. Granted, this is easier than downloading a picture and running OCR over it. But basically it’s just the same: download the form and read the challenge.
Absolute protection is nearly impossible, and the more security you want, the less freedom you can provide. Secret Tags are perhaps less secure, but they give legitimate users more freedom while blocking most spam bots, as captchas do.
Regardless of which system you use, there are some tweaks that make it even harder for the bots.
- Alter field names. Maybe even alter them every once in a while, maybe even automatically.
- Alter script names. Maybe even alter them every once in a while.
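One way to automate the renaming, sketched under my own assumptions: derive each public field name from a server-side key and the current date, so the names change daily without anyone editing templates. `SERVER_KEY` and `field_name_for` are hypothetical, and HMAC is my choice, not something the post prescribes.

```python
import hashlib
import hmac
from datetime import date

SERVER_KEY = b"change-me"  # hypothetical secret, kept on the server only


def field_name_for(base: str, day: date, key: bytes = SERVER_KEY) -> str:
    """Map a logical field name (e.g. 'comment') to that day's public name."""
    msg = f"{base}:{day.isoformat()}".encode("utf-8")
    digest = hmac.new(key, msg, hashlib.sha256).hexdigest()
    return "f" + digest[:12]  # leading letter keeps the name a valid identifier


today_name = field_name_for("comment", date.today())
print(today_name)
```

When reading a submission, the server would look up the value under today's name, and probably under yesterday's as well, so that forms loaded shortly before midnight still work.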
Quite simple, ain’t it? As mentioned earlier, spam bots go for quantity, not quality. If they can hit some thousand standard blogs/forums, the few they do not hit are not important. So even by using less secure, less intrusive Secret Tags you lock out 99% of spam.
problem with any solution like this is: if it turns out to be working now, many people will adopt it, and spammers/bot-writers will simply update their auto-submit routines (e.g. look at the form, and make sure you find any hidden parameters, making sure to pass them on in your reply). it’s an arms race that can’t be won, imho. good idea though.
Patrick, you are right that it is a constant fight. But in my opinion it is a fight of webmasters vs. spammers, not webmasters vs. users, so I feel that I don’t have to bother my legitimate users with counter-spam tactics.
The problem with hidden fields is that I have to send them along, and in my view they are very easy to evaluate no matter what they are called. To my taste, this tactic promises too short a lifespan.
Graphical captchas are a good solution, especially when they are also read aloud so as not to shut out visually impaired users. Depending on the quality of the captchas, they offer good or poor protection.
The advantage of a randomly generated text question is that I can send it along in the open at any time. The range of variation of this solution is unlimited, and the machine would need the creativity of a human to understand language and content.
I see a further advantage as well: I can work with wit and make my participants laugh, or I can turn it into a selection tool that presupposes knowledge.
That makes the variation even more interesting to me. Even though selection above all, deciding who may comment by means of specialist questions or IQ, is frowned upon on the net, on one site or another it is absolutely in the provider’s interest.
Sorry for my rudeness in answering in German. The problem with hidden fields is that whatever you call them, the machine can easily find them and send any hidden field back with whatever content. This makes the solution rather short-term.
Captchas are, as far as I’m concerned, rather good solutions. It is a matter of how the captcha is created. And if it is also read aloud, people with impaired vision can take part.
The advantage of my solution is that everything can be sent to the machine in the open, and although the machine can read and analyse everything, it won’t be able to cope. The variety is as manifold as there are people, and it draws on human creativity.
The other point is that you can use the module to tell jokes and to select by knowledge and IQ, which might be in the interest of the webmaster.
selection through knowledge and IQ is just a form of discrimination, in my opinion. questions that seem fairly obvious to you and me may not be obvious at all to somebody living in another country, whose mother tongue is not english (or whatever the site’s language is in), and with slight learning difficulties.
no, i agree that we need a solution that is transparent to the user, and doesn’t bother them just because there are unscrupulous spammers out there…
I suppose we agree that reading, understanding and commenting on a text takes a certain degree of understanding. It is not a matter of selection. It is a matter of conversation. And the fact that a text is public doesn’t mean it is written so that every accidental reader can cope with the content. If a test corresponds to the content and to the skills, knowledge and intelligence it requires, it is most likely fun for everybody taking part in the conversation.
You might say that forcing intelligent people to talk about kittens and knitting patterns, while they’d rather solve chaos theory, is just as much discrimination as forcing less able people to solve a test that is too hard for them.
Selecting readers and commentators by tests doesn’t automatically mean discrimination. It just means selection, as every text selects by wording, subject and context. As we are talking about a text medium, it is a fair and just instrument for putting up fences against spambots.
We have a totally different viewpoint here, Silke.
Of course you are right that to select readers, captchas do work.
But this is not what Patrick or I want to achieve. We do not want to select readers, we want to lock out spam.
There is a certain irony in the fact that this very entry has two pseudo-spam entries. Or were the entries moderated and any links removed?
;-)
Shit happens, and usually in a big way ;)
Your Secret Tag approach is very interesting, by the way. On the idea of changing the field names there is an interesting entry at: http://elliottback.com/wp/archives/2004/11/29/spam-stopgap-extreme/
The trick here is that the field name is the MD5 hash of a value that the server sends along and that has to be computed client-side via JavaScript. On the server, the existence of this field is then checked.
In my eyes this is certainly more accessible than a captcha and, what is almost more important, more user-friendly.
Besides, apart from the requirement that JavaScript be enabled, it is a bit better than your approach. In other words: this requirement is at the same time the strength of the approach.
Best regards
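The server-side half of that idea might look like the sketch below. It follows only the description in this comment, not the code of the linked plugin; `expected_field_name`, `form_passes` and the dict-shaped form are assumptions for illustration.

```python
import hashlib


def expected_field_name(challenge: str) -> str:
    """MD5 hex digest of the value the server sent along with the form."""
    return hashlib.md5(challenge.encode("utf-8")).hexdigest()


def form_passes(challenge: str, form: dict[str, str]) -> bool:
    """Accept the submission only if a field with the hashed name exists."""
    return expected_field_name(challenge) in form


# The client-side JavaScript is expected to add a field with this name.
submitted = {"comment": "hi", expected_field_name("hello"): "1"}
print(form_passes("hello", submitted))          # True
print(form_passes("hello", {"comment": "hi"}))  # False: field missing
```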
MD5 hashes can also be computed without JavaScript, at least by spam scripts. In principle a nice additional idea, though ;)
I thought the trick was that a) the spam script does not know that it is dealing with an MD5 hash and b) spam scripts do not execute JavaScript at all.
look at this