Sunday, June 29, 2014

Are you human?

I guess every one of us has by now been horrified by captchas. If you don’t know what a captcha is, or have ever wondered why you were asked to type numbers or letters from a super difficult image, it was because the website wanted to verify that you are a human and not a bot (a program meant to create millions of accounts automatically, thereby crippling the website).

I have noticed that captchas are becoming increasingly complex. That is not surprising, because computers have become excellent at character recognition. What that means is that now only highly complex images with highly distorted characters can defeat computers and their algorithms.

But the reality is that such images cannot easily be recognized by humans either. Case in point is the image below, which I got recently because my email account got blocked. I couldn’t get the captcha right across several images and several tries, so the website concluded that I am a bot and blocked my IP address.

Captcha

The dawn of the supercomputers is coming. I can’t even prove myself to be human. It’s just a matter of time before computers also pass the Turing test.

4 comments:

  1. BTW: reCAPTCHA is not a free service Google provides to us; it is a free OCR service WE provide to Google...
    From Wiki: Scanned text is subjected to analysis by two different optical character recognition programs. Their respective outputs are then aligned with each other by standard string-matching algorithms and compared both to each other and to an English dictionary. Any word that is deciphered differently by both OCR programs or that is not in the English dictionary is marked as "suspicious" and converted into a CAPTCHA. The suspicious word is displayed, out of context, along with a control word already known. The system assumes that if the human types the control word correctly, then the response to the questionable word is accepted as probably valid. If enough users were to correctly type the control word, but incorrectly type the 2nd word which OCR had failed to recognize, then the digital version of documents could end up containing the incorrect word. The identification performed by each OCR program is given a value of 0.5 points, and each interpretation by a human is given a full point. Once a given identification hits 2.5 points, the word is considered valid. Those words that are consistently given a single identity by human judges are later recycled as control words.

    Replies
    1. Spot on. reCAPTCHA (though it looks like a charitable thing to do) is indeed us working for free for the overlords out there :) !!

  2. Ahem... Just last month... http://www.theguardian.com/commentisfree/belief/2014/jun/13/computer-turing-test-humanity

    Replies
    1. Awesome. Didn't know about this. I also haven't seen Bladerunner yet. Should add it to my "Movies to watch" list ;)

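For the curious, the point scheme quoted in the first comment (0.5 points per OCR reading, 1 point per human reading, valid at 2.5 points) can be sketched in a few lines of Python. This is only a rough illustration of that description, not reCAPTCHA's actual code: the function name accepted_reading and the constants are made up for this example, and it assumes the human readings passed in come only from people who already typed the control word correctly.

```python
from collections import defaultdict

# Weights and threshold taken from the quoted description; the names are hypothetical.
OCR_WEIGHT = 0.5
HUMAN_WEIGHT = 1.0
ACCEPT_THRESHOLD = 2.5


def accepted_reading(ocr_guesses, human_guesses):
    """Return the reading that reaches the threshold, or None if none does.

    Assumes human_guesses holds only answers from users who got the
    control word right (the filtering step is not shown here).
    """
    scores = defaultdict(float)
    for guess in ocr_guesses:
        scores[guess] += OCR_WEIGHT
    for guess in human_guesses:
        scores[guess] += HUMAN_WEIGHT

    # Pick the highest-scoring reading, if there is any reading at all.
    best = max(scores, key=scores.get, default=None)
    if best is not None and scores[best] >= ACCEPT_THRESHOLD:
        return best
    return None


# Two OCR programs disagree, two humans agree with one of them.
result = accepted_reading(["morning", "modem"], ["morning", "morning"])
print(result)
```

With one OCR vote and two human votes for the same reading, the total is exactly 2.5 and the word is accepted; with only one human vote it stays at 1.5 and remains unresolved.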