Monday, November 19, 2012

[ggehnexz] CAPTCHA letters to Santa

To build a CAPTCHA in the style of reCAPTCHA, we need a large corpus of text that is difficult for OCR but easy (or at least easier) for humans.  OCR handles printed text quite well, so handwritten text would be a good next challenge.

Children's letters to Santa might be a good, very large corpus of difficult-to-OCR, mostly grammatical, semantically meaningful handwritten text.  Instead of just a single word, give the human an entire sentence or more to transcribe, so that the meaning of the surrounding text helps in deciphering hard-to-read words.
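As a rough sketch of how sentence-level checking might work: reCAPTCHA pairs a text whose transcription is already known with one that is not, uses the known one to decide whether the user is human, and collects answers for the unknown one until enough users agree. The Python below adapts that to whole sentences. The function names, the 0.9 similarity threshold, and the three-vote rule are all assumptions made for illustration; nothing here is taken from an actual reCAPTCHA implementation. Fuzzy matching is used because an exact match over a whole handwritten sentence is probably too strict.

```python
import difflib
import re
from collections import Counter
from typing import List, Optional


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return " ".join(text.split())


def similarity(a: str, b: str) -> float:
    """Score in [0, 1] for how closely two transcriptions match."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def passes_control(known_text: str, answer: str, threshold: float = 0.9) -> bool:
    """Accept the user if their answer for the already-solved sentence
    is close enough to the known transcription (threshold is a guess)."""
    return similarity(known_text, answer) >= threshold


def consensus(votes: List[str], min_votes: int = 3) -> Optional[str]:
    """Return the normalized transcription that at least min_votes users
    agree on for the unsolved sentence, or None if there is no agreement yet."""
    if not votes:
        return None
    counts = Counter(normalize(v) for v in votes)
    text, n = counts.most_common(1)[0]
    return text if n >= min_votes else None
```

Only answers from users who pass the control sentence would be added to the votes for the unsolved sentence.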

Where do letters addressed to "North Pole" go, and have they been kept?  There are more letters every year, so an adversary could never exhaustively solve the entire collection.

Any collection of handwritten essays, perhaps from a school, might work.
