Too Busy For Words - The PaulWay Weblog

Wed, 12 Jul 2006

Too much time, too little gain?
My 'home' home page - http://tangram.dnsalias.net/~paulway/ - has, for a while now, had the appearance of an old greenscreen monitor playing a text adventure. Since it's more or less just a way for me to gather up a few bits and pieces that I can't be bothered putting up on my regular page - http://www.mabula.net - I'm not really worried about creating a work of art.

But the temptation to carry things too far has always been strong within me. So, of course, the flashing cursor at the end of the page wasn't good enough on its own: I had to have an appropriate command come up when you hovered over a link. After a fair amount of JavaScript abuse, and reading of tutorials, I finally got it working; I even got it so that the initial text (which has to be there for the JavaScript to work) doesn't get displayed when the document loads.

Score one for pointless JavaScript!

posted at: 23:17 | path: /tech/web | permanent link to this entry

Shorter isn't always better!
I'm reading lines in a format I invented that uses a simple run-length compression on zeros: a line can be something like '1 4 2 8 5=7 3 6' and that means 'the values 1, 4, 2, 8, five zeros, 7, 3 and 6'. My code was:

foreach $count (split(/ /, $_)) {
    if ($count =~ /^(\d+)=(\d+)$/) {
        # 'N=V': a run of N zeros followed by the value V
        push @counts, (0) x $1, $2;
        $csubnum += $1 + 1;
    } elsif ($count =~ /^\d+$/) {
        # a plain value
        push @counts, $count;
        $csubnum++;
    } elsif ($count =~ /^\s*$/) {
        # empty field (e.g. from a doubled space): skip it
        next;
    } else {
        warn "bad format in WAIF $vers line: $_ (part $count)\n";
    }
}

"No, wait!" I thought. "I can do all that in a mapping:"

@counts = map (
    {$_ =~ /(\d+)=(\d+)/ ? ((0) x $1, $2) : $_}
    (split / /, $_)
);
$csubnum += scalar @counts;

Testing, though, proved another thing. Reading a file with a reasonable number of zeros per line (lines like '114 0 3 6=3 27=3 3=1 10=3 79=1 8=1 0 1 4=3 16=1 0 1 43=7 15=12 36=16 27=2' are typical) took 21 seconds with the foreach code and 28 seconds with the map code. So that's an extra 33% time per file read. I can only assume this is because Perl is now juggling a bunch of different arrays in-place rather than just appending via push. Still, it's an interesting observation - the regexp gets tested on every value in both versions, so it's definitely not that...
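
For anyone who wants to repeat the comparison without a full data file, here is a minimal sketch using the core Benchmark module. It is not the original test: the sub names expand_foreach and expand_map are made up for illustration, the regexes are anchored, and it times a single sample line in memory rather than reading a file, so the ratio it reports may well differ from the 21 versus 28 seconds above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# The sample line quoted in the post.
my $line = '114 0 3 6=3 27=3 3=1 10=3 79=1 8=1 0 1 4=3 16=1 0 1 43=7 15=12 36=16 27=2';

# Variant 1 (hypothetical name): foreach with push, appending to one array.
sub expand_foreach {
    my ($text) = @_;
    my @counts;
    foreach my $field (split / /, $text) {
        if ($field =~ /^(\d+)=(\d+)$/) {
            push @counts, (0) x $1, $2;    # N zeros, then the value
        } elsif ($field =~ /^\d+$/) {
            push @counts, $field;          # plain value
        }
    }
    return \@counts;
}

# Variant 2 (hypothetical name): map, returning a short list per field.
sub expand_map {
    my ($text) = @_;
    my @counts = map { $_ =~ /^(\d+)=(\d+)$/ ? ((0) x $1, $2) : $_ }
                 split / /, $text;
    return \@counts;
}

# Both variants should agree before their speed is compared.
die "variants disagree\n"
    unless "@{ expand_foreach($line) }" eq "@{ expand_map($line) }";

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    'foreach' => sub { expand_foreach($line) },
    'map'     => sub { expand_map($line) },
});
```

cmpthese prints a rate for each variant and the percentage difference between them. Results will vary with the Perl version and with how many zeros each line carries, so treat any single run as a hint rather than a verdict.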

posted at: 17:10 | path: /tech/perl | permanent link to this entry

New term, rudeness optional
I was explaining on #mythtv-users about the origins of RTFM, and had to explain FGI as well. I realised that we need another term too, FLOW, for when someone is asking for a detailed, moderately comprehensive exposition on a particular topic.

Put it into circulation, people! :-)

posted at: 14:09 | path: /tech | permanent link to this entry

A new text-based captcha scheme
I had an idea for a text-based captcha that could work by cut-and-paste in the browser but would take a sophisticated CSS parser to decode automatically:

Define nine IDs in CSS, three of which are set to display and the other six set not to display. The choice can be random on your part, but it remains fixed in the CSS (i.e. the CSS can be static). Then pick nine numbers or words and put each one in a span with a different ID. The script generating the captcha knows which three of the nine words will be displayed, so it saves those against a random number which you generate and put in a hidden field. Nothing relates the three words to the token except the data on the server, and while the user sees only the three set to display, the source HTML includes all nine. You could even mix up the order of the non-displaying fields, so long as the displaying fields always turn out in the same order.
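
A minimal sketch of what the generating script might look like, under a few assumptions of mine: the IDs c1 to c9 are invented, with c3, c7 and c1 (in that order) being the ones the static stylesheet displays; %pending is an in-memory stand-in for whatever the server really uses to remember tokens; and the token comes from plain rand, which a real site would want to replace with something harder to guess. The sub names make_captcha and check_captcha are likewise hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical IDs. The static CSS is assumed to display only the three
# in @visible_ids and to set display: none on the other six.
my @visible_ids = qw(c3 c7 c1);
my @hidden_ids  = qw(c2 c4 c5 c6 c8 c9);

# Stand-in for server-side storage: token => expected answer.
my %pending;

sub make_captcha {
    my @words = @_;
    die "need nine words\n" unless @words == 9;
    @words = shuffle @words;

    # Pick three of the nine positions for the visible spans. Sorting the
    # positions keeps the visible IDs in their fixed relative order.
    my @order = shuffle 0 .. 8;
    my @slots = sort { $a <=> $b } @order[0 .. 2];
    my %visible_at;
    @visible_at{@slots} = @visible_ids;

    # The hidden IDs fill the remaining positions in random order.
    my @hidden = shuffle @hidden_ids;

    my (@spans, @answer);
    for my $pos (0 .. 8) {
        my $id = $visible_at{$pos};
        if (defined $id) {
            push @answer, $words[$pos];    # the user will see this word
        } else {
            $id = shift @hidden;           # the user will not see this one
        }
        push @spans, qq{<span id="$id">$words[$pos]</span>};
    }

    my $token = int rand 1e9;              # NOT cryptographically strong
    $pending{$token} = join ' ', @answer;
    return ($token, join(' ', @spans));
}

sub check_captcha {
    my ($token, $typed) = @_;
    my $expected = delete $pending{$token};    # each token works once
    return defined $expected && defined $typed && $typed eq $expected;
}
```

The page would then carry the returned spans plus the token in a hidden field, and the form handler would pass the token and the typed words to check_captcha. Deleting the token on the first check means a wrong guess cannot be retried against the same set of words.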

I realise that it wouldn't take too much code to read the CSS and read the page and work out which fields were going to be displayed. But the whole point to these things is to act like a flashing light on a burglar alarm - to deter all but the (most) determined and resourceful. And I like the idea of not having to generate images - too many of those graphic captchas that I've seen wouldn't be too hard to decode, I reckon.

Another point to using text is that you can include words which identify the site. Fraudsters commonly use a variation of the Mongolian Horde Technique to get past the captchas on Yahoo and other web mail services: set up a simple porn site and require people to register by filling in a captcha - but the captcha they fill in is actually the captcha that the fraudster's script has grabbed off the web mail system. Porn-seeker fills in captcha, result is posted back to Yahoo, everyone's 'happy'. I don't know why the common users of these captchas don't include a watermark that includes the site it came from. In fact, that could be a very good captcha - take the company's name in two randomly chosen shades, overlay a translucent word in another random shade, and get people to pick the word that isn't the company's name.

Quick, off to the patent office!

posted at: 12:53 | path: /tech | permanent link to this entry


All posts licensed under the CC-BY-NC license. Author Paul Wayper.