Too Busy For Words - The PaulWay Weblog
23 03 2006

Thu, 23 Mar 2006

Shuffling a list - the power of factorials.
I needed to shuffle a list of DNA sequences for a work program. Having been suitably chastised by Damian Conway's excellent entry in maddog's ticket picking mini-competition, I decided that in future I'd do more searching on CPAN before re-inventing the wheel for the sixteenth time.

I feel so stupid. I downloaded Algorithm::Numeric::Shuffle, written by abigail, as I think it was the first entry that turned up in a search on CPAN. I should have looked by category under List::Util, being a utility function on lists. But anyway. abigail pointed out in the POD that the Perl rand() routine is a linear congruential generator - in other words, from a given random seed the numbers will always follow a predictable pattern. The algorithm works by going through the entire list and swapping the Nth member with another random member. Thus, the actual final ordering is dependent only on the initial RNG seed, which is a 32-bit number.

(The List::Util::shuffle routine suffers from this similarly, despite using a somewhat more bizarre method of giving a unique random ordering to the list.)

Now, 13! is 6227020800, and 232 is 4294967296, so there are more combinations for a 13-element list than a 32-bit number. Which means, if you think about it, that there are some combinations that you will never get from the shuffle routine. It's just simple information theory, like compressing data: there's no random seed that will cause that ordering of elements to come out.

Here's where once again I forgot to learn the basic rule of my programming existence: What You Want Has Probably Already Been Invented. I wrote an email to abigail suggesting that, if the user of the shuffle routine wanted a truly random shuffle, they could pass in a sub reference to their own rand() function, or one from e.g. MCrypt or OpenSSL::Rand, that does at least have more bits of random seed (or, for preference, draws on a 'true' source of randomness.) I even started writing up a revision of abigail's code to do this.

Then I asked on the #perl channel on irc.freenode.net whether the inbuilt rand() function could be passed as a sub reference (it can't apparently). Someone asked why I wanted to do such a thing. I told them the story. I spent some time convincing them that the statistics said it was possible, and that someone might actually care about being statistically correct. Then someone else said "why don't you just override the rand() function for that module?".

Why not indeed! Perl doesn't make package namespaces sacred, so it's perfectly possible to put a sub rand(!) definition in my code after a package List::Util declaration which calls the rand() function I want. Of course, I've yet to work out how to do that, and the random picks of the sequences are good enough for testing so far, so it's a bit of a moot point. But I hate it when I should have known better all along.

I'm somewhat glad that my email to abigail bounced...

posted at: 13:17 | path: /tech/perl | permanent link to this entry

At last, I've worked out why <pre> formatting in Perl CGI is wrong!
The problem: When I dump a file out in a <pre> block, it's come out with a space in front of every line but the first.

The realisation: I used a $cgi->comment() block to throw in some debugging comments in a CGI script, and lo and behold every line but the first had a space in front of it. For some reason, today it reminded me of the $cgi->p() formatting, which will put a space between each argument if it's given a list. I'm guessing this 'join with a space' behaviour is going all the way through the rest of the CGI module routines.

The solution: Write my own <pre> formatter routine, which just prints '<pre>', the text, and '</pre>'. Ugh. There's probably a better way somewhere in CGI, but at least I know what's going wrong.

posted at: 12:56 | path: /tech/perl/web | permanent link to this entry

Or: Why I need my head read.
I did a quick bit of hedge-trimming today. This involved me walking up and down to see the line, snipping things off, and leaping up to snip off the branches that were out of convenient standing range. Why, you ask, was I leaping up? Because I was too lazy to get the ladder. It was only a five-minute operation, after all...

It's only a Cotoneaster. Some day we'd like to replace it with a native. I think the Grevillea we've got in the front would be ideal - it's very prickly and apparently most people are allergic to it. If we had a problem with people wandering into our yard, which we don't that'd teach them a severe lesson.

There was a small family, or clique or whatever, of Superb Blue Wrens in the garden, including one I didn't see but Kate picked up as a male just going from non-breeding to breeding status. This means he goes from brown with flashes of blue to brilliant blue all over. Nature is wonderful. We're very lucky to have a garden that attracts them.



posted at: 12:44 | path: /personal | permanent link to this entry

Research is your friend
I have a rule in Thunderbird that tags everything that has an X-Mailer header as 'The Bat!' as spam. It's never been wrong yet. And yet a bit of research found that The Bat! is apparently a legitimate mailer. Just as Outlook gets imposted by spam bulk mailers, apparently they also impersonate The Bat!. Strangely enough, I haven't seen them use the default setting in the Advanced Mass Sender, which is 'Advanced Mass Sender'. I wonder why...

I find it ironic that this probably makes The Bat! look much more popular and Outlook less so...



posted at: 12:40 | path: /tech | permanent link to this entry


All posts licensed under the CC-BY-NC license. Author Paul Wayper.