Shuffling a list - the power of factorials.
I needed to shuffle a list of DNA sequences for a work program. Having
been suitably chastised by Damian Conway's excellent entry in maddog's
ticket picking mini-competition, I decided that in future I'd do more
searching on CPAN before re-inventing the wheel for the sixteenth time.
I feel so stupid. I downloaded Algorithm::Numeric::Shuffle, written by
abigail, as I think it was the first entry that turned up in a search on
CPAN. I should have looked by category under List::Util, being a utility
function on lists. But anyway. abigail pointed out in the POD that the
Perl rand() routine is a linear congruential generator - in
other words, from a given random seed the numbers will always follow a
predictable pattern. The algorithm works by going through the entire
list and swapping the Nth member with another random member. Thus, the
actual final ordering is dependent only on the initial RNG seed, which is
a 32-bit number.
(The List::Util::shuffle routine suffers from this similarly, despite using a somewhat more bizarre method of giving a unique random ordering to the list.)
Now, 13! is 6227020800, and 232 is 4294967296, so there are more combinations for a 13-element list than a 32-bit number. Which means, if you think about it, that there are some combinations that you will never get from the shuffle routine. It's just simple information theory, like compressing data: there's no random seed that will cause that ordering of elements to come out.
Here's where once again I forgot to learn the basic rule of my programming
existence: What You Want Has Probably Already Been Invented. I
wrote an email to abigail suggesting that, if the user of the shuffle
routine wanted a truly random shuffle, they could pass in a sub reference
to their own rand() function, or one from e.g.
MCrypt or OpenSSL::Rand, that does at least have
more bits of random seed (or, for preference, draws on a 'true' source of
randomness.) I even started writing up a revision of abigail's code to
do this.
Then I asked on the #perl channel on irc.freenode.net
whether the inbuilt rand() function could be passed as a
sub reference (it can't apparently). Someone asked why I wanted to do
such a thing. I told them the story. I spent some time convincing them
that the statistics said it was possible, and that someone might actually
care about being statistically correct. Then someone else said "why don't
you just override the rand() function for that module?".
Why not indeed! Perl doesn't make package namespaces sacred, so it's
perfectly possible to put a sub rand(!) definition in my
code after a package List::Util declaration which calls the
rand() function I want. Of course, I've yet to work out
how to do that, and the random picks of the sequences are good enough
for testing so far, so it's a bit of a moot point. But I hate it when
I should have known better all along.
I'm somewhat glad that my email to abigail bounced...
posted at: 13:17 | path: /tech/perl | permanent link to this entry
At last, I've worked out why <pre> formatting in Perl CGI is wrong!
The problem: When I dump a file out in a <pre> block, it's
come out with a space in front of every line but the first.
The realisation: I used a $cgi->comment() block to
throw in some debugging comments in a CGI script, and lo and behold every
line but the first had a space in front of it. For some reason, today it
reminded me of the $cgi->p() formatting, which will put a
space between each argument if it's given a list. I'm guessing this
'join with a space' behaviour is going all the way through the rest of the
CGI module routines.
The solution: Write my own <pre> formatter routine, which just
prints '<pre>', the text, and '</pre>'. Ugh. There's
probably a better way somewhere in CGI, but at least I know what's going
wrong.
posted at: 12:56 | path: /tech/perl/web | permanent link to this entry
Or: Why I need my head read.
I did a quick bit of hedge-trimming today. This involved me walking up
and down to see the line, snipping things off, and leaping up to
snip off the branches that were out of convenient standing range. Why,
you ask, was I leaping up? Because I was too lazy to get the ladder.
It was only a five-minute operation, after all...
It's only a Cotoneaster. Some day we'd like to replace it with a native. I think the Grevillea we've got in the front would be ideal - it's very prickly and apparently most people are allergic to it. If we had a problem with people wandering into our yard, which we don't that'd teach them a severe lesson.
There was a small family, or clique or whatever, of Superb Blue Wrens in the garden, including one I didn't see but Kate picked up as a male just going from non-breeding to breeding status. This means he goes from brown with flashes of blue to brilliant blue all over. Nature is wonderful. We're very lucky to have a garden that attracts them.
posted at: 12:44 | path: /personal | permanent link to this entry
Research is your friend
I have a rule in Thunderbird that tags everything that has an X-Mailer
header as 'The Bat!' as spam. It's never been wrong yet. And yet a bit
of research found that The Bat! is apparently a legitimate mailer. Just
as Outlook gets imposted by spam bulk mailers, apparently they also
impersonate The Bat!. Strangely enough, I haven't seen them use the
default setting in the Advanced Mass Sender, which is 'Advanced Mass
Sender'. I wonder why...
I find it ironic that this probably makes The Bat! look much more popular and Outlook less so...
posted at: 12:40 | path: /tech | permanent link to this entry
All posts licensed under the CC-BY-NC license. Author Paul Wayper.