Too Busy For Words - The PaulWay Blog

Fri, 11 Apr 2014

Sitting at the feet of the Miller

Today I woke nearly an hour earlier than I'm used to, and got on a plane at a frankly undignified hour, to travel for over three hours to visit a good friend of mine, Peter Miller, in Gosford.

Peter may already be known to many of my readers, so rather than be otiose I'll describe him merely as a programmer of great experience who's worked in the Open Source community for decades. For the last couple of years he's been battling leukaemia, a fight which has taken its toll - not only on him physically but also on his work and his coding output. It's a telling point for all good coders to consider that he wrote tests on his good days, so that when he was feeling barely up to it but still wanted to do some coding he could write something that could be verified as correct.

I arrived while he was getting a blood transfusion at a local hospital, and we spent a pleasurable hour talking about good coding practices, why people don't care about how things work any more, how fascinating things that work are (ever seen inside a triple lay-shaft synchromesh gearbox?), how to deal with frustration and bad times, how inventions often build on one another and the analogies to the open source movement, and many other topics. Once he was done, we went back to his place where I cooked him some toasted sandwiches and we talked about fiction, the elements of a good mystery, what we do to plan for the future, how to fix the health care system (even though ours is nowhere near as broken as, say, the USA's), dealing with road accidents and fear, why you can never have too much bacon, what makes a good Linux conference, and many other things.

Finally, we got around to talking about code. I wanted to ask him about a project I've talked about before - a new library for working with files that allows the application to insert, overwrite, and delete any amount of data anywhere in a file without having to read the entire file into memory, massage it, and write it back out again. Happily for me this turned out to be something that Peter had also given thought to, apropos of talking with Andrew Cowie about text editors (which was one of my many applications for such a system). He'd independently worked out that such a system would also allow a fairly neat and comprehensive undo and versioning system, which was something I had thought would be possible - and although we differed on the implementation details, I felt like I was on the right track.
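
To give a flavour of what we were imagining, here's a purely hypothetical sketch of such an API - none of these functions exist anywhere, and the names are invented just for illustration:

    /* Hypothetical API sketch - no such library exists (yet); the names are
     * made up purely to illustrate the insert/overwrite/delete idea above. */
    #include <stddef.h>
    #include <sys/types.h>

    typedef struct sfile sfile;   /* opaque handle to a 'segmented' file */

    sfile  *sfile_open(const char *path);
    int     sfile_close(sfile *f);

    /* Overwrite, insert or delete 'len' bytes at 'offset' without rewriting
     * the rest of the file; the library reshuffles its block map instead. */
    ssize_t sfile_overwrite(sfile *f, off_t offset, const void *buf, size_t len);
    ssize_t sfile_insert(sfile *f, off_t offset, const void *buf, size_t len);
    int     sfile_delete(sfile *f, off_t offset, size_t len);

    /* Each mutation could also log an undo record, which is where the neat
     * undo and versioning behaviour comes from almost for free. */
    int     sfile_undo(sfile *f);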

We discussed how such a system would minimise on-disk reads and writes, how it could offer transparent, randomly seekable, per-block compression, how it would recover from partial file corruption, and what kind of API it should offer. Then Peter's son arrived and we talked a bit about his recently completed psychology degree, why psychologists are treated at parties the same way scientists and programmers are (i.e. as a form of social death), and how useful it is to consider human beings as individuals when trying to help them. Then it was time for my train back to Sydney and on to Canberra and home.

Computing is famous, or denigrated, as an industry full of introverts who would rather hack on code than interact with humans. Yet many of us are extroverts who don't really enjoy the mould we're forced into. We want to talk with other people - especially about code! For an extrovert like myself, having the chance to spend time with someone knowledgeable, funny, human, and sympathetic is like seeing the sun again after long days of rain. I'm fired up to continue work on something that I had thought was only an idle, personal fantasy unwanted by others.

I can only hope it means as much to Peter as it does to me.

posted at: 17:50 | path: /tech | permanent link to this entry

Wed, 15 Jan 2014

Ignorable compression

On the way home from LCA, and on a whim, I started adding support for LZO compression to Cfile while stopped over in Perth.

This turned out to have unexpected complications: while liblzo supports the wide variety of compression methods grouped together as "LZO", it does not actually create '.lzo' files. That's because '.lzo' files also have a special header, added checksums, and a file contents list a bit like a tar file. All of this is handled within the 'lzop' program - there is no external library for reading or writing .lzo files in the way that zlib handles .gz files.
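
For the record, what the library does give you is raw block compression. Here's a minimal sketch (assuming liblzo2; link with -llzo2) - note that the output is a bare compressed block with no header, checksums or contents list, which is exactly why it isn't a '.lzo' file:

    #include <lzo/lzo1x.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        if (lzo_init() != LZO_E_OK) {
            fprintf(stderr, "lzo_init failed\n");
            return 1;
        }

        const char *text = "hello hello hello - highly compressible text";
        lzo_uint in_len = strlen(text) + 1;

        /* Worst-case output size for LZO1X, per the liblzo documentation. */
        lzo_uint out_len = in_len + in_len / 16 + 64 + 3;
        unsigned char *out = malloc(out_len);
        unsigned char *wrkmem = malloc(LZO1X_1_MEM_COMPRESS);
        if (!out || !wrkmem)
            return 1;

        if (lzo1x_1_compress((const unsigned char *)text, in_len,
                             out, &out_len, wrkmem) != LZO_E_OK) {
            fprintf(stderr, "compression failed\n");
            return 1;
        }
        printf("%lu bytes compressed to %lu\n",
               (unsigned long)in_len, (unsigned long)out_len);

        free(wrkmem);
        free(out);
        return 0;
    }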

Now, I see three options here:

Yeah, I'm going for option one there.

LZO is a special case: it does a reasonable job of compression - not quite as good as standard gzip - but its memory requirements for compression can be minuscule and its decompression speed is very fast. It might work well for compression inside the file system, and it is commonly used in consoles and embedded computers for reading compressed data. But for most common situations, even on mobile phones, I imagine gzip is still reasonably quick and produces smaller compressed output.

Now to put all the LZO work in a separate git branch and leave it as a warning to others.

posted at: 22:04 | path: /tech/c | permanent link to this entry

Fri, 01 Nov 2013

Converting cordless drill batteries

We have an old and faithful Ryobi 12V cordless drill which is still going strong. Unfortunately, the two batteries it came with have been basically killed over time by the fairly basic charger it came with. I bought a new battery some time ago at Battery World, but they no longer stock them and they cost $70 or so anyway. And even with a small box from Jaycar connected to the charger to make sure it doesn't cook the battery too much, I still don't want to buy another Nickel Metal Hydride battery when all the modern drills are using Lithium Ion batteries.

Well, as luck would have it, I recently bought several LiIon batteries at a good price, and thought I might as well give the working drill a nice, working battery pack too. I'd also bought a good Lithium Ion balancer/charger, so I can make sure the new battery lasts a lot longer than the old ones did. So I made the new battery fit in the old pack:

First, I opened up the battery pack by undoing the screws in the base of the pack:

There were ten cells inside - NiMH and NiCd are 1.2V per cell, so that makes 12V. The pack contacts were attached to the top cell, which was sitting on its own plinth above the others. The cells were all connected by spot-welded tabs. I really don't care about the cells so I cut the tabs, but I kept the pack contacts as undamaged as possible. The white wires connect to a small temperature sensor, which is presumably used by the battery charger to work out when the battery is charged; the drill doesn't have a central contact there. You could remove it, since we're not going to use it, but there's no need to.

Since the new battery is going to sit 'forward' out of the case, I cut a hole for my replacement battery by marking the outline of the new pack against the side of the old case. I then used a small fretsaw to cut out the sides of the square, cutting through one of the old screw channels in the process.

I use "Tamiya" connectors, which are designed for relatively high DC current and provide good separation between both pins on both connectors. Jaycar sells them as 2-pin miniature Molex connectors; I support buying local. I started with the Tamiya charge cable for my battery charger and plugged the other connector shell into it. Then I could align the positive (red) and negative (black) cables and check the polarity against the charger. I then crimped and soldered the wires for the battery into the connector, so I had the battery connected to the charger. (My battery came with a Deanes connector, and the charger didn't have a Deanes connector cable, which is why I was putting a new connector on.)

Aside: if you have to change a battery's connector over, cut only one wire first. Once that is safely sealed in its connector you can then do the other. Having two bare wires on a 14V 3Ah battery capable of 25C discharge (i.e. 75A) is a recipe for welding something, killing the battery, or both. Be absolutely careful around these things - they have no off switch and accidents are expensive.

Then I repeated the same process for the pack contacts, starting by attaching a red wire to the positive contact, since the negative contact already had a black wire attached. The aim here is to make sure that the drill gets the right polarity from the battery, which itself has the right polarity and gender for the charger cable. I then cut two small slots in the top of the pack case to let the connector sit outside the case, with the retaining catch at the top. My first attempt put the catch underneath, and it was very difficult to unplug the battery for recharging once it was plugged in.

The battery then plugs into the pack case, and the wires are just the right length to hold the battery in place.

Then the pack plugs into the drill as normal.

The one thing that had me worried with this conversion was the difference in voltages. Lithium ion cells can range from 3.2V to 4.2V and normally sit around 3.7V. The drill is designed for 12V; with four Lithium Ion cells, the new battery sits around 14.8V nominally and reaches 16.8V when fully charged. Would it damage the drill?

I tested it by connecting the battery to a separate set of thin wires, which I could then touch to the connector on the pack. I touched them to the pack, and no smoke escaped. I gingerly started the drill - it has a variable trigger for speed control - and it ran slowly with no smoke or other obvious signs of electrical distress. I plugged the battery in properly and ran the drill - again, no problem. Finally, I put my largest bit in the drill, put a piece of hardwood in the vice, and went for it - the new battery handled it with ease. A cautious approach, perhaps, but it's always better to be safe than sorry.

So the result is that I now have a slightly ugly but much more powerful battery pack for the drill. It's also 3Ah versus the 2Ah of the original pack, so I get more life out of each charge. And I can swap the batteries over quite easily, and my charger can charge up to four batteries simultaneously, so I have something that will last a long time now.

I'm also writing this article for the ACT Woodcraft Guild, and I know that many of them will not want to buy a sophisticated remote control battery charger. Fortunately, there are many cheap four-cell all-in-one chargers at HobbyKing, such as their own 4S balance charger, or an iMAX 35W balance charger for under $10 that do the job well without lots of complicated options. These also run off the same 12V wall wart that runs the old pack charger.

Bringing new life to old devices is quite satisfying.

posted at: 08:41 | path: /tech | permanent link to this entry

Tue, 13 Aug 2013

New file system operations

Many many years ago I thought of the idea of having file operations that effectively allowed you to insert and delete, as well as overwrite, sections of a file. So if you needed to insert a paragraph in a document, you would simply seek to the byte in the file just before where you wanted to insert, and tell the file system to insert the required number of bytes there. The operating system would then be responsible for handling that, and it could seamlessly reorganise the file to suit. Deleting a paragraph would be handled by similar means.

Now, I know this is tricky. Once you go smaller than the minimum allocation unit, you have to do some fairly fancy handling in the file system, and that's not going to be easy unless your file system discards block allocation and works with byte offsets. The pathological case of inserting one byte at the start of a file is almost certainly going to mean rewriting the entire file on any block-based file system. And I'm sure the idea offends some people, who would say that the operations we have on files at the moment are just fine and do everything one might efficiently need to do, and that this kind of chopping and changing is up to the application programmer to implement.

That, to me, has always seemed something of a cop-out. But I can see that having file operations that only work on some file systems is a limiting factor - adding specific file system support is usually done after the application works as is, rather than before. So there it sat.

Then a while ago, when I started writing this article, I found myself thinking of another set of operations that could work with the current crop of file systems. I was thinking specifically of the process that rsync has to do when it's updating a target file - it has to copy the existing file into a new, temporary file, add the bits from the source that are different, then remove the old file and substitute the new. In many cases we're simply appending new stuff to the end of the old file. It would be much quicker if rsync could simply copy the appended stuff into a new file, then tell the file system to truncate the old file at a specific byte offset (which would have to be rounded to an allocation unit size) and concatenate the two files in place.

This would be relatively easy for existing file systems to do - once the truncate is done the inodes or extents of the new file are simply copied into the table of the old file, and then the appended file is removed from the directory. It would be relatively quick. It would not take up much more space than the final file would. And there are several obvious uses - rsync, updating some types of archives - where you want to keep the existing file until you really know that it's going to be replaced.

And then I thought: what other operations are there that could use this kind of technique? Splitting a file into component parts? Removing or inserting a block - i.e. the block-wise alternative to my byte-offset operations above? All of those would be relatively easy - rewriting the inode or extent map isn't, as I understand it, too difficult. Even limited to operations that are easy to implement in the file system, there are considerably more operations possible than those we currently have to work with.
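
For what it's worth, Linux's fallocate(2) now has flags that do more or less the block-wise version of this - FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_INSERT_RANGE, supported on ext4 and XFS, with the offset and length constrained to multiples of the file system block size. A minimal sketch of how they look in use:

    /* Insert or remove a block-aligned range in place, without rewriting
     * the rest of the file.  Requires a file system that supports these
     * fallocate(2) flags (ext4 or XFS). */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <linux/falloc.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc != 4) {
            fprintf(stderr, "usage: %s <file> <offset> <len>\n", argv[0]);
            return 1;
        }
        int fd = open(argv[1], O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        off_t offset = atoll(argv[2]);
        off_t len    = atoll(argv[3]);

        /* Shift everything from 'offset' onwards towards the end of the
         * file, leaving 'len' bytes of newly allocated space to write into. */
        if (fallocate(fd, FALLOC_FL_INSERT_RANGE, offset, len) < 0)
            perror("insert range");

        /* The inverse - cut 'len' bytes out at 'offset' and close the gap:
         * fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, offset, len); */

        close(fd);
        return 0;
    }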

I have no idea how to start this. I suspect it's a kind of 'chicken and egg' problem - no-one implements new operations in file systems because there are no clients needing them, and no clients use these operations because the file systems don't provide them. Worse, I suspect that there are probably several systems out there that do weird and wonderful tricks of their own - like allocating a large chunk of file as a contiguous extent of disk and then running their own block allocator on top of it.

Yes, it's not POSIX compliant. But it could easily be a new standard - something better.

posted at: 19:53 | path: /tech/ideas | permanent link to this entry

Thu, 01 Aug 2013

Preventing patent obscurity

One of the problems I see with the patent system is that patents are often written in obscure language, using unusual and non-standard jargon, so as to both apply as broadly as possible and not show up as "obvious" inventions.

So imagine I'm going to use a particular technology, or I'm going to patent a new invention. As part of my due diligence, I have to file a certified document that shows what search terms I used to search for existing patents, and why any patents I found were not applicable to my use. Then, when a patent troll comes along and says "you're using our patent", my defence is: "Sorry, but your patent did not appear relevant in our searches (documentation attached)."

If my searches are considered reasonable by the court, then I've proved I did my due diligence and that the patent troll's patent is unreasonably hard to find. OTOH, if my searches were unreasonable, I've shown that I deliberately looked for the wrong thing in the hope of getting away with patent infringement, so damages would increase. If I have no filing of what searches I did, then I've walked into the field ignorant, and the question turns on whether I can be shown to have infringed the patent or whether it's not applicable - but I can also be judged as not taking the patent system seriously.

The patent applicant should be the one responsible for writing the patent in the clearest, most useful language possible. If not, why not write it in Chinese? Arpy-Darpy? Gangster Jive? Why not make up terms: "we define a 'fnibjaw' to be a sequence of bits at least eight bits long and in multiples of eight bits"? Why not define operations in big-endian notation where the actual use is in little-endian notation, so that your constants are expressed differently and your mathematical operations look nothing like the ones actually performed, but your patent is still relevant? The language of patents is already obscure enough, and even if you did want to use a patent it is already hard enough with some patents to translate their language into the standard terms of art. Patent trolls rely on their patents being deliberately obscure so that lawyers and judges have to interpret them, rather than technical experts.

The other thing this does is promote actual patent searches and potential use. If, as patent proponents say, the patent system is there to promote the use and licensing of patents before a product is implemented, then they should welcome something that encourages users to search and potentially license existing patents. The current system encourages people to actively ignore it, because unknowing infringement is treated as much less of an offence than knowing infringement - and therefore any evidence of actually searching the patent database is seen as proof of knowing infringement. Designing a system so that people don't use it doesn't say a lot for the system...

This could be phased in - make it apply to all new patents, and give a grace period in which searches are encouraged but not required to be filed. Also allow any existing patent that is used in a patent suit to be challenged by the defendant as "too obscure" or "not using the terms of art", requiring the patent owner to rewrite it to the satisfaction of the court. That way a gradual clean-up of the current mess of incomprehensible patents that have been deliberately obfuscated can occur.

If the people who say patents are a necessary and useful thing are really serious in their intent, then they should welcome any effort to make more people actually use the patent system rather than try to avoid it.

Personally, I'm against patents. Every justification of patents appeals to the myth of the "home inventor", but home inventors are clearly not the beneficiaries of the current system as it stands. The truth is that, far from it being necessary to encourage people to invent, you can't stop people inventing! They'll do it regardless of whether they're sitting on billion-dollar ideas or just a better left-handed cheese grater. They're inventing and improving and thinking of new ideas all the time. And there are plenty of examples of patents not stopping infringement, and plenty of examples of companies with lots of money simply steamrollering the "home inventor" regardless of the validity of their patents. Most of the "poster children" for the "home inventor" myth are now running patent troll companies. Nothing in the patent system is necessary for people to invent, and its stated objectives do not match the current reality.

I love watching companies like Microsoft and Apple get hit with patent lawsuits, especially by patent trolls, because they have to sit there with a stupid grin on their faces and still admit that the system screwing billions of dollars in damages out of them is the one they support, because of their belief that patents actually have value.

So introducing some actual utility into the patent system should be a good thing, yeah?

posted at: 12:19 | path: /tech/ideas | permanent link to this entry

Tue, 14 May 2013

Modern kernels and uncooperative monitors

Our main TV screen is a Kogan 32" TV hooked up to a Mini-ITX machine running a MythTV frontend on Fedora 18. Because Kogan buys the cheapest panels, which are the ones with the worst firmware, it has several annoyingly braindead features that make it hard to use with a computer - the most important being that it doesn't return an EDID block at all.

Not having an EDID didn't use to be a problem when X did most of the heavy lifting of setting up the display, because you could, at a pinch, tell it to trust you about which modes the monitor could support. With a program like cvt you could generate a modeline, stick it in your /etc/X11/xorg.conf, and X would output the right frequencies. This is what I had to do for Fedora 16.

The new paradigm is that the kernel sets the monitor resolution and X is basically a client application using it. This solves a lot of problems for most people, but unfortunately the kernel doesn't really handle the situation where the monitor doesn't respond with a valid EDID. More unfortunately, this actually happens in numerous situations - dodgy monitors and dodgy KVM switches being two obvious ones.

It turns out, however, that there is a workaround. You can tell the kernel that you have a (made-up) EDID block to load, which it will pretend came from the monitor. To do this you have to generate an EDID block - handily explained in the kernel documentation - which requires grabbing the kernel source code and running make in the Documentation/EDID directory. Then put the required file, say 1920x1080.bin, in a new directory /lib/firmware/edid, add the parameter "drm_kms_helper.edid_firmware=edid/1920x1080.bin" to your kernel boot line in GRUB, and away you go.

Well, nearly. Because the monitor literally does not respond, rather than responding with something useless, the kernel doesn't turn that display on (after all, not responding is exactly what the HDMI and DVI ports are doing, because nothing is plugged into them). So you also have to tell the kernel that you really do have a monitor there, by adding the parameter "video=VGA-1:e" to the kernel boot line as well.

Once you've done that, you're good to go. Thank you to the people at OSADL for documenting this. Domestic harmony at PaulWay Central is now restored.

posted at: 21:11 | path: /tech | permanent link to this entry

Sat, 23 Mar 2013

Recording video at LCA

A couple of people have asked me about the process of recording the talks at Linux Conference Australia, and it's worth publishing something about it so more people get a better idea of what goes on.

The basic process of recording each talk involves capturing a video camera, a number of microphones, the video (and possibly audio) of the speaker's laptop, and possibly other video and audio sources. For the keynotes we recorded three different cameras plus the speaker's laptop video. In 2013, in the Manning Clark theatres, we were able to tie into the ANU's own video projection system, which mixed together the audio from the speaker's lapel microphone, the wireless microphone and the lectern microphone, and the video from the speaker's laptop and the document camera. Llewellyn Hall provided a mixed feed of the audio in the room.

Immediately the problems are: how do you digitise all these things, how do you get them together into one recording system, and how do you produce a final recording of all of these things together? The answer to this at present is DVswitch, a program which takes one or more audio and video feeds and acts as a live mixing console. The sources can be local to the machine or available on other machines on the network, and the DVswitch program itself acts as a source that can then be saved to disk or mixed elsewhere. DVswitch also allows some effects such as picture-in-picture and fades between sources. The aim is for the room editor to start the recording before the start of the talk and cut each recording after the talk finishes so that each file ends up containing an entire talk. It's always better to record too much and cut it out later rather than stop recording just before the applause or questions. The file path gives the room and time and date of recording.

The current system then feeds these final per-room recordings into a system called Veyepar. It uses the programme of the conference to match the time, date and room of each recording with the talk being given in the room at that time. A fairly simple editing system then allows multiple people to 'mark up' the video - choosing which recorded files form part of the talk, and optionally setting the start and/or end times of each segment (so that the video starts at the speaker's introduction, not at the minute of setup beforehand).

When ready, the talk is marked for encoding in Veyepar and a script then runs the necessary programs to assemble the talk title, the credits and the files that form the entire video into one single entity and produce the desired output files. These are stored on the main server, uploaded via rsync to mirror.linux.org.au, and are then mirrored or downloaded from there. Veyepar can also email the speakers, tweet the completion of video files, and do other things to announce their existence to the world.

There are a couple of hurdles in this process. Firstly, DVswitch only deals with raw DV files recorded via FireWire. These consume roughly 13 gigabytes per hour of video, per room - the whole of LCA's raw recorded video for the week comes to about 2.2 terabytes. These are recorded to the hard drive of the master machine in each room; from there they have to be rsync'ed to the main video server before any actual mark-up and processing in Veyepar can begin. It also means that a preview must be generated for each raw file before it can be watched normally in Veyepar, a further slow-down to the process of speedily delivering raw video. We tried using a file sink on the main video server that talked to the master laptop's DVswitch program and saved its recordings directly onto the disk in real time, but despite this process having worked perfectly when we tested it in November 2012, during the conference it tended to produce a new file every second or three even when the master laptop was recording single, hour-long files.

Most people these days are wary of "yak shaving" - starting a series of dependent side-tasks that become increasingly irrelevant to solving the main problem. We're also wary of spending a lot of time doing something by hand that can or should be automated. In any large endeavour it is important to strike a balance between these two behaviours - one must work out when to stop work and improve the system as a whole, and when to keep using the system as is because improving it would take too long or risk breaking things irrevocably. I fear in running the AV system at LCA I have tended toward the latter too much - partly because of the desire within the team (and myself) to make sure we got video from the conference at all, and partly because I sometimes prefer a known irritation to the unknown.

The other major hurdle is that Veyepar is not inherently set up for distributed processing. In order to have a second machine processing video, one must duplicate the entire Veyepar environment (which is written in Django) and point both at the same database on the main server. Due to a variety of complications this was not possible without stopping Veyepar, and possibly having to rebuild its database from scratch, and the team and I lacked the experience with Veyepar to know how to set it up easily in this configuration. I didn't want to start setting up Veyepar on other machines and find myself shaving a yak, looking for a piece of glass to mount a sheet of 1000-grit wet-and-dry sandpaper on so I could sharpen the razor correctly.

Instead, I wrote a separate system that produced batch files in a 'todo' directory. A script running on each 'slave' encoding machine periodically checked this directory for new scripts; when it found one it would move it to a 'wip' directory, run it, and move it and its dependent files into a 'done' directory when finished. If the processes in the script failed, it would be moved into a 'failed' directory, from where it could be resumed manually without having to be regenerated. A separate script (already supplied with Veyepar and modified by me) periodically checked Veyepar for talks that were set to "encode", wrote their encode scripts and set them to "review". Thus, as each talk was marked up and saved as ready to encode, it was automatically fed into the pipeline. If a slave saw multiple scripts it would try to execute them all, but it would check that each script file still existed before trying to execute it, in case another encoding machine had got to it first.

That system took me about a week of gradual improvements to refine. It also took giving a talk at the CLUG programming SIG on parallelising work (and the tricks thereof) for me to realise that instead of each machine trying to allocate work to itself in parallel, it was much more efficient to make each slave script do one thing at a time and then run multiple slave scripts on each encoder to get the parallelism, thus avoiding the explicit communication of a single work queue per machine. It relies on NFS handling the timing of a file move correctly, so that one slave script cannot execute a script another has already moved into work-in-progress, but at this granularity of work the window for overlap is very small.
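
The heart of that 'claim by moving the file' trick is that a rename is atomic, so when several encoders race for the same job only one of them can win. Here's a hypothetical rendering of the slave loop in C - the directory names match the ones above, everything else is invented for illustration (the real thing was a shell script):

    /* Sketch of the claim-by-rename idea behind the slave scripts.  Because
     * rename() is atomic, only one encoder's move into 'wip' can succeed. */
    #include <dirent.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        DIR *dir = opendir("todo");
        if (!dir) { perror("todo"); return 1; }

        struct dirent *entry;
        while ((entry = readdir(dir)) != NULL) {
            if (strstr(entry->d_name, ".sh") == NULL)
                continue;

            char todo[512], wip[512], done[512], failed[512], cmd[600];
            snprintf(todo, sizeof(todo), "todo/%s", entry->d_name);
            snprintf(wip, sizeof(wip), "wip/%s", entry->d_name);
            snprintf(done, sizeof(done), "done/%s", entry->d_name);
            snprintf(failed, sizeof(failed), "failed/%s", entry->d_name);

            /* Claim the job: if another encoder got here first, this fails
             * and we just move on to the next candidate. */
            if (rename(todo, wip) != 0)
                continue;

            snprintf(cmd, sizeof(cmd), "sh %s", wip);
            int status = system(cmd);

            /* Failed scripts are kept so they can be re-run by hand. */
            rename(wip, status == 0 ? done : failed);
        }
        closedir(dir);
        return 0;
    }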

I admit that, really, I was unprepared for just how much could go wrong with the gear during the conference. I had actually prepared: I had used the same system to record a number of CLUG talks in the months leading up to the conference; I'd used the system by myself at home; I'd set it up with others in the team and tested it over a weekend; and I've used similar recording equipment for many years. What I wasn't prepared for was that things I'd previously tested and found to work perfectly would break in unexpected ways:

The other main problem that galls me is that there are inconsistencies in the recordings that I could have fixed if I'd been aware of them at the time. Some rooms are very loud, others quite soft. Some rooms cut the recording at the start of the applause, so I had to join the next segment of recording on and cut it early to include the applause that the speaker deserved. There were a few recordings that we missed entirely, for reasons I don't know; I was busy trying to sort out all the problems with the main server. And I was immensely proud of and thankful for the team of Matt Franklin, Tomas Miljenovic, Leon Wright, Euan De Koch, Luke John and Jason Nicholls, who got there early, left late, worked tirelessly, and leapt - literally - up to fix a problem when it was reported. Even with a time machine some of those problems would never be fixed - I consider it both rude and amateurish to interrupt a speaker to tell them that we need them to start again because of some glitch in the recording process.

But the main lesson to me is that you can only practice setting it up, using it, packing it up and trying again with something different in order to find out all the problems and know how to avoid them. The 2014 team were there in the AV room and they'll know all of what we faced, but they may still find their own unique problems that arise as a result of their location and technology.

There's a lot of interest and effort being put into improving what we have. Tim Ansell has started producing gstswitch, a GStreamer-based program similar to DVswitch which can cope with modern, high-definition, compressed media. There's a lot of interest in the LCA 2014 team and in other people in producing a better video system, one better suited to distributed processing, distributed storage and cloud computing. I'm hoping to be involved in this process, but my time is already split between many different priorities and I don't have the raw knowledge of the technologies to easily lead or contribute greatly to such a process. All I can do is contribute my knowledge of how this particular LCA worked, and what I would improve.

posted at: 09:23 | path: /tech/lca | permanent link to this entry

Tue, 05 Mar 2013

Code on the beach!

In 2011 I ran an event called CodeCave, which saw nine intrepid coders and three intrepid family members go to Yarrangobilly Caves to spend a cool, wet winter weekend coding, eating, exploring caves, coding, playing Werewolf, taking photos, coding, swimming (!), talking, flying planes and helicopters, and coding. Being an extrovert, I love those opportunities to see friends doing cool things with code, and my impression is that everyone enjoyed the weekend.

I had a hiatus in 2012 for various reasons, but this year I've decided to run another similar event. But, as lovely as Yarrangobilly is and as comfortable as the Caves House was to stay in, it's a fair old five hour drive for people in Sydney, and even Canberrans have to spend the best part of two hours driving to get there. And Peter Miller, who runs the fabulous CodeCon (on which CodeCave was styled) every year, is going to be a lot better off near his health care and preferred hospital. Where to have such an event, then?

One idea that I'd toyed with was the Pittwater YHA: close to Sydney (where many of the attendees of CodeCave and CodeCon come from), still within a reasonable driving distance from Canberra (from where much of the remainder of the attendees hail), and close to Peter's base in Gosford. But there's no road up to it, you literally have to catch the ferry and walk 15 minutes to get there - while this suits the internet-free aesthetic of previous events, for Peter it's probably less practical. I discussed it on Google+ a couple of weeks ago without a firm, obvious answer (Peter is, obviously, reserving his say until he knows what his health will be like, which will probably be somewhere about two to three weeks out I imagine :-) ).

And then Tridge calls me up and says "as it happens, my family has a house up on the Pittwater". To me it sounds brilliant - a house all to ourselves, with several bedrooms, a good kitchen, and best of all on the roads and transport side of the bay; close to local shops, close to public transport, and still within a reasonable drive via ambulance to Gosford Hospital (or, who knows, a helicopter). Tridge was enthusiastic, I was overjoyed, and after a week or so to reify some of my calendar that far out, I picked from Friday 26th July to Sunday 28th July 2013.

So it's now called CodeBeach 2013, and it also has a snazzy Google Form to take bookings on. Please drop me an email if you've got any questions. We'd love to have you there!

posted at: 21:13 | path: /tech | permanent link to this entry

Sat, 18 Aug 2012

The Library That Should Be

In my current job, I have to look at PHP. Often, I have to run command-line programs written in PHP. All of these programs take a typically PHP approach to command-line processing - in other words, it's often a hack, done without any great consistency, and you have to do a lot of the hard work yourself. There are at least three command-line processing libraries in PHP, but I longed for Perl's wonderful Getopt::Long module because it improves on them in several important ways:

The main thing we want to eliminate by using modules is 'boilerplate', and the current offerings for command-line processing in PHP still require lots of extra code to process their results. So, because the current offerings were insufficient, I decided to write my own. The result is:

Console_GetoptLong

Along the way I added a couple of things. For a start, Console_GetoptLong recognises --option=value arguments, as well as -ovalue where 'o' is a single-letter option and the whole string doesn't already match a synonym. It also allows combining single-letter options, like tar -tvfz instead of tar -t -v -f -z (provided you've specified that it should do that - it's off by default). It gives you several ways of handling something starting with a dash that isn't a defined synonym - warn, die, ignore, or add it to the unprocessed arguments list.

One recent feature which hopefully will also reduce the amount of boilerplate code is what I call 'ordered unflagged' options. These are parameters that aren't signified by an option flag but by their position in the argument list. We use commands like this every day - mv and cp are examples. By specifying that '_1' is a synonym for an option, Console_GetoptLong will automatically pick the first remaining argument off the processed list and, if that parameter isn't already set, make that argument the value of the parameter. So you can have a command that takes both '-i input_file' and 'input_file' style arguments, in the one parameter definition.

Another way of hopefully reducing the amount of boilerplate is that it can automatically generate your help listing for you. The details are superfluous to this post, but the other convenience here is that your help text and your synonyms for the parameter are all kept in one place, which makes sure that if you add a new option it's fairly obvious how to add help text to it.

As always, I welcome any feedback on this. Patches are even better, of course, but suggestions, bug reports, or critiques are also gladly accepted.

posted at: 18:25 | path: /tech | permanent link to this entry

Sat, 23 Jun 2012

Forgotten projects

MythTV has recently updated to version 0.25. That has meant a small but important change to the parameters necessary for updating guide data. Chris Yeoh was ahead of the game and, knowing I used it, sent me a patch for the tv_grab_oztivo script. He noted that he'd tried to get the script from its last known good source, and the site wasn't answering.

Well, it sort of is. The normal URL doesn't work, but Google reveals http://web.aanet.com.au/auric/files2/tv_grab_oztivo. Interestingly, the script there is still at the recognised version - 1.36 - but other parts of the site seem to be having problems with their database. And since it hasn't been updated since this time in 2010, I think there's a good chance it will remain unchanged from now on.

A number of years ago I offered to host the script on my home Subversion repository, but got no response. So I've blown the dust off, updated it, added Chris's patch, and it's now up to date at http://tangram.dnsalias.net/repos/tv_grab_oztivo/trunk/tv_grab_oztivo. Please feel free to check that out and send me patches if there are other improvements to make to it.

posted at: 20:03 | path: /tech | permanent link to this entry

Thu, 07 Jun 2012

Swapping Shackles

Charles Stross talks here about why book publishers are afraid of Amazon, and how the publishers have given Amazon control over them by insisting on DRM. The problem I see with this analysis is that the publishers actually have another option: publish their own 'free' app that can read their own DRM, and cut Amazon out of the equation by selling direct to the readers. There may be contractual reasons why the Big Six can't set up a web store to compete directly with Amazon, but I'm sure that's a matter their lawyers could sort out. There might be a legal reason - I don't study this field and Charlie does, so he might correct me there - but I don't see anything about it in his comments, even though a few people suggest it.

The cited reason the Big Six don't sell their own books directly seems to be simply that they haven't set up their websites yet. Bad news for Amazon: that's easy with the budgets the big publishers have - Baen already sell their own ebooks, for example (without DRM, too). More bad news for Amazon: generating more sales by referrals (the "other readers also bought" stuff) isn't a matter of customers or catalogue, it's just a matter of data. Start selling books and you have that kind of referral. Each publisher has reams of back catalogue begging to be digitised and sold. They've got the catalogue, they've got the direct access to the readers, they've got the money to set up the web sites, and they've now got the motivation to avoid Amazon and sell direct to the reader. That, to me, spells disaster for Amazon.

But it also means disaster for us. Because you're going to have multiple different publishers' proprietary e-book readers - the only ones they'll bless with their own DRM. Each one will have its own little annoyances, peccadilloes and bugs. Some won't let you search. Some won't let you bookmark. Some will make navigation difficult. Some won't remember where you were up to in one book if you open another. Others might lock up your reader, have back doors into your system, use ugly fonts, be slow, have no 'night' mode, or invasively scan your device for other free books and move them into their own locked-down storage. And you won't be able to change, because none of your books will work in any reader other than the publisher's own. After all, why would they give another app writer access to their DRM if it means the reader might then go to a different publisher and buy books elsewhere?

We already have this situation. I have to use the Angus & Robertson reader (created by Kobo) for reading some of my eBooks. It doesn't allow me to bookmark places in the text, its library view has one mode (and it's icons, not titles), I can't search for text, and its page view is per chapter (e.g. '24 of 229') not through the entire book. In those ways and more it's inferior to the free FBReader that I read the rest of my books in - mostly from Project Gutenberg - but I have no choice; the only way to get the books from the store is through the app. These are books I paid money for and I'm restricted by what the software company that works for the publishing broker contracted by the retailer wants to implement. This is not a good thing.

What can we, the general public, do about this? Nothing, basically. Write to your government and they'll nod politely, file your name in the "wants to hear more about the arts" mailing list, and not be able to do a thing. Write to a publisher and they'll nod vacantly, file your name in the wastepaper bin, and get back to thinking how they can make more profit. Write to your favourite author and they'll nod politely, wring their hands, say something about how it's out of their control what their editor's manager's manager's manager decides, and be unable to do anything about it. Everyone else is out of the picture.

Occasionally someone suggests that authors could just deal directly with the readers. At this point everyone else sneers - even fanfic writers look down on self-publishers. And, sadly, they're right - because (as Charlie points out) we do actually need editors, copy-editors and proofers to turn the mass of words an author emits into a really compelling story. (I personally can't imagine Charlie writing bad prose or forgetting a character's name, but I can imagine an editor saying "hey, if you replaced that minor character with this other less minor character in this reference, it'd make the story more interesting", and it's these things that are what we often really enjoy about a story.) I've written fiction, and I've had what I thought was elegantly clear writing shown to be the confusing mess of conflicting ideas and rubbish imagery that it was. Editors are needed in this equation, and by extension publishers, imprints, marketers, cover designers, etc.

Likewise, instead of running your own site, why not get a couple of authors together and share the costs of running one? Then you get something like Smashwords or any of the other indie book publishers - and with that you get common design standards, the requirement not to have a title that conflicts with another book on the same site, and so on. So either way you're going to end up with publishers. And small publishers tend to get bought up by larger publishers, and so forth; capitalism tends to produce this kind of structure in organisations.

So as far as I can see, it's going to get worse, and then it's going to get even worse than that. I don't think Amazon will win - if nothing else, because they're already looking suspiciously like a monopolist to the US Government (it's just that the publishers and Apple were stupid enough to look like they were being greedier than Amazon). But either way, the people that will control your reading experience have no interest in sharing with anyone else, no interest in giving you free access to the book you've paid to read (and no reason if they can give you a license, call it a book, charge what a book costs, and then screw you later on), and everyone else has no control over what they're going to do with an ebook in the future. If the publisher wants to revoke it, rewrite it, charge you again for it, stop you re-reading it, disallow you reading previous pages, only read it in the publisher's colours of lime green on pink, or whatever, we have absolutely no way of stopping this. The vast majority of people are already happy to shackle themselves to Amazon, to lock themselves into Apple, and tell themselves they're doing just fine.

Sorry to be cynical about this, but I think this is going to be one of those situations where the disruptive technologies come too little and too late. Even J. K. Rowling putting her books online DRM-free isn't going to change things - most of the commentators I've read just point to this and say "oh well, the rest of us aren't that powerful, we'll just have to co-operate with (Amazon|the publisher I'm already dealing with)". Even the ray of hope that Cory Doctorow offers with his piece on Digital Lysenkoism - that the Humble E-Book Bundle has authors wanting to get their publishers off DRM because there's a new smash hit to be had with the Humble Bundle phenomenon - is a drop of nectar in an ocean of tears; no publisher is really going to care about the Humble Bundle's success if it means facing down the bogey-man of unfettered public copying of ebooks that they themselves have been warning everyone about for the last twenty years.

So publishers are definitely worrying about Amazon's monopsony. But the idea that this will cause them to give up DRM is wishful thinking. They've got too much commitment to preventing people from copying their books, they don't have to give up DRM in order to cut Amazon out of the deal, and if DRM then locks readers into a reliance on the publishers, it's a three-way win for them. And a total loss for us - but then capitalism has never been about giving the customer what they want.

posted at: 22:41 | path: /tech | permanent link to this entry

Wed, 01 Feb 2012

Going from zero

A friend of mine and I were discussing cars the other day. He said that he thought the invention of the electric motor was a curse on cars, because it meant you wouldn't have a gearbox to control which gear you were in. A suitable electric motor has enough power to drive the car from zero to a comfortable top speed (110km/hr) at a reasonable acceleration using a fixed gear ratio - the car stays in (in this case third) gear and you drive it around like that. He maintained, however, that you needed to know which gear you were in, and to change gears, because otherwise you could find yourself using a gear that you hadn't chosen.

I argued that, in fact, having to select a gear meant that drivers both new and experienced would occasionally miss a gear change and put the gearbox into neutral by mistake, causing grinding of gears and possible crashes as the car was now out of control. He claimed to have heard of a clever device that would sit over your gearbox and tell you when you weren't in gear, but you couldn't use the car like that all the time because it made the car too slow. So you tested the car with this gearbox-watcher, and then, once you knew that the car itself wouldn't normally miss a gear, you just had to blame the driver if the car blew up, crashed, or had other problems. But he was absolutely consistent in his attitude towards electric motors: you lost any chance to find out that you weren't in the right gear, and therefore the whole invention could be written off as basically misguided.

Now, clever readers will have worked out by this point that my conversation was not real, and was in fact by way of an analogy (the strain on the examples gives it away, for one thing). The friend was real - Rusty Russell - but instead of electric motors we were discussing the Go programming language, and instead of gearboxes we were discussing the state of variables.

In Go, all variables are defined as containing zero unless initialised otherwise. In C, a variable can be declared but undefined - the language standard, AFAIK, does not specify the state of a variable that is declared but not initialised. From the C perspective, there are several reasons you might not want to automatically pre-initialise a variable when you define it - it's about to be set from some other structure, for example, and pre-initialising it is a waste of time. And being able to detect when a variable has been used without knowing what its state is - using valgrind, for example - means you can detect subtle programming errors that have hard-to-find consequences when the variable's meaning or initialisation is changed later on. If you can't know whether the programmer is using zero because that's what they really wanted or because it just happened to be the default and they didn't think about it, then how do you know which usage is correct?
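
To make that concrete, here's a small, artificial example of the kind of bug being argued over - this compiles happily in C, and valgrind reports a use of an uninitialised value, whereas the equivalent Go variable would simply start at zero:

    /* 'total' is declared but never initialised, so the function returns
     * whatever happened to be on the stack.  valgrind flags the first use;
     * in Go the variable would be defined to start at zero. */
    #include <stdio.h>

    static int sum_positive(const int *values, int count)
    {
        int total;                    /* bug: not initialised to 0 */
        for (int i = 0; i < count; i++) {
            if (values[i] > 0)
                total += values[i];   /* first += reads garbage */
        }
        return total;
    }

    int main(void)
    {
        int data[] = { 3, -1, 4 };
        printf("%d\n", sum_positive(data, 3));  /* may print anything */
        return 0;
    }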

From the Go perspective, in my opinion, these arguments are a kludgy way of seeing a bug as a feature. Optimising compilers can easily detect when a variable will be set twice without any intervening examination of state, and simply remove the first initialisation - so the 'waste of time' argument is a non-issue. Likewise, any self-respecting static analysis tool can determine if a variable is tested before it's explicitly defined, and I can think of a couple of heuristics for determining when this usage isn't intended.

And one of the most common errors in C is use of undefined variables; this happens to new and experienced programmers alike, and those subtle programming problems happen far more often in real-world code as it evolves over time - it is still rare for people to run valgrind over their code every time before they commit it to the project. It's far more useful to eliminate this entire category of bugs once and for all. As far as I can see, you lose nothing and you gain a lot more security.

To me, the arguments against a default value are a kind of lesser Stockholm Syndrome. C programmers learn from long experience to do things the 'right way', including making sure you initialise your variables explicitly before you use them, because of all the bugs - from brutally obvious to deviously subtle - that are caused by doing things any other way. Tools like valgrind are a workaround, indirectly fixing the problem after the fact. People even come to love them - like the people who love being deafened by the sound of growling, blaring petrol engines and associate the feeling of power with that cacophony. They mock those new silent electric motors because they don't have the same warts and the same pain-inducing behaviour as the old petrol engine.

I'm sure C has many good things to recommend it. But I don't think the lack of default initialisation is one of them.

posted at: 11:00 | path: /tech | permanent link to this entry

Critical Thinking

In the inevitable rant-fest that followed the LWN story on the proposal to have /lib and /bin point to /usr/lib and /usr/bin respectively (short story), I observe with wry amusement the vocal people who say "Look at PulseAudio - it's awful, I have to fight against it all the time, that's why we shouldn't do this". The strange, sad thing about these people is that they happily ignore all those people (like me) for whom PulseAudio just works. There's some little conceited part of their brain that says "I must be the only person that's right and everyone else has got it wrong." It's childish, really.

And in my experience, those people often make unrealistic demands on new software, or misuse it - consciously or unconsciously, and with or without learning about it. They are semi-consciously determined to prove that the new thing is wrong, and everything they do then becomes in some way critical of it. Any success is dismissed as "because I knew what to do"; every failure is pounced on as proof that "the thing doesn't work". I've seen this with new hardware, new software, new cars, new clothes, new houses, new accommodation, etc. You can see it in the fact that there's almost no correlation between the people who complain about wind generator noise and the actual noise levels measured at their properties. Human beings all have a natural inclination to believe that they are right and everyone else is wrong, and some of us fight past that to be rational and fair.

This is why I didn't get Rusty's post on the topic. It's either completely and brilliantly ironic, or (frankly) misguided. His good reasons are all factual; his 'bad' reasons are all ad-hominem attacks on a person. I'd understand if it were, say, Microsoft he was criticising - "I don't trust Microsoft submitting a driver to the kernel; OT1H it's OK code, OTOH it's Microsoft and I don't trust their motives" - because Microsoft has proven so often that its larger motives are anti-competitive even when its individual engineers and programmers mean well. But dmesg, PulseAudio, and systemd have all been (IMO) well-thought-out solutions to clearly defined problems. systemd, for example, succeeds because it uses methods that are simple, already in use, and solve the problem naturally. PulseAudio does not pretend to solve the same problems as JACK. I agree that Lennart can be irritating at times, but I read an article once by someone clever that pointed out that you don't have to like the person in order to use their code...

posted at: 10:16 | path: /tech | permanent link to this entry

Mon, 19 Dec 2011

PHP Getopt::Long

In my current work I occasionally have to work with PHP code. I don't really like PHP, for a variety of reasons it would be otiose to list here. But one of the things that surprised me was that it didn't have an equivalent of Perl's 'Getopt::Long' module. There are a couple of other modules in PHP's PEAR package repository which attempt to handle more than PHP's built-in getopt function, but all of them lack a couple of fundamental features:

  1. I want to be able to pass a single description - e.g. 'verbose|v' - and have the function recognise both as synonyms for the same setting.
  2. I want to be able to pass a variable reference and have that updated directly if the associated command line parameter is supplied.
  3. I want to have it remove all the processed arguments off the command line so that all that is left is the array of things that weren't parameters or their arguments.
  4. I want a single, simple call, rather than calling object methods for each separate parameter.
(To be clear: some of the PEAR modules provide some of these. But all of them lack goal 2, most lack goal 3, and while some are able to achieve goal 1, it's only with lots of extra code or option specification.)

So I wrote one.

The result is available from my nascent PHP Subversion library at:

http://tangram.dnsalias.net/repos/PWphp/getopt_long/.

It's released under version 3 of the GPL. It also comes with a simple test framework (written, naturally, in a clearly superior language: Perl).

This is still a work in progress, and there are a number of features I want to add to it - chief amongst them packaging it for use in PEAR. I'm not a PHP hacker, and it still astonishes me that PHP programmers have been content to use the mish-mash of different half-concocted options for command line processing when something clearly better exists in other languages - and that many of the PHP programs I have to work with don't use any of those options at all, but instead write their own minimal, failure-prone and ugly command line processing from scratch.

I'd love to hear from people with patches, suggestions or comments. If you want write access to the repository, let me know as well.

posted at: 13:19 | path: /tech | permanent link to this entry


All posts licensed under the CC-BY-NC license. Author Paul Wayper.

You can also read this blog as a syndicated RSS feed.