Jumping into a new project
(Nearly all of this was written after the PSIG meeting on the 9th of
November; then I got too busy and didn't finish it off. So "tonight" is
two weeks ago as of this posting.)
Tonight at the Programmer's SIG we were 'supposed' to be having a sort of round-table discussion, pairing people who have ideas with people who know how to implement them, or who at least know more about how Linux is organised and can recommend languages, libraries to look for and people to speak to. If any of those people had actually turned up, this would have happened. But they didn't.
After the usual early round of "Hey have you seen this cool stuff / weird shit" as meals were served (amazingly quickly, this time), I tried to jump-start the thing by asking what people's ideas were. Maybe it's just me, but this didn't get any real discussion started. Conversation kept revolving around Pascal Klein's idea of rewriting the Linux kernel in C#, and the multifarious reasons why this would be a Bad Thing. As amusing as it is to discuss bad language choices, the things we hate about customers, and what's new on Slashdot, this wasn't really doing it for me as someone who a) has ideas and b) is a programmer.
Despite the good nature of Steve Walsh's teasing, I do worry that I talk too much about my own ideas, because we then had a long and quite spirited discussion about how to solve a problem with my backup process. It started with me mentioning an improvement I'd thought of for rsync:
At the moment, rsync will only try to synchronise changes to a file if the destination directory has a file with that name. If you've renamed the file, or copied it into a new directory, then rsync (AFAIK) won't recognise that and will copy the entire file again. However, rsync already has a mechanism to recognise which files are the same - it generates a checksum for each file it encounters and only copies a file if the checksums differ. So the idea is for the receiver to check whether it already has a file with that checksum somewhere else. There's more to it than this, but I'll develop that in another post.
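To make the receiver-side idea a bit more concrete, here's a minimal Python sketch, assuming the receiver can afford to hash every file it already holds. It is not how rsync works internally: MD5 stands in for whatever whole-file checksum rsync actually uses, and `file_digest`, `build_checksum_index` and `find_local_copy` are hypothetical names. The receiver builds a map from checksum to an existing path; when the sender announces a file whose checksum is already in the map, the receiver can copy that local file instead of asking for the whole thing over the wire.

```python
import hashlib
import os
from typing import Dict, Optional


def file_digest(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the MD5 hex digest of a file's contents, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_checksum_index(root: str) -> Dict[str, str]:
    """Map whole-file digest -> one existing path under root with that content."""
    index: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # setdefault keeps the first path seen for each digest
            index.setdefault(file_digest(path), path)
    return index


def find_local_copy(index: Dict[str, str], digest: str) -> Optional[str]:
    """Return a local file with identical content, or None if a full transfer is needed."""
    return index.get(digest)
```

A real version would also have to cope with checksum collisions (confirming a match before trusting it) and with keeping the index current between runs.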
This all supports my partner's method of backing up her PhD: every once in a while, she takes all the files so far and copies them into a directory named 'Backup date'. Separately, I rsync her entire directory up to my brother's machine in Brisbane as an off-site backup. I'm not especially worried about the time it takes or the amount of data transferred, but since rsync's principal aim is to reduce both of these, I thought it would be a useful improvement to optimise for the case where a file has been renamed on the client - why transmit the whole file again if you can just copy and delete on the server?
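The snapshot habit itself is easy to script. This is a minimal sketch assuming the snapshots live inside the working directory and are named 'Backup YYYY-MM-DD' (the ISO date format and the `make_snapshot` name are my assumptions; the original only says 'Backup date'). Earlier snapshots are skipped at the top level so they don't nest inside each other. Every file in the new snapshot duplicates one that already exists, which is exactly the case where checksum matching on the receiver would avoid re-sending data.

```python
import datetime
import os
import shutil


def make_snapshot(work_dir: str, today: datetime.date) -> str:
    """Copy the contents of work_dir into a new 'Backup <date>' subdirectory of it."""
    dest = os.path.join(work_dir, f"Backup {today.isoformat()}")

    def skip_old_snapshots(current_dir: str, names: list) -> list:
        # Only at the top level: leave out earlier 'Backup ...' directories
        # so that snapshots don't end up nested inside one another.
        if os.path.abspath(current_dir) != os.path.abspath(work_dir):
            return []
        return [n for n in names if n.startswith("Backup ")]

    # Raises FileExistsError if a snapshot for this date already exists.
    shutil.copytree(work_dir, dest, ignore=skip_old_snapshots)
    return dest
```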
I suppose what I enjoyed was the idea of co-operatively solving a problem using the tools at everyone's disposal. Several people suggested that a revision control system would be better in this scenario, because it would only store the diffs and would give instant reversion to any point in time. Others suggested an automated 'drop' directory: files put there would be moved into an appropriately labelled directory, which would then be copied to the remote server. Still others suggested that having two backups was overkill - as long as the remote server was kept up to date, I could retrieve backup copies from it should anything go wrong locally. All of these were good suggestions, and even though they didn't solve the problem the way I wanted it solved, I really appreciated the new ideas and approaches.
That led me to my next question. rsync is a largish and complicated piece of software. The philosophy of Open Source says that if you have an idea, you should modify the source rather than ask someone else to do it, and I can program in C, so the rsync source wouldn't be foreign to me. So where do I start? One suggestion was to generate a tags file and start tracing through the execution of the main routine; another was to find the messages rsync prints at the point where my change would take effect, and start reading from there. A further approach was to draw a concept map: sketch out the top-down design of rsync in order to narrow down the code I had to read. All excellent suggestions, and when I have some spare time I shall try them.
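The 'find the message and read from there' approach is basically a recursive grep (`grep -rn 'some message' .` in the source tree does the job). For a self-contained example, here's the same search as a small Python sketch; `find_message` is a hypothetical helper, and it only looks in `.c` and `.h` files. One caveat: messages assembled from printf-style format strings may not appear verbatim in the source, so it's worth searching for a distinctive fragment rather than the whole printed line.

```python
import os
from typing import Iterator, Tuple


def find_message(src_root: str, message: str,
                 exts: Tuple[str, ...] = (".c", ".h")) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for every source line containing message."""
    for dirpath, _dirs, files in os.walk(src_root):
        for name in sorted(files):
            if not name.endswith(exts):
                continue  # skip READMEs, Makefiles, etc.
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    if message in line:
                        yield path, lineno, line.rstrip("\n")
```

For example, `for path, lineno, line in find_message("rsync-src", "text rsync printed"): print(f"{path}:{lineno}: {line}")`, where `rsync-src` stands for wherever you unpacked the source.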
Then we had some real nuts-and-bolts stuff; I showed Hugh how to do Doxygen documentation, and Daniel showed me a bit about autoconf/automake and how to integrate them into my coding. He also suggested a technique of checking for the existence of a library at runtime (e.g. libmagic) in order to determine whether we should call the libmagic routines to check file type; unfortunately I can't now remember what this magical call was. I should have been writing this nine days ago.
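I still can't say which call Daniel had in mind. One standard way to do this on Linux is `dlopen(3)`/`dlsym(3)`, which Python's `ctypes` wraps, so here's a sketch of the technique in Python. The helper names are hypothetical; the libmagic functions (`magic_open`, `magic_load`, `magic_file`, `magic_close`) are the real C API, and the `MAGIC_MIME_TYPE` value is taken from `magic.h` (worth checking against your installed header). If the library, the symbol, or the magic database is missing, it quietly falls back to a guess based on the file name.

```python
import ctypes
import ctypes.util
import mimetypes
import os
from typing import Optional

MAGIC_MIME_TYPE = 0x10  # from magic.h: return a MIME type such as "text/plain"


def load_optional_library(name: str) -> Optional[ctypes.CDLL]:
    """Return a handle to lib<name> if it can be found and loaded at runtime, else None."""
    path = ctypes.util.find_library(name)  # None if the library isn't installed
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)  # dlopen() under the hood
    except OSError:
        return None


def file_type(path: str) -> str:
    """Use libmagic when it is present; otherwise fall back to an extension-based guess."""
    lib = load_optional_library("magic")
    # hasattr() is False if dlsym() can't find the symbol
    if lib is not None and hasattr(lib, "magic_open"):
        lib.magic_open.restype = ctypes.c_void_p
        lib.magic_open.argtypes = [ctypes.c_int]
        lib.magic_load.restype = ctypes.c_int
        lib.magic_load.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
        lib.magic_file.restype = ctypes.c_char_p
        lib.magic_file.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
        lib.magic_close.restype = None
        lib.magic_close.argtypes = [ctypes.c_void_p]

        cookie = lib.magic_open(MAGIC_MIME_TYPE)
        if cookie:
            try:
                # Passing NULL loads the default magic database
                if lib.magic_load(cookie, None) == 0:
                    result = lib.magic_file(cookie, os.fsencode(path))
                    if result:
                        return result.decode()
            finally:
                lib.magic_close(cookie)
    # Fallback: no usable libmagic, so guess from the file name alone
    guess, _encoding = mimetypes.guess_type(path)
    return guess or "application/octet-stream"
```

In C the equivalent shape is `dlopen("libmagic.so.1", RTLD_LAZY)` followed by `dlsym(handle, "magic_open")`, with the fallback taken when either returns NULL.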
It started out not looking so good, but I think it was one of the better Programming SIGs I've been to.
P.S. I've also learnt tonight that, if my WiFi is connecting and then
almost immediately disconnecting after showing no signal strength, unloading
and reloading the kernel module (after stopping the ipw3945d service) will
reset it; starting the ipw3945d service again will get things back on track.
Or so it would seem from this initial test.
posted at: 08:25 | path: /tech/clug | permanent link to this entry
Not learning the hard way
It's now mid-Sunday, and I'm mentally and physically wrecked. This is
partly due to a throat infection thing that's going around, and partly due
to weird LVM stuff, and partly due to sheer bloody-minded stupidity.
It started a couple of days ago, when I took a day off because of a sore throat. I decided, finally, to upgrade my MythTV machine to Fedora Core 6, and in the process remove the old boot drive and move the root file system to a new Logical Volume (LV) under LVM. Having attempted this before, I'd decided to put the root volume on a mirrored LV. Luckily I had three disks - mirroring in LVM requires a disk per mirror plus an extra one for the mirror log - so I set it up, copied the old root file system to the new mirror, and Fedora Core recognised it well enough to install onto it.
But after a couple of mysterious crashes that ended in file system checks throwing pages and pages of errors, I started to have real doubts. LVM is wonderful and stable, and lets you agglomerate disks in ways you would otherwise pay lots of money for hardware to achieve, but my experience so far is that when it goes bad, it gets rather difficult to recover. Since I wasn't sure I could ever recover the root file system if one disk went bad (all FAQs and HowTos to the contrary), I decided to go back to plain old partitions.
I bought a 400GB disk for $199 at Aus PC Market, intending to pension off the 160GB drive in the MythTV machine and give it a bit more recording headroom. But for various otiose reasons Friday knocked me out: I was very congested, and Wednesday's sore throat had returned. Unable to sleep, I put the new drive in the MythTV machine, partitioned it, copied all the files from the old root file system across, and booted it - it came up fine. In a fit of what seemed at the time to be inspiration, but I now know to be madness brought on by addiction to Lemsip, I also decided to move the data off one of the 250GB drives temporarily so I could partition it.
Long ago when I was setting up the system, I had realised that LVM PVs can be created on the raw disk device as well as in partitions. This sounded like a brilliant idea - no partition to worry about, LVM could put LVs on it anyway, and one less command to perform. Interestingly, you also get about 96MB of extra space. However, this decision has come back to haunt me.
Firstly, back when I was first trying to eliminate the old 40GB disk, I wanted to have a three-way RAID. LVM doesn't do that, but MD does. But you need three partitions the same size. I couldn't repartition /dev/hdb because, well, there wasn't a partition on there to alter. So that idea eventually went out the window.
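For what it's worth, MD will in fact accept RAID 1 members of different sizes, but the array can only be as large as its smallest member, which is why equal-sized partitions are what you want. A quick Python sketch of the arithmetic (`mirror_capacity` is just an illustrative helper): a hypothetical three-way mirror of the old 40GB disk with two 250GB drives would leave only 40GB usable and waste 420GB.

```python
from typing import Dict, Sequence


def mirror_capacity(member_sizes_gb: Sequence[float]) -> Dict[str, float]:
    """Usable size and wasted space for an n-way RAID 1 mirror of the given members."""
    if len(member_sizes_gb) < 2:
        raise ValueError("a mirror needs at least two members")
    usable = min(member_sizes_gb)  # every member must hold a full copy
    wasted = sum(size - usable for size in member_sizes_gb)
    return {"usable": usable, "wasted": wasted}
```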
Now, I thought, I could lay the problem to rest. I had used pvmove before, to move data off a SATA disk I'd bought before discovering that, at least at the time, accessing SATA drives made my DVB cards stutter (I think it's something to do with DMA, but I haven't tracked it down). So, innocently, I issued pvmove /dev/hdb /dev/hda3.
Nothing happened. It wouldn't respond to Ctrl-C or Ctrl-Z (although other characters, uselessly, came up fine). Then every process that tried to access the LVM also seized up. "OK," I thought, "reboot and it'll be fine." But no: rebooting threw up a bunch of errors about a bad LVM state and kernel panicked. It's 5AM and I'm not feeling well and I have a dead MythTV machine - brilliant.
Of course, to add to my complications, I had returned to the old four-drive problem: I had to unplug one of the LVM drives (and thus render the LVM inoperable) in order to plug in the DVD drive to install something. I had the old MythTV partition still backed up in LVM (hopefully), so I reinstalled Fedora Core 6 from scratch, after a bunch of fruitless searching for a way to disable the LVM checks at boot-up (it's possible, but you have to edit the nash init script and repack your initrd image, and even then it didn't work perfectly; I was hoping for a nice kernel command-line option). Oh, and I had to install in text mode because I didn't feel like lugging the monitor up from downstairs, and even though the NVidia GeForce 5200 will display boot-up on every monitor and TV set you have plugged in, it won't show any graphical modes on the TV after that without extra options in the Xorg config. Yay.
The new Fedora Core install allowed me to do a pvmove --abort, which then allowed me to see the storage VG and the old root VG. "Hell," I thought, "while I'm here I'll just rebuild the thing from scratch - I've got too much ATRPMS cruft in there anyway." That merrily ate up the hours from six until nine - copying config across, setting daemons to start, turning unwanted services off, updating the repository config with local mirrors, getting the video drivers working again, and so forth.
That night, I woke up for otiose reasons at about four in the morning. Unable to get back to sleep, I decided to look at the config again. The wool in my head and the nettles in my throat convinced me that retrying the pvmove command would be perfectly reasonable - it must have been a temporary glitch. This time, just in case, I dd'd the entire newly created partition over to another system on my network, created a new 'old root' LV that wasn't striped, mirrored or afraid of water, copied the old 'old root' LV over to it, and removed the old one in case the mirroring was what had caused LVM to bork out. Now secure in my preventative measures, I issued the pvmove command again.
Same result. System locked up.
I rebooted, this time using the System Rescue CD, which allowed me to see the network and the partitions. Right, copy the dd image back again, and reboot... Nope, same problem. Worse, now the LVM partition on /dev/hda3 doesn't exist. Hmmmm. This is bad. Hmmmmm. /dev/hda3 sounds familiar - with that growing horror that computer problems specialise in, I realise that I copied the 20GB partition to /dev/hda3 (the LVM PV) rather than /dev/hda2 (the ext2 file system). Bugger. I can boot, and everything runs, but now the VG won't come up because one of its PVs is AWOL.
I tried grabbing the first couple of sectors of another PV, inserting the correct UUID (which, fortunately, the VG still knows about and includes in its complaints) in the correct spot (after a bit of guesswork - thank Bram Moolenaar for the binary editing capabilities of vim). Nup, no luck - didn't think I could fool it that easily. No-one in any of the IRC channels I was in could offer any assistance (#lvm on freenode is usually quiet as a grave anyway).
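For the curious, here's roughly where that UUID lives, based on my understanding of the LVM2 on-disk format (lvm2's `label.h` and `layout.h`); treat the offsets as assumptions to verify rather than gospel. LVM looks for a 512-byte label beginning with `LABELONE` in the first four sectors (normally sector 1); a field in the label header gives the offset of the PV header, whose first 32 bytes are the UUID, which LVM prints in 6-4-4-4-4-4-6 groups. The label header also carries a CRC computed over the rest of the sector, which is probably why patching the UUID by hand wasn't enough. This Python sketch only reads the UUID from a raw image; `read_pv_uuid` and `format_lvm_uuid` are hypothetical helpers.

```python
import struct
from typing import Optional

SECTOR_SIZE = 512
LABEL_SCAN_SECTORS = 4               # LVM2 searches the first four sectors for its label
UUID_GROUPS = (6, 4, 4, 4, 4, 4, 6)  # how LVM displays a 32-character UUID


def format_lvm_uuid(raw: str) -> str:
    """Insert hyphens into a 32-character UUID the way LVM displays it."""
    parts, pos = [], 0
    for width in UUID_GROUPS:
        parts.append(raw[pos:pos + width])
        pos += width
    return "-".join(parts)


def read_pv_uuid(image: bytes) -> Optional[str]:
    """Return the PV UUID from the start of a raw device image, or None if no label is found."""
    for sector in range(LABEL_SCAN_SECTORS):
        label = image[sector * SECTOR_SIZE:(sector + 1) * SECTOR_SIZE]
        if len(label) < SECTOR_SIZE or label[:8] != b"LABELONE":
            continue
        # label_header, little-endian: id[8] "LABELONE", sector (u64), crc (u32),
        # offset to contents (u32), type[8] "LVM2 001"
        _sector_no, _crc, contents_offset = struct.unpack_from("<QII", label, 8)
        # The pv_header starts at contents_offset; its first 32 bytes are the UUID
        raw = label[contents_offset:contents_offset + 32].decode("ascii")
        return format_lvm_uuid(raw)
    return None
```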
One of my worst habits is the way I avoid any problem that's stumped me a bit. Several games of Sudoku, Spider and Armagetron and a lot of idle chatting on various IRC channels later, I was still no nearer a solution. Then, realising that no-one was going to help me and I had to do it myself, I probed around in the options of pvcreate, and found I could specify a UUID. Brilliant! Suddenly the PV, VG and LV were back on the air. Five hours after I'd woken up, I collapsed back into bed. It was Sunday. (At this point, LVM hadn't put anything permanently in the /dev/hda3 PV, so it was merely a question of making sure it was included.)
That afternoon, I made sure that MythTV was going to update its programme
guide and relaxed, watching a few TV shows. It seemed an uncommon luxury.
posted at: 08:24 | path: /tech/fedora | permanent link to this entry
All posts licensed under the CC-BY-NC license. Author Paul Wayper.