I mounted the logical volumes, used mount's -B option to bind the /dev, /proc and /sys into the same file system, and chrooted into the directory structure. All worked fine. Suspecting the boot process had an old record of the LVM layout, I rebuild the initrd with mkinitrd. Copying it into place and rebooting brought the whole thing up again, much to his and his partners' delight.
Another successful bit of Linux troubleshooting achieved.
Last updated: | path: tech / fedora | permanent link to this entry
Last night's adventure was redoing that command that I've been killing the MythTV machine with. My first suspect was the mirrored LV - something about mirrors came up in the messages when I did it. The second time, I ran pvmove with the -v (verbose) option, which showed the following bit of logging:
[root@media ~]# pvmove -v /dev/hdb /dev/hda3 Wiping cache of LVM-capable devices Finding volume group "vg_storage" Archiving volume group "vg_storage" metadata (seqno 80). Creating logical volume pvmove0 Moving 56559 extents of logical volume vg_storage/lv_storage Moving 86 extents of logical volume vg_storage/lv_swap Moving 0 extents of logical volume vg_storage/lv_backup_root Found volume group "vg_storage" Found volume group "vg_storage" Updating volume group metadata Creating volume group backup "/etc/lvm/backup/vg_storage" (seqno 81). Found volume group "vg_storage" Found volume group "vg_storage" Suspending vg_storage-lv_storage (253:0) Found volume group "vg_storage" Found volume group "vg_storage" Suspending vg_storage-lv_swap (253:1) Found volume group "vg_storage" Creating vg_storage-pvmove0 Loading vg_storage-pvmove0 table device-mapper: reload ioctl failed: Invalid argument ABORTING: Temporary mirror activation failed. Run pvmove --abort. Found volume group "vg_storage" Loading vg_storage-pvmove0 table device-mapper: reload ioctl failed: Invalid argument Loading vg_storage-lv_storage table device-mapper: reload ioctl failed: Invalid argument Found volume group "vg_storage" Loading vg_storage-pvmove0 table device-mapper: reload ioctl failed: Invalid argument Loading vg_storage-lv_swap table device-mapper: reload ioctl failed: No such device or address [root@media ~]# pvmove --abortAnd there it fails. The ioctl makes me think that moving a PV on a physical volume (rather than its own partition) must be what's causing it. Maybe the IO commands just don't like working on that part of the disk. So, test number three - try moving the data off the 160GB disk. Nothing wrong with that, the PV is all on a partition just fine.
Nup. Same problem again.
This time I'm quicker to restore the system to working order. I booted up off the System Rescue CD, but I couldn't issue the pvmove --abort command because the VG isn't complete (because I've unplugged one drive for the DVD). That's OK. I resize my root partition, create a new, small partition in the available space, copy the root filesystem off the System Rescue CD (i.e. off where it's uncompressed it - nifty trick, the partition image has a header that's a shell script that loads the cloop driver and mounts itself), copy the initrd and vmlinuz files off where the CD boots from, mangle up a new item in grub's config with those files, and, after changing the drives back so the LVM can initialise correctly, boot it.
It doesn't quite go perfectly, but it gets a long way further than I'd feared. I spend a bit of time trying to work out how this particular mangling of gentoo boots, but I don't get too far. So I just mount the uncompressed file system where it should be, run pvmove --abort and check the LVM state: everything comes up good. Reboot again and it's all working.
So now I'm making my own little equivalent to Dell's system rescue install - an unobtrusive partition that will give me all the tools to get the MythTV machine back on line without having to fiddle with cables and disable the LVM and so forth. Maybe I should sit down and make something like this - a version of the System Rescue CD, or a tool within its boot config, to install a version of the System Rescue CD on a partition, including (if possible) plumbing the correct options into grub to get it to come up as an option. It makes a lot of sense for those times when you want to get around a failure but can't put in a CD (or can't get in the box to attach one).
Maybe all the true Linux gurus just boot off their distro's rescue CD and go from there, though.
Last updated: | path: tech / fedora | permanent link to this entry
It started a couple of days ago, when I took a day off because of a sore throat. I decided, finally, to upgrade my MythTV machine to Fedora Core 6, and in the process remove the old boot drive and change over to a new Logical Volume (LV) under LVM. After several attempts at this problem before, I'd decided to use a mirrored LV to store the root volume on. Luckily I had three disks - mirroring in LV requires a disk per mirror and an extra for the 'transaction log' - and I set it up, copied the old root file system to the new mirror, and that was enough for Fedora Core to recognise in order to install.
But, after a couple of mysterious crashes that ended up in file system checks throwing pages and pages of errors, I started really wondering. LVM is wonderful and stable and allows you to agglomerate disks in ways you would otherwise pay lots of money for hardware solutions to achieve, but my experience so far is that when it goes bad, it starts getting rather difficult to recover. Having the root file system stored in a way that I wasn't sure I could ever recover if one disk went bad - all FAQs and HowTos to the contrary - I decided to go back to plain old partitions.
I bought a 400GB disk for $199 at Aus PC Market with the intention of pensioning the 160GB drive off in the MythTV machine, and giving it a bit more recording headroom. But for various otiose reasons Friday knocked me out and I started feeling very congested and the sore throat had returned from Wednesday. Unable to sleep, I put the new drive in the MythTV machine, partitioned it, copied all the files from the old root file system across, and booted it - it came up fine. In a fit of what seemed at the time to be inspiration but I now know to be a madness brought on by addiction to Lemsip, I also decided to move the data off one of the 250GB drives temporarily so I could partition it.
Long ago when I was setting up the system, I had realised that LVM PVs can be created on the raw disk device as well as in partitions. This sounded like a brilliant idea - no partition to worry about, LVM could put LVs on it anyway, and one less command to perform. Interestingly, you also get about 96MB of extra space. However, this decision has come back to haunt me.
Firstly, back when I was first trying to eliminate the old 40GB disk, I wanted to have a three-way RAID. LVM doesn't do that, but MD does. But you need three partitions the same size. I couldn't repartition /dev/hdb because, well, there wasn't a partition on there to alter. So that idea eventually went out the window.
Now, I thought, I could lay the problem to rest. I had used pvmove before to move space off a SATA disk that I'd bought without knowing that (at the time, at least) the way SATA drives are accessed also causes my DVB cards to stutter (I think it's something to do with DMA, but I haven't traced this down). So, innocently, I issued pvmove /dev/hdb /dev/hda3.
Nothing happened. It wouldn't respond to Ctrl-C or Ctrl-Z (although other characters, uselessly, came up fine). Then every process that tried to access the LVM also seized up. "OK," I thought, "reboot and it'll be fine." But no: rebooting threw up a bunch of errors about a bad LVM state and kernel panicked. It's 5AM and I'm not feeling well and I have a dead MythTV machine - brilliant.
Of course, to add to my complications, I had returned to the old four-drive problem - I had to unplug one of the LVM drives (and thus render the LVM inoperable) in order to plug in the DVD drive to install something. I had the old MythTV partition still backed up in LVM (hopefully), so I reinstalled Fedora Core 6 from scratch (after a bunch of fruitless searching about how to disable the LVM checks at boot-up - it's possible, but you have to edit the nash init script and repack your initrd image and even then it didn't work perfectly; I was hoping for a nice kernel command-line option). Oh, and I have to install in Text mode because I didn't feel like lugging the monitor from downstairs, and even though the NVidia GeForce 5200 will display boot-up on all monitors and TV sets you have plugged in, it won't thereafter show any graphical modes on the TV without options in the Xorg config. Yay.
The new Fedora Core install allowed me to do a pvmove --abort, which then allowed me to see the storage VG and the old root VG. "Hell," I thought, "while I'm here I'll just rebuild the thing from scratch - I've got too much ATRPMS kruft in there anyway." That merrily ate up the hours from six until nine - copying config across, setting daemons to start, turning unwanted services off, updating the repository config with local mirrors, getting the video drivers working again, and so forth.
That night, I woke up for otiose reasons at about four in the morning. Unable to get back to sleep, I decided to look at the config again. The wool in my head and the nettles in my throat made me decide that retrying the pvmove command would be perfectly reasonable - it must have been a temporary glitch. This time, just in case, I dd'd the entire newly-created partition over to another system on my network, created a new 'old root' LV that wasn't striped, mirrored or afraid of water, copied the old 'old root' LV over to that, and removed the old one just in case it was something to do with the mirroring that had caused LVM to bork out. Now secure in my preventative measures, I issued the pvmove command again.
Same result. System locked up.
I rebooted, this time using the System Rescue CD, which allowed me to see the network and the partitions. Right, copy the dd image back again, and reboot... Nope, same problem. Worse, now the LVM partition on /dev/hda3 doesn't exist. Hmmmm. This is bad. Hmmmmm. /dev/hda3 sounds familiar - with that growing horror that computer problems specialise in, I realise that I copied the 20GB partition to /dev/hda3 (the LVM PV) rather than /dev/hda2 (the ext2 file system). Bugger. I can boot, and everything runs, but now the VG won't come up because one of its PVs is AWOL.
I tried grabbing the first couple of sectors of another PV, inserting the correct UUID (which, fortunately, the VG still knows about and includes in its complaints) in the correct spot (after a bit of guesswork - thank Bram Moolenaar for the binary editing capabilities of vim). Nup, no luck - didn't think I could fool it that easily. No-one in any of the IRC channels I was in could offer any assistance (#lvm on freenode is usually quiet as a grave anyway).
One of my worst habits is the way I avoid any problem that's stumped me a bit. Several games of Sudoku, Spider and Armagetron and a lot of idle chatting on various IRC channels later, I was still no nearer a solution. Then, realising that no-one was going to help me and I had to do it myself, I probed around in the options of pvcreate, and found I could specify a UUID. Brilliant! Suddenly the PV, VG and LV was back on the air. Five hours after I'd woken up, I collapsed back into bed. It was Sunday. (At this point, LVM hadn't put anything permanently in the /dev/hda3 PV, so it was merely a question of making sure it was included.)
That afternoon, I made sure that MythTV was going to update its programme guide and relaxed, watching a few TV shows. It seemed an uncommon luxury.
Last updated: | path: tech / fedora | permanent link to this entry
This was brought home to me forcefully yesterday. At 9:15, in my cycling pants ready to go to work, I thought "I'll just switch over the root partition labels and bring the MythTV machine back up". At 9:18 I was looking at a kernel panic, as it failed to find the new root disk, because either LVM and MD weren't started yet or my clever tripartite disk wasn't set to come up automatically. At 9:30 I was burning a copy of the Fedora Core 5 rescue CD. At 11:45 I'd successfully put the labels back but the thing was getting to point where it loads the MBR off disk and stopping. At 1:45 I was going through grub-install options with a patient guy on the #fedora channel. At 2:00, in desperation I pulled all the drive connectors off except the one I was trying to boot off. Success! I felt like curling up in bed. I still had my bike pants on.
Of course, I'd made things more difficult for myself. I have four IDE drives in this system, so I had to unplug one in order to put the IDE CD-ROM drive in. Which meant either the LVM would be down because of a missing disk, or I couldn't access the 40GB drive that I was wanting to restore boot functionality to. What I hadn't thought of was that the CD-ROM was a master device - when I put it in place of another master device that IDE chain would be fine, but when I put it in place of an IDE slave the two masters would get grumpy and not speak to anyone, which was causing the boot lockup. And of course I was being quick-and-dirty and not swapping the CD-ROM out and the correct drive in when I rebooted just in case I needed it again. So I also made a one hour problem into a six hour problem by just not thinking.
I can only assume that there are a couple of readers of Planet Linux Australia out there chuckling away to themselves at my LVM / MD exertions. Because, in hindsight, MD on LVM makes no sense whatsoever. If one of the disks in a VG goes missing, and that disk has allocated blocks on it, the whole VG is considered dead. This then means that the entire MD is dead, and no amount of persuasion is going to bring it back. This is why an MD device needs to go on a raw disk partition - because MD itself is then doing the fault tolerance, not LVM. Lesson learnt.
Just got to keep thinking, I guess... And pull my head in.
Last updated: | path: tech / fedora | permanent link to this entry
The thing that most impresses me about this process is that the copying is taking about %5 (on average) CPU for the actual cp process, and barely any time at all for the md1_raid5 and kjournald processes. So doing RAID5 in software certainly doesn't require a huge grunty CPU (this is an Athlon 2400, yes, but it hasn't even broken into a sweat yet). This'd all be possible on a VIA EPIA motherboard... And, once it's finished, I'll have a root volume that can stand a complete drive failure before it starts worrying, and when it does I'll simply add a new drive, create a new PV, add it to the VG, create a new LV for the new drive, add the drive as a new hot spare, and remove the faulty LV; all at my leisure.
The fact that I can understand all this and think it 'relatively simple' gives me a small measure of pride. One of the few things I would thank EDS for in the time I spent there was sending me on the Veritas volume management training course. LVM is still pretty easy to get a grip on without that kind of training, but it's still made things a little easier.
(Educated readers would be asking why I specified -p ra rather than using its default, ls (or, in other words - why use the parity write policy of right asymmetric rather than left symmetric?) There's no particularly good reason. Firstly, I want asymmetric rather than symmetric to spread the parity load across disks, as is consistent with RAID5. Secondly, when I see an option like this I tend to want to choose the non-default option because, if all options are tested equally but most people use the default, then if a failure mode comes up then it's more likely to be found in the default case, and that may not affect the non-default case. It's a version of the "all your eggs in one basket" argument. I don't give it much weight.)
And all of this over an command line, through SSH, to home. I love technology (when it works...)
Last updated: | path: tech / fedora | permanent link to this entry
I'm reconfiguring my MythTV server so I can remove the 40GB disk - it's old, it's dwarfed by the other drives and I'd like to have one IDE channel back for the DVD drive. This involved a bit more fun with LVM, so I thought I'd document it...
The first task was building a non-LVM /boot partition. That was accomplished by using pvresize. I wanted 100MB, but the LVM tools display size in GB, allow you to set it in MB, but round up to an extent size. I played it safe and reduced it by three times as much as I thought I'd need. Then I used fdisk to resize the LVM partition and add a new partition. First challenge: fdisk doesn't allow you to modify a partition, you can only delete it and then create it anew. Luckily I remembered to change the type of the new (LVM) partition to 8e. Second challenge: fdisk only allows you to specify partition sizes in number of cylinders or sectors. So a couple of quick back-of-envelope calculations later, I had something which was roughly the right size. I then formatted the new partition, using -i 16384 because /boot contains relatively few files that are relatively large - this saved a relatively trivial 3MB on copying. I then used pvresize to expand the LVM area back to its maximum extent inside the partition. All done.
The second task was creating a swap partition. Because I'm a speed-mad power freak, I wanted a stripe across all three LVM physical disks. Ooops, the current LV already takes up all free extents on the first two disks, so I go to work with pvmove. After I remember to specify /dev/hdc1:(starting PE)-(ending PE), rather than /dev/hdc1:(starting PE)-(number of extents), this works rather nicely. Then it's a simple matter of swapoff -a, lvcreate -S 1G -i 3 -n lv_swap vg_storage, mkswap /dev/vg_storage/lv_swap, vi /etc/fstab to change the swap device, and swapon -a to get it working again. That's the easy part. :-)
The third task is to create a root partition. Hmmm, slight problem: There's not enough space in the volume group for a 20GB root partition. Further problem: the storage LV in the main VG is formatted as XFS, and XFS can't be shrunk (at the moment). OK, I'm going to have to think about this one. But, driven mad by power now, I consider how I'd configure it: how about a mirror of two three-disk-striped LVM arrays? At first blush it sounds reasonable. How fast would it be? I set up two 1GB LVM partitions, and, delving into mdadm, create a new /dev/md1 as a RAID1 mirror across them (the full command was mdadm -C /dev/md1 -l mirror -n 2 /dev/vg_storage/lv_test_1 /dev/vg_storage/lv_test_2). time dd if=/dev/zero of=/mnt/test/thing count=200000 (displaying my great ability to choose meaningless names) reports 2 seconds to write a 100MB file. That's pretty good, I reckon.
Of course, if any one disk goes down, that does take the whole thing with it, which is not exactly the required effect. But I'll remember that bit of the Veritas Storage Manager course sooner or later, and in the meantime I have larger fish to fry...
Last updated: | path: tech / fedora | permanent link to this entry
It all started when I upgraded the cups package and all my printers ceased to work. I found out that the package upgrade had decided that it was going to put all of the backend drivers in /usr/lib/cups/backend, but that cups itself still looks in /usr/lib64/cups/backend for them. A few symlinks later and at least the laserjet works correctly. You can read all about this in Bug 193987; basically the justification is that they're executables, not libraries, so they should be in /usr/lib. Because that makes so much more sense than /usr/bin/cups/backend, for instance.
Because of this problem, I plugged the USB printer into my testing (i386) machine and got it working. "Fine," says I, "I'll just print to it across the LAN." "Hah-hah!" said cups, "We're going to make things as difficult for you as possible!" I followed Eric Raymond's steps and managed, at least, to get cups to bind to the network interface. But it now still steadfastly refuses to either broadcast itself or let other people use it; every time I run the actual printer configuration program, it resets everything to stop other people see it too. This is just A1 weapons-grade stupid.
OK, maybe there's something screwy in my configuration. Why can't something tell me this? Why did the problem with the 64-bit backend drivers being moved manifest itself as an unknown lpr error, rather than something meaningful? Why did it appear at all? (It seems to be gone in the latest update to cups, 1.2.1-1.7, crossed fingers and touching wood.) Why doesn't the configuration un-screw itself? Why is it not Doing What I Say, let alone Doing What I Mean? And why do I seem to get no help on this at all?
Last updated: | path: tech / fedora | permanent link to this entry
I do a bit of experimenting. yum whatprovides
/var/lib64/cups/backend/usb
says that the cups package from the
base repository provides them, but it's not installed. I download it,
remove the old package with rpm -ev --nodeps cups
and
then rpm -ivh --nodeps --oldpackage cups-1.1.23-30.2.x86_64.rpm
to install the base package. Goodie, all the backend drivers are back,
but ugh, cupsd now fails with client error 127, whatever the hell that
means. yum upgrade
upgrades cups, which then gleefully
(I swear I heard the words "A working cups installation! Let's adger
it thoroughly with pitchforks!" coming from the motherboard...) removed
all the backend drivers again.
And no-one on #fedora seems able to help me. I'm up to begging for help on the blogosphere / lazyweb.
(Maybe it's an x86_64 thing - my test i386 machine seems to have both the old package and the backend drivers. But that still doesn't help me ...)
Last updated: | path: tech / fedora | permanent link to this entry
Today I realised my CGI pages weren't coming up because the scripts weren't allowed to connect to, read and write the Postgres socket. (They also seem to require the ability to getattr and read the krb5.conf file, and I have absolutely no freaking clue why, because my code doesn't use Kerberos in any way). I'd done a bit of research and found the command:
grep <error messages> /var/log/messages | audit2allow
A bit of questioning on #selinux on irc.freenode.org revealed that if I did audit2allow -M <name>, I'd get a module with the name I gave, so that later I can identify what particular policy modules I've loaded into SELinux and what they do (rather than just 'local'). (They even have version numbers too!) The module .te file is a text file, so you can edit it. Including all the various permissions you want to set in one file means that when you compile and load it with:
checkmodule -m -M -o <name>.mod <name>.te
semodule_package -o <name>.pp -m <name>.mod
semodule -i <name>.pp
(which the audit2allow script will do all but the last automatically) then you can have all your policy revisions in one neat place, rather than grabbing each separate error, making a separate policy for, and then probably overwriting the last policy module you called 'local' or whatever.
Neat.
Now all I have to learn is how to create new SELinux types, so I can say "only these HTTP scripts are allowed to read and write to this directory". Then I will truly know what the hell I'm doing. Possibly.
Last updated: | path: tech / fedora | permanent link to this entry
It seems so easy, I wonder why I haven't found it before...
Although I still want to know where the policy rules file is so I can make sure its backed up...
Last updated: | path: tech / fedora | permanent link to this entry
semanage fcontext -l | grep mysql
told me what I needed to know about the existing context rules. With a
bit of copy and paste,
semanage fcontext -a -t mysqld_db_t "/opt/mysql(/.*)?"
installed the new rule and updated the rules on the /opt/mysql tree.
Finally I found out that I had to put the [client] section
into the /etc/my.cnf file with a socket line to tell it look in
the new path for the socket, and all was well.
restorecon -v -R /opt/mysql
Ironically, the server was starting just fine; it was the 'check that the server is now running' part of the script that was failing. It took me a while to work this out... :-/
Last updated: | path: tech / fedora | permanent link to this entry
Apr 10 09:37:52 biojanus kernel: audit(1144625872.015:1953): avc: denied { getattr } for pid=2316 comm="hald" name="/" dev=dm-1 ino=2 scontext=system_u:system_r:hald_t:s0 tcontext=system_u:object_r:file_t:s0 tclass=dir Apr 10 09:37:52 biojanus kernel: audit(1144625872.015:1954): avc: denied { getattr } for pid=2316 comm="hald" name="/" dev=sdc1 ino=2 scontext=system_u:system_r:hald_t:s0 tcontext=system_u:object_r:file_t:s0 tclass=dir Apr 10 09:37:52 biojanus kernel: audit(1144625872.167:1955): avc: denied { getattr } for pid=2316 comm="hald" name="/" dev=dm-1 ino=2 scontext=system_u:system_r:hald_t:s0 tcontext=system_u:object_r:file_t:s0 tclass=dir Apr 10 09:37:52 biojanus kernel: audit(1144625872.167:1956): avc: denied { getattr } for pid=2316 comm="hald" name="/" dev=sdc1 ino=2 scontext=system_u:system_r:hald_t:s0 tcontext=system_u:object_r:file_t:s0 tclass=dir Apr 10 09:37:52 biojanus kernel: audit(1144625872.335:1957): avc: denied { getattr } for pid=12272 comm="hal-system-stor" name="/" dev=sdc1 ino=2 scontext=system_u:system_r:hald_t:s0 tcontext=system_u:object_r:file_t:s0 tclass=dir Apr 10 09:37:52 biojanus kernel: audit(1144625872.335:1958): avc: denied { getattr } for pid=12272 comm="hal-system-stor" name="/" dev=sdc1 ino=2 scontext=system_u:system_r:hald_t:s0 tcontext=system_u:object_r:file_t:s0 tclass=dir Apr 10 09:37:52 biojanus kernel: audit(1144625872.339:1959): avc: denied { search } for pid=12280 comm="touch" name="/" dev=sdc1 ino=2 scontext=system_u:system_r:hald_t:s0 tcontext=system_u:object_r:file_t:s0 tclass=dir... and so on. Now to figure out how to fix it...
Last updated: | path: tech / fedora | permanent link to this entry
Unfortunately, running the program gave me the error "library name: cannot restore segment prot after reloc: Permission denied. ". A Google on this error message showed me that it's caused by SELinux, which doesn't just allow anyone to come along and install new shared object libraries - you have to make sure that they're set to be a shared library (type 'texrel_shlib_t'). So applying the command 'chcon -v -t texrel_shlib_t /path/to/library' made everything suddenly work. And made me learn a little more about how SELinux sits in with all the other parts of Linux.
Learning: it's a good thing.
Last updated: | path: tech / fedora | permanent link to this entry
But here's how not to do an install of Fedora Core 5:
On the plus side, I learnt a few things:
Last updated: | path: tech / fedora | permanent link to this entry
And the thing I love about Linux is that you have total control. You want to turn off swapping? swapoff -a. There it is, gone, and the system is still mabulating away happily. Delete the old swap partition? lvremove /dev/mainvg/swaplv. Create the new swap partition? lvcreate -i 2 -I 64 -L 4G -n swaplv mainvg to create a 4GB partition striped across the two disks. After a bit of mental prodding I remembered I have to do mkswap /dev/mainvg/swaplv to make the partition look like a swap device, and then swapon -a Just Works. There you go, 4GB of new improved faster swap space. No reboots, no special utilities, no "what's this C:\win386.swp file?". Easy.
Last updated: | path: tech / fedora | permanent link to this entry
Which the ANU has three 55Mbit Frame Relay links to.
Which means that I downloaded the five images in about five minutes - at the average rate of 10MB/sec (the speed of my link to the building switch, as it turns out. I've got a gigabit switch here but my upstream is still limited...).
Then I copied it onto my 80GB portable laptop drive, and whisked it home, for the purposes of sharing for those less fortunate.
Internode still only has the DVD image. I'll get that too. That can wait until I get home.
Last updated: | path: tech / fedora | permanent link to this entry
All posts licensed under the CC-BY-NC license. Author Paul Wayper.
Main index
/ tbfw/
- © 2004-2023
Paul Wayper
Valid HTML5