Monday, May 07, 2007
Brute force disaster recovery for CentOS5
Today's trick is moving a CentOS5 system from an old set of disks over to a new set of disks. Along the way, I'll create an image of the system to allow me to restore it later on.

The CentOS5 system is a fresh install running RAID-1 across (3) disks using Linux Software RAID (mdadm). There are (4) primary partitions (boot, root, swap, LVM) with no data on the LVM partition.

(Why 3 active disks? The normal setup for this server was RAID-1 across 2 disks with a hot-spare. Rather then have a risky window of time where one disk has failed and the hot-spare is synchronizing with the remaining good disk, I prefer to have all 3 disks running. That way, when a disk dies, we still have 2 disks in action. The mdadm / Software RAID doesn't seem to care and it doesn't seem to affect performance at all.)

Because this is RAID-1, capturing the disk layout and migrating over to the new disks will be very easy. It's also a very fresh install, so I'm just going to grab the disk contents using "dd" (most of the partition's sectors are still zeroed out from the original install). Once I've backed up the (3) partitions on the first drive, I'm going to pull the (3) drives and replace them with the new ones.

I'll get the machine up and running with the first replacement drive, then configure the blank 2nd and 3rd drives and add them to the RAID set. That is, if mdadm doesn't beat me to the punch and start the sync on the 2nd/3rd disks automatically.

If things go bad, I can always drop the original disks back in the unit and power it back up. I plan on keeping them around for a few days, just in case. I'll have to recreate the LVM volumes, but there aren't any yet (just a PV and a VG).

One advantage of pulling the old drives out completely and rebuilding using fresh drives - I'll end up with a tested disaster recovery process.

Now for the nitty gritty. I'm using a USB pocket drive formatted with ext3 for the rescue work. Make sure that you plug this in before booting the rescue CD.

  1. Login to the system and power it down.
  2. Boot the CentOS5 install DVD
  3. At the "boot:" enter "linux rescue"
  4. Work your way through the startup dialogs
  5. When prompted whether to mount your linux install, choose "Skip"

This should give you a command shell with useful tools. So let's poke around and check on our system.

  1. Looking at "cat /proc/mdstat" shows that while the mdadm software is running, it has not assembled any RAID arrays.
  2. The "fdisk -l" command shows us that the (3) existing disks are named sda, sdb, sdc. Each has (4) partitions (boot, root, swap, LVM).
  3. My USB drive showed up as "/dev/sdd" so I'll create a "/backup" folder and mount it using "mkdir /backup ; mount /dev/sdd1 /backup ; df -h"

Naturally, we should create a sub-folder under /backup for each machine and possibly create another folder underneath it using today's date. We should grab information about the current disk layout and store it in a text file (fdisk.txt).

  1. # cd /backup ; mkdir machinename ; cd machinename
  2. # mkdir todaysdate ; cd todaysdate
  3. # fdisk -l > fdisk.txt

Now to grab the boot loader and image the two critical partitions (boot and root). We'll grab the boot loader off of all (3) drives because it's so small (and it may not be properly synchronized).

  1. dd if=/dev/sda bs=512 count=1 of=machinename-date-sda.mbr
  2. dd if=/dev/sdb bs=512 count=1 of=machinename-date-sdb.mbr
  3. dd if=/dev/sdb bs=512 count=1 of=machinename-date-sdc.mbr
  4. dd if=/dev/sda1 | gzip > machinename-date-ddcopy-sda1.img.gz
  5. dd if=/dev/sda2 | gzip > machinename-date-ddcopy-sda2.img.gz

Total disk space for my system was around 1.75GB worth of compressed files (8GB root, 250MB boot). You could also use bzip2 if you need more compression. Unfortunately, the CentOS5 DVD does not include the "split" command, which could cause issues if you're trying to write to a filesystem that can't handle files over 2GB in size.

Now you should shut the box back down, burn those files to DVD-R, install the new (blank) disks, and boot from the install DVD again. Again, mount the drive that holds the rescue image files to a suitable path.

  1. dd of=/dev/sda bs=512 count=1 if=machinename-date-sda.mbr
  2. fdisk /dev/sda (fix the last partition)
  3. dd if=/dev/sda bs=512 count=1 of=/dev/sdb
  4. dd if=/dev/sda bs=512 count=1 of=/dev/sdc

That will restore the MBR and partition table from the old drive to the new one. If your new drive has a different size, then the last partition will be incorrectly sized for the disk. Fire up "fdisk" and delete / recreate the last partition on the disk.

Restore the two partition images:

  1. # gzip -dc machinename-date-ddcopy-sda1.img.gz | dd of=/dev/sda1
  2. # gzip -dc machinename-date-ddcopy-sda2.img.gz | dd of=/dev/sda2

At this point, we should be able to boot the system on the primary drive and have Software RAID come up in degraded mode for the arrays. Things that will need to be done once the unit boots:

  1. Tell mdadm about the 2nd (and 3rd) disks and tell it to bring those partitions into the arrays and synchronize them.
  2. Create a new swap area
  3. Recreate the LVM physical volume (PV) and volume group (VG)
  4. Restore any data from the LVM area (we had none in this example)

Getting the Software RAID back up and happy is the trickiest of the steps.

  1. Login as root, open up a terminal window
  2. # cat /proc/mdstat
  3. # fdisk -l
  4. Notice that the swap area on our system is sda3, sdb3, sdc3 and will need to be loaded as /dev/md1.
  5. # mdadm --create /dev/md1 -v --level=raid1 --raid-devices=3 /dev/sda3 /dev/sdb3 /dev/sdc3
  6. # mkswap /dev/md1 ; swapon /dev/md1
  7. Now we're ready to recreate the LVM area
  8. # mknod /dev/md3 b 9 3
  9. # mdadm --create /dev/md3 -v --level=raid1 --raid-devices=3 /dev/sda4 /dev/sdb4 /dev/sdc4
  10. # pvcreate /dev/md3 ; vgcreate vg /dev/md3
  11. Finally, we should add the 2nd and 3rd drive to md0 and md2.
  12. mdadm --add /dev/md0 /dev/sdc1
  13. mdadm --add /dev/md0 /dev/sdb1

Note: If your triple mirror RAID array puts the additional disks in as spares, make sure that you have (a) grown the number of raid devices to 3 for the RAID1 set and (b) make sure that there are no other arrays synchronizing as the same time. It's also best to add the elements one at a time, rather then adding both at the same time. I'm not sure if it's a bug in mdadm or just the way it works, but it took me two tries to get my triple mirror back up with all disks marked as "active" instead of (2) active and (1) hot-spare.

Labels: , , ,

Tuesday, March 06, 2007
My new preferred disk layout for servers
With age comes wisdom? After working with SoftwareRAID and Linux servers for a while, I've changed my preferred disk system design and layout.

RAID

Under the old system, I was running a (2) disk RAID1 (mirror) with a hot-spare disk setup and ready for action. But if you're going to have a hot-spare dedicated to the RAID1 array, why not use it as an active array member? That way, if a disk fails, you still have two good disks. Unfortunately, when a RAID element fails, the load from the rebuild process can often kill the one of the remaining disks in the array.

Is it a likely scenario? Probably not. But Linux's Software RAID handles a triple-active RAID1 mirror without any slowdown, so there's not much reason *not* to implement it that way. Plus it's a useful trick to know for situations where you really *do* need to be that paranoid.

(I'm not sure whether any hardware RAID cards provide for a triple-active mirroring RAID1 configuration.)

Partitions

I've also simplified how many partitions I like to have on the disk. My current disk layouts typically look like:

/dev/sdX1 - /dev/md0 - 250MB - /boot
/dev/sdX2 - /dev/md1 - 12GB - / (primary root)
/dev/sdX3 - /dev/md2 - 12GB - / (backup root)
/dev/sdX5 - /dev/md3 - 32GB - /var/log
/dev/sdX6 - /dev/md4 - 2GB - swap
/dev/sdX7 - /dev/md5 - 64GB - /backup/system
/dev/sdX8 - /dev/md6 - (remainder) - LVM area

During normal operations, we boot and run /dev/md1 as our / (root) partition. The /dev/md2 partition is kept offline and is never mounted. Periodically, after validating that the server is in good health, we will copy the contents of /dev/md1 to /dev/md2, make adjustments to /etc/fstab and the hostname. This requires some server downtime (long enough to setup the 2nd root partition).

In the case where the primary OS is hosed, we can boot from the backup OS partition and get back up and running quickly. That gives us the luxury to continue operations until we can schedule downtime to fix the primary OS partition.

Notice that I've broken /var/log out to its own partition. I do this so that an overflowing set of logs won't take the server box down. Plus, by putting the log files in their own physical partition, it's easy to use a boot CD or USB key to gain access to the logs in case of severe issues.

The other physical partition that I consider necessary is /backup/system. This partition is used to hold images of the boot and root partitions, along with information about the partition layout and images of the MBRs. Basically, it's used to store disaster recovery backups. You should not have this partition mounted during normal operations. Taking the contents of this partition offsite is also a good idea. A basic text file of how the backups were created along with information for how to restore these backups is recommended.

Summary

This setup tries to walk the fine line between keeping it simple, but having enough flexibility to deal with a large set of potential failures. Anything from a two-disk failure, to the primary OS being hosed, to both OS partitions having problems all the way up to boot records or the /boot partition being killed.

Labels: , ,

Sunday, June 20, 2004
Linux Bare-Metal Backup and Recovery Planning
The whole concept behind bare-metal restore is that it's the shortest point from pulling a replacement set of server hardware out of a box to having a working system to replace the one that's down. Ideally, without having to go through re-installing the operating system (which, in the case of Gentoo, can take a few hours). It also (generally) only works when you have exactly the same hardware in the replacement box as the box that's kaput. So if you're using any exotic hardware (e.g. a fancy RAID card), you'd best buy (3). That way when the first one dies, you can put in one of the (2) spare cards and still get at your data.

Some backup software makes the process as simple as dropping a boot CD in the drive and inserting the most recent bare-metal backup (sometimes called a "gold" tape?) tape. It can also be a multi-step process, where the first step restores the operating system to a known state and the second step handles restoring the data and applications.

Since I'm not ready to tackle writing a guide, I'll settle for a few links:

Linux Complete Backup and Recovery HOWTO (by Charles Curley) - A very good starting point.

Linux Journal: Bare Metal Recovery, Revisited (by Charles Curley, Aug 2002) - A good sidebar discussion of the HOWTO document that he wrote.

About.com: Linux backup - A collection of links to backup software.

Unix SysAdm Resources: Backup & Archival Software for Unix (Stokely Consulting) - Listing of backup software for unix.

Linux Backups Mini-FAQ (Karsten M. Self) - Good article on simply using "tar".

Labels: ,