Tuesday, March 06, 2007
My new preferred disk layout for servers
With age comes wisdom? After working with SoftwareRAID and Linux servers for a while, I've changed my preferred disk system design and layout.

RAID

Under the old system, I was running a (2) disk RAID1 (mirror) with a hot-spare disk setup and ready for action. But if you're going to have a hot-spare dedicated to the RAID1 array, why not use it as an active array member? That way, if a disk fails, you still have two good disks. Unfortunately, when a RAID element fails, the load from the rebuild process can often kill the one of the remaining disks in the array.

Is it a likely scenario? Probably not. But Linux's Software RAID handles a triple-active RAID1 mirror without any slowdown, so there's not much reason *not* to implement it that way. Plus it's a useful trick to know for situations where you really *do* need to be that paranoid.

(I'm not sure whether any hardware RAID cards provide for a triple-active mirroring RAID1 configuration.)

Partitions

I've also simplified how many partitions I like to have on the disk. My current disk layouts typically look like:

/dev/sdX1 - /dev/md0 - 250MB - /boot
/dev/sdX2 - /dev/md1 - 12GB - / (primary root)
/dev/sdX3 - /dev/md2 - 12GB - / (backup root)
/dev/sdX5 - /dev/md3 - 32GB - /var/log
/dev/sdX6 - /dev/md4 - 2GB - swap
/dev/sdX7 - /dev/md5 - 64GB - /backup/system
/dev/sdX8 - /dev/md6 - (remainder) - LVM area

During normal operations, we boot and run /dev/md1 as our / (root) partition. The /dev/md2 partition is kept offline and is never mounted. Periodically, after validating that the server is in good health, we will copy the contents of /dev/md1 to /dev/md2, make adjustments to /etc/fstab and the hostname. This requires some server downtime (long enough to setup the 2nd root partition).

In the case where the primary OS is hosed, we can boot from the backup OS partition and get back up and running quickly. That gives us the luxury to continue operations until we can schedule downtime to fix the primary OS partition.

Notice that I've broken /var/log out to its own partition. I do this so that an overflowing set of logs won't take the server box down. Plus, by putting the log files in their own physical partition, it's easy to use a boot CD or USB key to gain access to the logs in case of severe issues.

The other physical partition that I consider necessary is /backup/system. This partition is used to hold images of the boot and root partitions, along with information about the partition layout and images of the MBRs. Basically, it's used to store disaster recovery backups. You should not have this partition mounted during normal operations. Taking the contents of this partition offsite is also a good idea. A basic text file of how the backups were created along with information for how to restore these backups is recommended.

Summary

This setup tries to walk the fine line between keeping it simple, but having enough flexibility to deal with a large set of potential failures. Anything from a two-disk failure, to the primary OS being hosed, to both OS partitions having problems all the way up to boot records or the /boot partition being killed.

Labels: , ,

Sunday, June 20, 2004
Linux Bare-Metal Backup and Recovery Planning
The whole concept behind bare-metal restore is that it's the shortest point from pulling a replacement set of server hardware out of a box to having a working system to replace the one that's down. Ideally, without having to go through re-installing the operating system (which, in the case of Gentoo, can take a few hours). It also (generally) only works when you have exactly the same hardware in the replacement box as the box that's kaput. So if you're using any exotic hardware (e.g. a fancy RAID card), you'd best buy (3). That way when the first one dies, you can put in one of the (2) spare cards and still get at your data.

Some backup software makes the process as simple as dropping a boot CD in the drive and inserting the most recent bare-metal backup (sometimes called a "gold" tape?) tape. It can also be a multi-step process, where the first step restores the operating system to a known state and the second step handles restoring the data and applications.

Since I'm not ready to tackle writing a guide, I'll settle for a few links:

Linux Complete Backup and Recovery HOWTO (by Charles Curley) - A very good starting point.

Linux Journal: Bare Metal Recovery, Revisited (by Charles Curley, Aug 2002) - A good sidebar discussion of the HOWTO document that he wrote.

About.com: Linux backup - A collection of links to backup software.

Unix SysAdm Resources: Backup & Archival Software for Unix (Stokely Consulting) - Listing of backup software for unix.

Linux Backups Mini-FAQ (Karsten M. Self) - Good article on simply using "tar".

Labels: ,