<graphics>
This article describes our setup of an active-active high-availability cluster for Xen virtual machines, based on Debian Lenny. The layout is sketched above and shows the basic ideas: two identical servers provide storage and network connectivity for the cluster, which holds the Xen virtual machines as cluster resources. Each server has two NICs: eth1 connects the physical machines to our server network and carries the DRBD traffic (see below), the heartbeat channel and dom0 logins. eth0 is a network bridge that connects the virtual machines to the DPHYS network. For HA cluster operation we use a single heartbeat channel (eth1).
The filesystem stack consists of four layers. At the hardware level, each server provides a RAID1 mirror. These two RAIDs are combined into a DRBD8 device, which is basically RAID1 over Ethernet. This two-level redundancy protects against hardware failure not only of individual disks but of an entire server. The DRBD volume holds an LVM container, which houses the logical volumes for Xen.
Many people prefer a cluster filesystem such as OCFS2 on top of DRBD, but we decided against it for three main reasons. First, we don't need the added complexity of a cluster filesystem, because we don't need concurrent access to the same files. Second, cluster filesystems don't offer the same performance as single-node filesystems such as ext3. Finally, OCFS2 out of the box uses disk-based heartbeating that conflicts with the 'real' heartbeat mechanism used by Linux-HA. The proper way to combine OCFS2 and Linux-HA would be to integrate OCFS2 into the Linux-HA heartbeat stack. SLES is going this way, and we tried to port their kernel and OCFS2 patches to our Debian software stack, but in the end it just wasn't worth the trouble.
The Xen domains act as virtual hardware that lives on the cluster. In normal operation they are distributed across the nodes to take full advantage of all hardware available in the cluster. If one node fails, the affected domains are restarted on the remaining node. The only 'waste' of hardware comes from the RAM requirement: each node must have enough RAM to run all Xen domains that could be assigned to it. In our two-node setup this effectively means doubling the RAM, which is not a big deal at today's RAM prices.
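The RAM requirement above can be estimated with a small helper. This is only a sketch: it assumes the domain configs are xen-tools style files named *.cfg that each contain a `memory = ...` line in MB, and the function name sum_xen_memory is ours, not part of Xen.

```shell
#!/bin/sh
# sum_xen_memory CFG_DIR
# Prints the sum (in MB) of all "memory = N" settings found in CFG_DIR/*.cfg.
# Assumption: one memory line per domain config, value given in MB.
sum_xen_memory() {
    cfg_dir=$1
    cat "$cfg_dir"/*.cfg 2>/dev/null \
        | grep '^[[:space:]]*memory[[:space:]]*=' \
        | tr -dc '0-9\n' \
        | awk '{ total += $1 } END { print total + 0 }'
}
```

Calling `sum_xen_memory /etc/xen` on a node gives the RAM needed to run every configured domain there at once; the RAM reserved for dom0 comes on top of that figure.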
The remainder of this document describes the steps required to get the cluster up and running.
We start by installing a basic Lenny system on all nodes. We installed the base OS through the rescue system of the schlunix-installer. Configure at least two network cards; we use 192.168.132.0 on eth0 and 10.10.32.X on eth1. Before you leave the rescue system, don't forget to install openssh-server. Also make sure that all nodes are listed in /etc/hosts on each node and that /etc/nsswitch.conf says hosts: files dns, so that the cluster is immune to DNS lookup problems.
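The name resolution requirements above can be verified with a short script. The following is a minimal sketch; the function name check_name_resolution is hypothetical, and the check for "files before dns" is a simple pattern match rather than a full parser for nsswitch.conf.

```shell
#!/bin/sh
# check_name_resolution HOSTS_FILE NSSWITCH_FILE NODE...
# Returns 0 only if every NODE appears as a whole word on a non-comment
# line of HOSTS_FILE and the first "hosts:" line of NSSWITCH_FILE
# mentions "files" somewhere before "dns".
check_name_resolution() {
    hosts_file=$1
    nsswitch_file=$2
    shift 2
    status=0
    for node in "$@"; do
        if ! grep -v '^[[:space:]]*#' "$hosts_file" | grep -qwF -- "$node"; then
            echo "missing in $hosts_file: $node" >&2
            status=1
        fi
    done
    hosts_line=$(grep '^[[:space:]]*hosts:' "$nsswitch_file" | head -n 1)
    case "$hosts_line" in
        *files*dns*) ;;
        *)
            echo "hosts line should list files before dns: $hosts_line" >&2
            status=1
            ;;
    esac
    return $status
}
```

With the node names used later in ha.cf, this would be run on each node as `check_name_resolution /etc/hosts /etc/nsswitch.conf xen1 xen2`.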
Once the OS is running, we can install Xen:
# apt-get install xen-linux-system-2.6.26-1-xen-amd64 xen-tools lsof firmware-bnx2 grub
Next we install DRBD, using our backported drbd8 (8.3) package:
# dpkg -i ....
DRBD's config is given by /etc/drbd.conf. The relevant parts are:
<snip>
<snap>
In our case /dev/sda3 will hold the DRBD container for LVM.
Before we create the DRBD device, we install heartbeat and friends so that we can fix some things DRBD will complain about:
# apt-get install heartbeat-2 stonith
We now initialize the DRBD container:
# drbdadm create-md $RESOURCE
Once the resource is connected, synchronized and promoted on both nodes, /etc/init.d/drbd status should report a Primary/Primary and UpToDate/UpToDate DRBD device. We now have levels 1 and 2 of our filesystem stack. LVM is level 3:
# apt-get install lvm2
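Before putting LVM on top of the DRBD device, it can help to confirm the DRBD state in a scriptable way. The sketch below assumes the /proc/drbd layout of DRBD 8.x, where each device line carries a role field (`ro:` in 8.3, `st:` in older 8.x releases) and a disk state field (`ds:`). The function name drbd_dual_primary_ok is ours.

```shell
#!/bin/sh
# drbd_dual_primary_ok [STATUS_FILE]
# Succeeds if at least one device line in STATUS_FILE (default: /proc/drbd)
# shows both nodes as Primary and both disks as UpToDate.
drbd_dual_primary_ok() {
    status_file=${1:-/proc/drbd}
    grep -E '(ro|st):Primary/Primary' "$status_file" \
        | grep -q 'ds:UpToDate/UpToDate'
}
```

A typical use is `drbd_dual_primary_ok || echo "DRBD is not ready for LVM yet"`. With more than one DRBD device, the function only confirms that some device is in the expected state, not all of them.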
Before creating the physical volume, we need to prevent LVM from complaining about duplicate devices (/dev/sda3 and /dev/drbd0): in /etc/lvm/lvm.conf we have to replace
filter = [ "r|/dev/cdrom|" ]
with
filter = [ "r|/dev/cdrom|", "r|/dev/sda3|", "a|/dev/drbd0|" ]
This tells LVM to ignore /dev/sda3 and to use the DRBD device instead, so that the volume is created on DRBD rather than directly on the hard disk. On one cluster node only, we do:
# pvcreate /dev/drbd0
# vgcreate data /dev/drbd0
The second node only needs to read the LVM information:
# vgscan
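Since the lvm.conf filter change described above has to be made identically on both nodes, it may be convenient to script it. The following sketch assumes GNU sed and assumes the stock filter line reads exactly as quoted above. The function name patch_lvm_filter and the .orig backup suffix are our own choices.

```shell
#!/bin/sh
# patch_lvm_filter LVM_CONF
# Keeps a backup as LVM_CONF.orig, then replaces the stock cdrom-only
# filter line with one that also rejects /dev/sda3 and accepts /dev/drbd0.
# Commented-out example filter lines are left alone because the pattern
# is anchored to the start of the line (after optional indentation).
patch_lvm_filter() {
    conf=$1
    cp -p "$conf" "$conf.orig" || return 1
    sed -i 's#^\([[:space:]]*\)filter = \[ "r|/dev/cdrom|" ]#\1filter = [ "r|/dev/cdrom|", "r|/dev/sda3|", "a|/dev/drbd0|" ]#' "$conf"
}
```

After running `patch_lvm_filter /etc/lvm/lvm.conf`, a `diff` against the .orig copy shows whether the line was actually found and replaced. If the stock line differs, the file is left unchanged.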
At this point we don't create any logical volumes; this will be done by Xen later. We can now continue by setting up the Linux-HA framework.
In /etc/ha.d/ we need to create some config files for our cluster. ha.cf defines the basic cluster configuration:
use_logd yes
bcast eth1
node xen1 xen2
crm yes
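Heartbeat expects the names given in the node directive to match the output of uname -n on each machine, so a quick check can save some debugging. This is a minimal sketch; the function name ha_cf_lists_node is hypothetical.

```shell
#!/bin/sh
# ha_cf_lists_node HA_CF [NODE_NAME]
# Succeeds if NODE_NAME (default: output of `uname -n`) is listed in a
# "node" directive of HA_CF. Several names per directive are supported.
ha_cf_lists_node() {
    ha_cf=$1
    node_name=${2:-$(uname -n)}
    # Strip the "node" keyword, split the remaining names onto separate
    # lines, and look for an exact whole-line match.
    sed -n 's/^[[:space:]]*node[[:space:]]\{1,\}//p' "$ha_cf" \
        | tr -s ' \t' '\n\n' \
        | grep -qxF -- "$node_name"
}
```

On each node, `ha_cf_lists_node /etc/ha.d/ha.cf || echo "this host is not listed in ha.cf"` flags a mismatch between the host name and the cluster configuration.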
Next we need to configure heartbeat security. Create /etc/ha.d/authkeys containing
auth 1
1 sha1 mysup3rs3xypassw0rd
and chmod 600 it. We also have to set up key-based SSH authentication. On both cluster nodes, do
# ssh-keygen -t dsa
# cat ~/.ssh/*.pub | ssh root@remote-system 'umask 077; cat >>.ssh/authorized_keys'
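The authkeys step above can also be scripted so that the shared secret is random rather than hand-typed. The sketch below assumes /dev/urandom and sha1sum are available. The function name write_authkeys is ours, and deriving the key from a SHA-1 digest of random bytes is just one convenient way to get a printable secret.

```shell
#!/bin/sh
# write_authkeys TARGET
# Writes a heartbeat authkeys file with a random 40-character hex key
# and restricts it to mode 600.
write_authkeys() {
    target=$1
    # Hash a block of random bytes to obtain a printable key.
    key=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | sha1sum | cut -d ' ' -f 1)
    [ -n "$key" ] || return 1
    (
        # Create the file without group/other permissions from the start.
        umask 077
        printf 'auth 1\n1 sha1 %s\n' "$key" > "$target"
    ) || return 1
    # Also covers the case where TARGET already existed with wider permissions.
    chmod 600 "$target"
}
```

Run `write_authkeys /etc/ha.d/authkeys` on one node only and copy the resulting file to the other node, since both nodes must share the same key.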