Notes on a new Xen Cluster.

Disclaimer!!
Please note: this information 
is provided on an as-is basis, 
without warranty of any kind, 
to the extent permitted by applicable law. 
Use at your own discretion.

Remove cfengine from the picture while we configure the system:

edit /etc/default/cfengine
mv /usr/sbin/cfagent{,_orig}

Install Xen-related packages:

apt-get install xen-linux-system

ii  xen-hypervisor-4.1-amd64            4.1.4-3+deb7u1 amd64        Xen Hypervisor on AMD64
ii  xen-linux-system-3.2.0-4-amd64      3.2.51-1       amd64        Xen system with Linux 3.2 on 64-bit PCs (meta-package)
ii  xen-linux-system-amd64              3.2+46         amd64        Xen system with Linux for 64-bit PCs (meta-package)
ii  xen-system-amd64                    4.1.4-3+deb7u1 amd64        Xen System on AMD64 (meta-package)
ii  xen-tools                           4.3.1-1        all          Tools to manage Xen virtual servers
ii  xen-utils-4.1                       4.1.4-3+deb7u1 amd64        XEN administrative tools
ii  xen-utils-common                    4.1.4-3+deb7u1 all          Xen administrative tools - common files
ii  xenstore-utils                      4.1.4-3+deb7u1 amd64        Xenstore utilities for Xen

Update grub to boot Xen.

The serial console is still not functional; IPMI SOL (serial over LAN) does not seem to work.

/etc/default/grub

GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=2048M dom0_max_vcpus=1 dom0_vcpus_pin"
GRUB_SERIAL_COMMAND="serial --unit=0 --speed=9600 --word=8 --parity=no --stop=1"
GRUB_TERMINAL="console serial"
GRUB_TIMEOUT=5
#GRUB_CMDLINE_XEN="com1=9600,8n1 console=com1,vga"
#GRUB_CMDLINE_LINUX="console=tty0 console=hvc0"
GRUB_CMDLINE_XEN="loglvl=all guest_loglvl=all com1=115200,8n1,0x3e8,5 console=com1,vga"
GRUB_CMDLINE_LINUX="console=hvc0 earlyprintk=xen"

Move up the Xen grub entry:

dpkg-divert --divert /etc/grub.d/08_linux_xen --rename /etc/grub.d/20_linux_xen

To undo: dpkg-divert --rename --remove /etc/grub.d/20_linux_xen. Update grub with update-grub.
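To confirm that the diversion took effect after update-grub, one can look at which entry comes first in the generated menu. A minimal sketch, assuming the menu lives in /boot/grub/grub.cfg and that entry titles are single-quoted; first_menuentry is a hypothetical helper name, not part of grub:

```shell
# Print the title of the first top-level menuentry in a grub.cfg file.
# Assumes titles are wrapped in single quotes.
first_menuentry() {
    awk -F"'" '/^menuentry /{ print $2; exit }' "${1:-/boot/grub/grub.cfg}"
}

# Usage (as root): first_menuentry /boot/grub/grub.cfg
```

If the Xen entry is not the one printed, the 08_linux_xen script probably did not run ahead of 10_linux.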

Configure Xen.

cp /etc/xen/xend-config.sxp{,_orig}
vi /etc/xen/xend-config.sxp

(xend-http-server yes)
(xend-unix-server yes)
(xend-relocation-server yes)
(xend-port            8000)
(xend-relocation-port 8002)
(xend-address localhost)
(vif-script vif-bridge)
(dom0-min-mem 2048)
(enable-dom0-ballooning no)
(total_available_memory 0)
(dom0-cpus 0)
(vncpasswd '')

Configure Xendomains

~malin/bin/crush /etc/default/xendomains
XENDOMAINS_SAVE=/var/lib/xen/save
XENDOMAINS_RESTORE=true
XENDOMAINS_AUTO=/etc/xen/auto
XENDOMAINS_STOP_MAXWAIT=300
XENDOMAINS_RESTORE=false
XENDOMAINS_SAVE=""
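The listing above assigns XENDOMAINS_SAVE and XENDOMAINS_RESTORE twice. Assuming the init script reads this file by sourcing it as shell (worth checking in /etc/init.d/xendomains), the last assignment wins, so save and restore end up disabled. A small sketch to print the effective value; effective_value is a hypothetical helper:

```shell
# Print the value a variable ends up with after sourcing a defaults file.
# Runs in a subshell so nothing leaks into the current environment.
effective_value() {
    ( . "$1" >/dev/null 2>&1; eval "printf '%s\n' \"\${$2}\"" )
}

# Usage: effective_value /etc/default/xendomains XENDOMAINS_RESTORE
```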

From: https://gist.github.com/bivald/5690227

Configuring external (and internal) networking, Debian Wheezy and Xen

External network

Dom0

   Rename network card to peth0

(http://lists.xen.org/archives/html/xen-users/2012-02/msg00535.html). Edit /etc/udev/rules.d/70-persistent-net.rules to set the name of the physical interface to peth0.
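The rename can also be scripted. A sketch, assuming the rule for the physical card currently ends in NAME="eth0" (check the file first, the name on a given host may differ); rename_to_peth0 is a hypothetical helper that writes the edited rules to standard output instead of changing the file in place:

```shell
# Rewrite the udev rule that names the physical NIC eth0 so that it
# names it peth0 instead. Output goes to stdout for review.
rename_to_peth0() {
    sed 's/NAME="eth0"/NAME="peth0"/' "${1:-/etc/udev/rules.d/70-persistent-net.rules}"
}

# Usage: rename_to_peth0 > /tmp/70-persistent-net.rules.new
```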

Edit /etc/network/interfaces:

auto eth0
iface eth0 inet static
  bridge_ports peth0
  address XXX.XXX.XXX.XXX
  netmask 255.255.255.0
  gateway XXX.XXX.XXX.XXX

Use the following when configuring your DomU’s /etc/xen/domain.cfg. Note: Vif MAC address must start with 00:16:3e

vif = [ 'bridge=eth0,mac=00:16:3e:xx:xx:xx' ]  # the MAC must begin with 00:16:3e; replace xx:xx:xx with valid hex digits (0-9, a-f)
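00:16:3e is the prefix reserved for Xen guests, so only the last three octets need to be chosen. A minimal sketch for generating one at random; gen_xen_mac is a hypothetical helper name, and nothing here checks for collisions with MACs already in use:

```shell
# Generate a MAC address in the Xen 00:16:3e range.
# The last three octets are read from /dev/urandom.
gen_xen_mac() {
    od -An -N3 -tx1 /dev/urandom |
        awk '{ printf "00:16:3e:%s:%s:%s\n", $1, $2, $3 }'
}

# Usage: gen_xen_mac
```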

DomU

Edit /etc/network/interfaces

auto eth0
iface eth0 inet static
 address XXX.XXX.XXX.XXX
 gateway XXX.XXX.XXX.XXX
 netmask 255.255.255.0

Internal network (DomU ↔ DomU, no external connectivity)

Dom0

Edit /etc/network/interfaces

auto xenbr1
iface xenbr1 inet static
  address 10.0.0.1
  netmask 255.255.255.0
  bridge_stp off
  bridge_waitport 0
  bridge_fd 0

Use the following when configuring your DomU’s /etc/xen/domain.cfg. Note: Vif MAC address must start with 00:16:3e

vif = [ 'bridge=xenbr1,mac=00:16:3e:xx:xx:xx' ]  # the MAC must begin with 00:16:3e

DomU

Edit /etc/network/interfaces

auto eth0
iface eth0 inet static
 address 10.0.0.2
 netmask 255.255.255.0

Using external and internal networking

Dom0

Once eth0 and xenbr1 both exist, configure your DomU’s /etc/xen/domain.cfg to use both:

vif = [ 'mac=00:16:3e:xx:xx:xx,bridge=xenbr1','mac=00:16:3e:xx:xx:xx,bridge=eth0']

DomU

The DomU’s internal network is now eth0 and the external one eth1. Edit /etc/network/interfaces:

auto eth0
iface eth0 inet static
 address 10.0.0.X
 netmask 255.255.255.0

auto eth1
iface eth1 inet static
 address XXX.XXX.XXX.XXX
 gateway XXX.XXX.XXX.XXX
 netmask 255.255.255.0

Follow-up comments by bivald on the gist:

Also, comment out any network-script lines in your xend config. Wheezy doesn’t need them, and they will be removed soon (buggy).

You can also try adding the bridge via a script instead; this actually works better for me:

brctl addbr xenbr1
brctl stp xenbr1 off
brctl setfd xenbr1 0
ip link set xenbr1 up

ifconfig xenbr1 up 10.0.0.1 netmask 255.255.255.0
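The same steps can be wrapped so that they can be previewed before touching the network. A sketch using ip in place of ifconfig; the DRY_RUN variable and the run and setup_xenbr1 names are hypothetical, and the address step is not idempotent (ip addr add fails if the address is already set):

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create the internal bridge; skip creation if it already exists.
setup_xenbr1() {
    if [ ! -d /sys/class/net/xenbr1 ]; then
        run brctl addbr xenbr1
    fi
    run brctl stp xenbr1 off
    run brctl setfd xenbr1 0
    run ip addr add 10.0.0.1/24 dev xenbr1
    run ip link set xenbr1 up
}

# Preview only: DRY_RUN=1 setup_xenbr1
```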

Xen DomU installs on curtis:

I still use the xm tool stack.

:~# ~malin/bin/crush /etc/xen-tools/xen-tools.conf
lvm = xen_vg
install-method = debootstrap
debootstrap-cmd = /usr/sbin/debootstrap
size   = 100Gb
memory = 2048Mb
swap   = 2048Mb
fs     = ext3
dist   = `xt-guess-suite-and-mirror --suite`
image  = sparse
kernel = /boot/vmlinuz-`uname -r`
initrd = /boot/initrd.img-`uname -r`
arch = amd64
mirror = `xt-guess-suite-and-mirror --mirror`
mirror_wheezy = http://cdn.debian.net/debian
ext3_options     = noatime,nodiratime,errors=remount-ro
ext2_options     = noatime,nodiratime,errors=remount-ro
xfs_options      = defaults
reiserfs_options = defaults
btrfs_options    = defaults
serial_device = hvc0
disk_device = xvda
output    = /etc/xen
extension = .cfg

Create with:

~># xen-create-image --hostname=testholyghost --ip=132.206.178.81 \
--broadcast=132.206.178.255 --gateway=132.206.178.1 --netmask=255.255.255.0 \
--vcpus=2 --dist wheezy
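Since several DomUs share everything but hostname and address, the call can be driven from a short list. A sketch, assuming the long option spellings from the xen-create-image manual page (--broadcast, --vcpus); create_domus and DRY_RUN are hypothetical names, and the hostname/IP pairs in the usage line are the ones from the installation summaries in these notes:

```shell
# Read "hostname ip" pairs on stdin and run xen-create-image for each.
# With DRY_RUN set to a non-empty value the commands are only printed.
create_domus() {
    while read -r name ip; do
        [ -n "$name" ] || continue
        ${DRY_RUN:+echo} xen-create-image --hostname="$name" --ip="$ip" \
            --broadcast=132.206.178.255 --gateway=132.206.178.1 \
            --netmask=255.255.255.0 --vcpus=2 --dist=wheezy </dev/null
    done
}

# Preview:
# printf '%s\n' 'testdevil 132.206.178.80' 'testholyghost 132.206.178.81' \
#     'testgod 132.206.178.82' | DRY_RUN=1 create_domus
```

Standard input of xen-create-image is redirected from /dev/null so that it cannot swallow the remaining lines of the list.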

Start the DomU:

~># xm create /etc/xen/testholyghost.cfg

Installation summaries printed by xen-create-image for the three test DomUs:

Installation Summary
---------------------
Hostname        :  testdevil
Distribution    :  wheezy
IP-Address(es)  :  132.206.178.80 
RSA Fingerprint :  00:ec:5f:bb:d4:d3:fd:a1:51:51:29:5e:c9:48:be:26
Root Password   :  ********

Installation Summary
---------------------
Hostname        :  testgod
Distribution    :  wheezy
IP-Address(es)  :  132.206.178.82 
RSA Fingerprint :  9d:93:53:af:6b:4f:af:36:85:4b:7d:6d:cd:c7:bd:98
Root Password   :  ********

Installation Summary
---------------------
Hostname        :  testholyghost
Distribution    :  wheezy
IP-Address(es)  :  132.206.178.81 
RSA Fingerprint :  9a:03:93:b7:9f:21:b7:85:7f:23:50:1b:a9:d3:95:86
Root Password   :  ********

Harden the new DomUs:

apt-get install colordiff lsof strace fail2ban logcheck bind9-host dnsutils nscd rsync lsb-release
vi /etc/hosts.allow
vi /etc/hosts.deny
mkdir .ssh
chmod 700 .ssh
vi .ssh/authorized_keys
apt-get install rkhunter debsums
debsums_init
dpkg-reconfigure rkhunter
rkhunter --propupd
vi /etc/rkhunter.conf.local
rkhunter --propupd
vi /etc/fail2ban/jail.local
/etc/init.d/fail2ban restart
rkhunter --check -sk
apt-get install vim
apt-get upgrade

/etc/hosts.allow

sshd: 132.206.178.171 132.206.178.142

/etc/hosts.deny

ALL: ALL

~# cat /etc/rkhunter.conf.local
PKGMGR=DPKG
ALLOW_SSH_ROOT_USER=yes
SCRIPTWHITELIST=/usr/bin/unhide.rb

~# cat /etc/fail2ban/jail.local
[DEFAULT]

# "ignoreip" can be an IP address, a CIDR mask or a DNS host
ignoreip = 127.0.0.1 132.206.178.0/24 
bantime  = 600
findtime = 600
maxretry = 4

destemail = root@bic.mni.mcgill.ca

[ssh]

enabled = true
port    = ssh
filter  = sshd
logpath  = /var/log/auth.log
action   = iptables[name=SSH, port=ssh, protocol=tcp]
          mail-whois[name=SSH, dest=bicadmin@bic.mni.mcgill.ca]
          mail[dest=bicadmin@bic.mni.mcgill.ca]
maxretry = 4

DRBD I/O Stack

The packaged DRBD userland tools are not at the same version as the DRBD kernel module (8.4.2) included in the 3.8.13 kernel. Grab the source and build a deb package:

$ git clone git://git.drbd.org/drbd-8.4.git
$ cd drbd-8.4
$ git checkout drbd-8.4.2
$ dpkg-buildpackage -rfakeroot -b -uc

(missing packages: docbook-xml docbook-xsl dpatch xsltproc )

Install (as root now) the drbd-utils_8.4.2-0_amd64.deb package.

OK, the kernel module and the drbd tools are at the same revision, time to start creating drbd replicated block devices!
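A quick way to compare the two sides is to read the module version from /proc/drbd and the package version from dpkg. A sketch; drbd_kernel_version is a hypothetical helper, and the package name drbd-utils is taken from the .deb file name in these notes:

```shell
# Extract the DRBD kernel module version from /proc/drbd-style text.
drbd_kernel_version() {
    sed -n 's/^version: \([0-9][0-9.]*\).*/\1/p' "${1:-/proc/drbd}"
}

# Usage on a node with the drbd module loaded:
#   drbd_kernel_version
#   dpkg-query -W -f='${Version}\n' drbd-utils
```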

Two physical disks, partitioned as /dev/sd[ab]1, form a Linux md software RAID mirror, /dev/md2. On top of it sit an LVM physical volume and a volume group, xen_vg, which holds all the logical volumes xen_lvX (X=0,1…) needed to host the Xen guests. Sandwiched in between is a DRBD replication device in active/passive mode, so that all the data above it in the stack is replicated to the secondary node. The Pacemaker stuff will be added later.

The result is a set of nested LVM volumes, with the DRBD block device /dev/drbd0 used as a physical volume. The DRBD backing device /dev/xen_vg/xen_lv0 is an LVM logical volume. The Xen disk and swap partitions will live in the volume group vg0.

phys dev [sda1, sdb1] -> md raid1 [/dev/md2] -> VG xen_vg [/dev/xen_vg] -> LV xen_lv0 [/dev/xen_vg/xen_lv0] -> DRBD r0 [/dev/drbd0] -> VG vg0 [/dev/vg0] -> LV xen0-disk/swap [/dev/vg0/xen0-disk,xen0-swap]
                                                                        -> LV xen_lv1 [/dev/xen_vg/xen_lv1] -> DRBD r1 [/dev/drbd1] -> VG vg1 [/dev/vg1] -> LV xen1-disk/swap [/dev/vg1/xen1-disk,xen1-swap]
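The layers below DRBD could be built roughly as follows. This is a sketch, not the exact commands used on these hosts: the 100G size is read off the lvs output in these notes, the run wrapper and DRY_RUN variable are hypothetical, and every command here destroys existing data on the devices it touches.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# md mirror -> LVM physical volume -> volume group -> backing LVs for DRBD.
build_lower_stack() {
    run mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    run pvcreate /dev/md2
    run vgcreate xen_vg /dev/md2
    run lvcreate -L 100G -n xen_lv0 xen_vg
    run lvcreate -L 100G -n xen_lv1 xen_vg
}

# Preview only: DRY_RUN=1 build_lower_stack
```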

Network stack

The Xen 4.1 network scripts in /etc/xen/scripts are broken/unsupported. Build the network stack manually in /etc/network/interfaces:

# Xen network bridging now done using native tools
auto br0
iface br0 inet static
         address 132.206.178.100
         netmask 255.255.255.0
         network 132.206.178.0
         broadcast 132.206.178.255
         gateway 132.206.178.1
         bridge_ports eth0
         bridge_stp on
         bridge_maxwait 0

# eth1 (onboard) - drbd pt2pt replication link
auto eth1
iface eth1 inet static
    address 10.0.0.2
    netmask 255.0.0.0
    broadcast 10.0.0.255
    pointopoint 10.0.0.1

# eth2 (pci 32bit) - corosync ring
auto eth2
iface eth2 inet static
   address 192.168.1.4
   netmask 255.255.255.0
   broadcast 192.168.1.255
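A quick way to sanity-check the broadcast lines in stanzas like these is to derive the value from the address and netmask. A minimal sketch in POSIX shell arithmetic; broadcast_of is a hypothetical helper name:

```shell
# Compute the broadcast address for a dotted-quad address and netmask:
# keep the network bits and set every host bit.
broadcast_of() {
    old_ifs=$IFS
    IFS=.
    set -- $1 $2          # split both arguments into eight octets
    IFS=$old_ifs
    echo "$(( ($1 & $5) | (255 - $5) )).$(( ($2 & $6) | (255 - $6) )).$(( ($3 & $7) | (255 - $7) )).$(( ($4 & $8) | (255 - $8) ))"
}

# broadcast_of 132.206.178.100 255.255.255.0   # prints 132.206.178.255
# broadcast_of 192.168.1.4 255.255.255.0       # prints 192.168.1.255
```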

DRBD Configuration and Initialization

Config for resource r0.

  • The backing device for the drbd block device is an LVM logical volume, /dev/xen_vg/xen_lv0, that belongs to VG xen_vg.
  • I have lowered the values for degr-wfc-timeout and wfc-timeout to get rid of some warnings when drbd starts.

~# cat /etc/drbd.d/r0.res
resource r0 {

  device /dev/drbd0;
  disk /dev/xen_vg/xen_lv0;
  meta-disk internal;
  startup {
      degr-wfc-timeout 25;
      wfc-timeout 25;
  }
  net {
      cram-hmac-alg sha1;
      shared-secret "lucid";
      after-sb-0pri discard-zero-changes;
      after-sb-1pri discard-secondary;
      after-sb-2pri disconnect;
      rr-conflict disconnect;
  }
  disk {
      fencing resource-only;
      on-io-error detach;
  }
  handlers {
      fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      outdate-peer "/usr/lib/drbd/outdate-peer.sh";
      split-brain "/usr/lib/drbd/notify-split-brain.sh root";
      pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh root";
      pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh root";
      local-io-error "/usr/lib/drbd/notify-io-error.sh root";
      before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh";
      after-resync-target "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh";
  }
  syncer {
      rate 30M;
      csums-alg sha1;
      al-extents 907;
      verify-alg sha1;
  }
  on curtis {
      address 10.0.0.1:7788;
  }
  on walter {
      address 10.0.0.2:7788;
  }

}

Note the last two handlers, before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh"; and after-resync-target "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh";. See http://www.drbd.org/users-guide/s-lvm-snapshots.html for an explanation. This is only useful during initial synchronization: if a problem arises at this stage, it is possible to end up with the primary node, which holds the good data, dead and the secondary Inconsistent. The snapshot taken before the resync gives the secondary a consistent, if outdated, copy to fall back on.

After the resource configuration is done and exactly duplicated on both nodes, it’s time to initialize the resource. The next steps have to be done on both nodes. They are copied straight from http://www.drbd.org/users-guide/s-first-time-up.html.
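Whether the resource file really is identical on both nodes is easy to check before initializing anything. A sketch; same_config is a hypothetical helper, and the usage lines assume root ssh access from curtis to walter:

```shell
# Succeed only if two copies of a resource file are byte-identical.
same_config() {
    cmp -s "$1" "$2"
}

# Usage on curtis:
#   ssh walter cat /etc/drbd.d/r0.res > /tmp/r0.res.walter
#   same_config /etc/drbd.d/r0.res /tmp/r0.res.walter && echo identical
```

Running drbdadm dump on each node is another way to confirm that the file at least parses.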

  • Create device metadata. This step must be completed only on initial device creation. It initializes DRBD’s metadata:

drbdadm create-md r1

md_offset 107374178304
al_offset 107374145536
bm_offset 107370868736

Found some data

 ==> This might destroy existing data! <==

Do you want to proceed?
[need to type 'yes' to confirm] yes

Writing meta data...
initializing activity log
NOT initializing bitmap
New drbd meta data block successfully created.

  • Enable the resource. This step associates the resource with its backing device (or devices, in case of a multi-volume resource), sets replication parameters, and connects the resource to its peer:

drbdadm up r1

  • Observe /proc/drbd. DRBD’s virtual status file in the /proc filesystem, /proc/drbd, should now contain information similar to the following:

walter:~# cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
srcversion: D4E87CE96AA95060B684559 

 1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:104854364

The Inconsistent/Inconsistent disk state is expected at this point.
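For scripting, the connection state, roles and disk states can be pulled out of /proc/drbd. A minimal sketch that assumes the 8.4-style line format shown above; drbd_state is a hypothetical helper taking the minor number and, optionally, a file to read:

```shell
# Print "cs ro ds" for one DRBD minor from /proc/drbd-style text.
drbd_state() {
    awk -v minor="$1" '
        $1 == minor ":" {
            out = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^(cs|ro|ds):/) {
                    sub(/^[a-z]+:/, "", $i)          # drop the "cs:" style prefix
                    out = out (out == "" ? "" : " ") $i
                }
            }
            print out
        }' "${2:-/proc/drbd}"
}

# Usage: drbd_state 1
```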

  • Initial device synchronization

Select an initial sync source. If you are dealing with newly-initialized, empty disk, this choice is entirely arbitrary. If one of your nodes already has valuable data that you need to preserve, however, it is of crucial importance that you select that node as your synchronization source. If you do initial device synchronization in the wrong direction, you will lose that data. Exercise caution.

Start the initial full synchronization. This step must be performed on only one node, only on initial resource configuration, and only on the node you selected as the synchronization source. To perform this step, issue this command:

drbdadm primary --force r1

On the primary:

curtis:~# cat /proc/drbd 
version: 8.4.2 (api:1/proto:86-101)
srcversion: D4E87CE96AA95060B684559 

 1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:71680 nr:0 dw:0 dr:3530420 al:0 bm:214 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:101324636
        [>....................] sync'ed:  3.4% (98948/102396)M
        finish: 0:41:03 speed: 41,128 (41,524) K/sec
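The progress line lends itself to a simple wait loop. A sketch, assuming the sync'ed: field appears as in the output above; sync_percent and wait_for_sync are hypothetical helper names and the 60-second interval is arbitrary:

```shell
# Extract the percentage from the "sync'ed:" progress line.
sync_percent() {
    sed -n "s/.*sync'ed: *\([0-9.]*\)%.*/\1/p" "${1:-/proc/drbd}"
}

# Report progress until no resource is in a Sync* connection state.
wait_for_sync() {
    while grep -q 'cs:Sync' /proc/drbd; do
        echo "synced: $(sync_percent)%"
        sleep 60
    done
}

# Usage on either node: wait_for_sync
```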

The LVM snapshot handlers can be seen in action during the initial synchronization (note xen_lv1-before-resync):

walter:~# lvs
  LV                    VG     Attr     LSize    Pool Origin  Data%  Move Log Copy%  Convert
  _exp                  bic_pv -wi-ao--   29.56g                                            
  _root                 bic_pv -wi-ao--   23.44g                                            
  _tmp                  bic_pv -wi-ao--    7.81g                                            
  _vartmp               bic_pv -wi-ao-- 1000.00m                                            
  xen_lv0               xen_vg -wi-a---  100.00g                                            
  xen_lv1               xen_vg owi-aos-  100.00g                                            
  xen_lv1-before-resync xen_vg swi-a-s-  110.01g      xen_lv1   0.03                        
  xen_lv2               xen_vg -wi-a---  100.00g                                            

On the primary, create a physical volume on the drbd device /dev/drbd1 and then a new volume group, vg1:

curtis:~# pvcreate /dev/drbd1
  Writing physical volume data to disk "/dev/drbd1"
  Physical volume "/dev/drbd1" successfully created
curtis:~# vgcreate vg1 /dev/drbd1    
  Volume group "vg1" successfully created
curtis:~# pvs
  PV         VG     Fmt  Attr PSize   PFree  
  /dev/drbd1 vg1    lvm2 a--   99.99g  99.99g
  /dev/md1   bic_pv lvm2 a--  692.55g   6.80g
  /dev/md2   xen_vg lvm2 a--  931.38g 631.38g
curtis:~# vgs
  VG     #PV #LV #SN Attr   VSize   VFree  
  bic_pv   1   4   0 wz--n- 692.55g   6.80g
  vg1      1   0   0 wz--n-  99.99g  99.99g
  xen_vg   1   3   0 wz--n- 931.38g 631.38g
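The next layer in the stack would be the guest's logical volumes inside vg1. A sketch only: the names xen1-disk and xen1-swap come from the stack diagram in these notes, while the 2G swap size (matching the xen-tools.conf default) and giving the disk all remaining space are assumptions; run and DRY_RUN are hypothetical.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Carve the guest's swap and disk volumes out of the DRBD-backed VG.
create_guest_lvs() {
    run lvcreate -L 2G -n xen1-swap vg1
    run lvcreate -l 100%FREE -n xen1-disk vg1
}

# Preview only: DRY_RUN=1 create_guest_lvs
```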