Notes on a new Xen Cluster.
Disclaimer!! Please note: this information is provided on an as-is basis, without warranty of any kind, to the extent permitted by applicable law. Use at your own discretion.
Remove cfengine from the picture while we configure the system:
edit /etc/default/cfengine
mv /usr/sbin/cfagent{,_orig}
Install Xen-related packages:
apt-get install xen-linux-system

ii  xen-hypervisor-4.1-amd64        4.1.4-3+deb7u1  amd64  Xen Hypervisor on AMD64
ii  xen-linux-system-3.2.0-4-amd64  3.2.51-1        amd64  Xen system with Linux 3.2 on 64-bit PCs (meta-package)
ii  xen-linux-system-amd64          3.2+46          amd64  Xen system with Linux for 64-bit PCs (meta-package)
ii  xen-system-amd64                4.1.4-3+deb7u1  amd64  Xen System on AMD64 (meta-package)
ii  xen-tools                       4.3.1-1         all    Tools to manage Xen virtual servers
ii  xen-utils-4.1                   4.1.4-3+deb7u1  amd64  XEN administrative tools
ii  xen-utils-common                4.1.4-3+deb7u1  all    Xen administrative tools - common files
ii  xenstore-utils                  4.1.4-3+deb7u1  amd64  Xenstore utilities for Xen
Update grub to boot Xen.
The serial console is still not functional: IPMI SOL (serial over LAN) doesn't seem to work.
/etc/default/grub
GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=2048M dom0_max_vcpus=1 dom0_vcpus_pin"
GRUB_SERIAL_COMMAND="serial --unit=0 --speed=9600 --word=8 --parity=no --stop=1"
GRUB_TERMINAL="console serial"
GRUB_TIMEOUT=5
#GRUB_CMDLINE_XEN="com1=9600,8n1 console=com1,vga"
#GRUB_CMDLINE_LINUX="console=tty0 console=hvc0"
GRUB_CMDLINE_XEN="loglvl=all guest_loglvl=all com1=115200,8n1,0x3e8,5 console=com1,vga"
GRUB_CMDLINE_LINUX="console=hvc0 earlyprintk=xen"
Move up the Xen grub entry:
dpkg-divert --divert /etc/grub.d/08_linux_xen --rename /etc/grub.d/20_linux_xen
To undo: dpkg-divert --rename --remove /etc/grub.d/20_linux_xen
Update grub with update-grub.
Configure Xen.
cp /etc/xen/xend-config.sxp{,_orig}
vi /etc/xen/xend-config.sxp

(xend-http-server yes)
(xend-unix-server yes)
(xend-relocation-server yes)
(xend-port 8000)
(xend-relocation-port 8002)
(xend-address localhost)
(vif-script vif-bridge)
(dom0-min-mem 2048)
(enable-dom0-ballooning no)
(total_available_memory 0)
(dom0-cpus 0)
(vncpasswd '')
Configure Xendomains
~malin/bin/crush /etc/default/xendomains
XENDOMAINS_SAVE=/var/lib/xen/save
XENDOMAINS_RESTORE=true
XENDOMAINS_AUTO=/etc/xen/auto
XENDOMAINS_STOP_MAXWAIT=300
XENDOMAINS_RESTORE=false
XENDOMAINS_SAVE=""
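The listing above assigns XENDOMAINS_RESTORE and XENDOMAINS_SAVE twice. Assuming the init script reads /etc/default/xendomains as an ordinary shell fragment (worth checking on your own version), the last assignment is the one that counts. Here is a small sketch to check the effective value of a variable. effective_value is a made-up helper, not part of Xen, and it sources the file, so only point it at files you trust.

```shell
#!/bin/sh
# Hypothetical helper: print the value a variable has after the whole
# file has been sourced. The ( ... ) body runs in a subshell, so nothing
# leaks into the calling shell.
effective_value() (
    . "$1"
    eval "printf '%s\n' \"\${$2}\""
)

# Sample copy of the settings shown in these notes.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
XENDOMAINS_SAVE=/var/lib/xen/save
XENDOMAINS_RESTORE=true
XENDOMAINS_AUTO=/etc/xen/auto
XENDOMAINS_STOP_MAXWAIT=300
XENDOMAINS_RESTORE=false
XENDOMAINS_SAVE=""
EOF

effective_value "$tmp" XENDOMAINS_RESTORE   # prints: false
effective_value "$tmp" XENDOMAINS_SAVE      # prints an empty line
rm -f "$tmp"
```

On the real host the call would be effective_value /etc/default/xendomains XENDOMAINS_RESTORE.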
From: https://gist.github.com/bivald/5690227
Configuring external (and internal) networking, Debian Wheezy and Xen
External network
Dom0
Rename network card to peth0
(http://lists.xen.org/archives/html/xen-users/2012-02/msg00535.html). Edit /etc/udev/rules.d/70-persistent-net.rules to set the name of the physical interface to peth0.
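A minimal sketch of that edit, assuming the rules file has the usual Wheezy layout with a NAME="eth0" key on the line for the physical card. The helper name rename_iface_rule and the MAC address in the sample are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: rewrite NAME="<old>" to NAME="<new>" in udev
# rules read from stdin, writing the result to stdout.
rename_iface_rule() {
    sed -e "s/NAME=\"$1\"/NAME=\"$2\"/"
}

# Sample rule line (placeholder MAC address, not from the real host).
rename_iface_rule eth0 peth0 <<'EOF'
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:11:22:33:44:55", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
EOF
```

On the real machine, feed it /etc/udev/rules.d/70-persistent-net.rules, write the output to a scratch file, review it, and only then copy it over the original. The new name should take effect at the next boot.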
Edit /etc/network/interfaces:
auto eth0
iface eth0 inet static
    bridge_ports peth0
    address XXX.XXX.XXX.XXX
    netmask 255.255.255.0
    gateway XXX.XXX.XXX.XXX
Use the following when configuring your DomU’s /etc/xen/domain.cfg. Note: Vif MAC address must start with 00:16:3e
vif = [ 'bridge=eth0,mac=00:16:3e:xx:xx:xx' ] # (vif mac setup works only with the beginning of 00:16:3e!, change xx:xx:xx to valid mac-address characters, 0-9 and a-f)
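To fill in the xx:xx:xx part, something like the following can generate a random address under the 00:16:3e prefix. This is only a sketch: random_xen_mac is a made-up name, it assumes od and /dev/urandom are available, and it does not check for collisions with MAC addresses already in use on the bridge.

```shell
#!/bin/sh
# Hypothetical helper: print a random MAC address with the 00:16:3e prefix.
random_xen_mac() {
    # Read three random bytes as hex, then turn the blanks between them
    # into colons.
    suffix=$(od -An -N3 -tx1 /dev/urandom | tr -d '\n' |
        sed -e 's/^ *//' -e 's/ *$//' -e 's/  */:/g')
    printf '00:16:3e:%s\n' "$suffix"
}

# Print a ready-to-paste vif line for the eth0 bridge.
printf "vif = [ 'bridge=eth0,mac=%s' ]\n" "$(random_xen_mac)"
```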
DomU
Edit /etc/network/interfaces
auto eth0
iface eth0 inet static
    address XXX.XXX.XXX.XXX
    gateway XXX.XXX.XXX.XXX
    netmask 255.255.255.0
Internal network (DomU ↔ DomU, no external connectivity)
Dom0
Edit /etc/network/interfaces
auto xenbr1
iface xenbr1 inet static
    address 10.0.0.1
    netmask 255.255.255.0
    bridge_stp off
    bridge_waitport 0
    bridge_fd 0
Use the following when configuring your DomU’s /etc/xen/domain.cfg. Note: Vif MAC address must start with 00:16:3e
vif = [ 'bridge=xenbr1,mac=00:16:3e:xx:xx:xx' ] # (vif mac setup works only with the beginning of 00:16:3e!)
DomU
Edit /etc/network/interfaces
auto eth0
iface eth0 inet static
    address 10.0.0.2
    netmask 255.255.255.0
Using external and internal networking
Dom0
Once you have both eth0 and xenbr1, configure your DomU's /etc/xen/domain.cfg to handle both:
vif = [ 'mac=00:16:3e:xx:xx:xx,bridge=xenbr1','mac=00:16:3e:xx:xx:xx,bridge=eth0']
DomU
The DomU's internal network will now be on eth0 and the external one on eth1. Edit /etc/network/interfaces:
auto eth0
iface eth0 inet static
    address 10.0.0.X
    netmask 255.255.255.0

auto eth1
iface eth1 inet static
    address XXX.XXX.XXX.XXX
    gateway XXX.XXX.XXX.XXX
    netmask 255.255.255.0
Follow-up comments by bivald on the gist:
Also, comment out any network-script line in your xend config. Wheezy doesn't need them and they will be removed soon (they are buggy).
You can also try to add the bridge via a script instead; this actually works better for me:
brctl addbr xenbr1
brctl stp xenbr1 off
brctl setfd xenbr1 0
ip link set xenbr1 up
ifconfig xenbr1 up 10.0.0.1 netmask 255.255.255.0
Xen DomU installs on curtis:
I still use the xm tool stack.
:~# ~malin/bin/crush /etc/xen-tools/xen-tools.conf
lvm = xen_vg
install-method = debootstrap
debootstrap-cmd = /usr/sbin/debootstrap
size = 100Gb
memory = 2048Mb
swap = 2048Mb
fs = ext3
dist = `xt-guess-suite-and-mirror --suite`
image = sparse
kernel = /boot/vmlinuz-`uname -r`
initrd = /boot/initrd.img-`uname -r`
arch = amd64
mirror = `xt-guess-suite-and-mirror --mirror`
mirror_wheezy = http://cdn.debian.net/debian
ext3_options = noatime,nodiratime,errors=remount-ro
ext2_options = noatime,nodiratime,errors=remount-ro
xfs_options = defaults
reiserfs_options = defaults
btrfs_options = defaults
serial_device = hvc0
disk_device = xvda
output = /etc/xen
extension = .cfg
Create with:
~># xen-create-image --hostname=testholyghost --ip=132.206.178.81 \
    --broadcast=132.206.178.255 --gateway=132.206.178.1 --netmask=255.255.255.0 \
    --vcpus 2 --dist wheezy
Start the DomU:
~># xm create /etc/xen/testholyghost.cfg
Installation Summary
---------------------
Hostname        : testdevil
Distribution    : wheezy
IP-Address(es)  : 132.206.178.80
RSA Fingerprint : 00:ec:5f:bb:d4:d3:fd:a1:51:51:29:5e:c9:48:be:26
Root Password   : ********

Installation Summary
---------------------
Hostname        : testgod
Distribution    : wheezy
IP-Address(es)  : 132.206.178.82
RSA Fingerprint : 9d:93:53:af:6b:4f:af:36:85:4b:7d:6d:cd:c7:bd:98
Root Password   : ********

Installation Summary
---------------------
Hostname        : testholyghost
Distribution    : wheezy
IP-Address(es)  : 132.206.178.81
RSA Fingerprint : 9a:03:93:b7:9f:21:b7:85:7f:23:50:1b:a9:d3:95:86
Root Password   : ********
Harden the new DomUs:
apt-get install colordiff lsof strace fail2ban logcheck bind9-host dnsutils nscd rsync lsb-release
vi /etc/hosts.allow
vi /etc/hosts.deny
mkdir .ssh
chmod 700 .ssh
vi .ssh/authorized_keys
apt-get install rkhunter debsums
debsums_init
dpkg-reconfigure rkhunter
rkhunter --propupd
vi /etc/rkhunter.conf.local
rkhunter --propupd
vi /etc/fail2ban/jail.local
/etc/init.d/fail2ban restart
rkhunter --check -sk
apt-get install vim
apt-get upgrade
/etc/hosts.allow
sshd: 132.206.178.171 132.206.178.142
/etc/hosts.deny
ALL: ALL
~# cat /etc/rkhunter.conf.local
PKGMGR=DPKG
ALLOW_SSH_ROOT_USER=yes
SCRIPTWHITELIST=/usr/bin/unhide.rb

~# cat /etc/fail2ban/jail.local
[DEFAULT]
# "ignoreip" can be an IP address, a CIDR mask or a DNS host
ignoreip = 127.0.0.1 132.206.178.0/24
bantime = 600
findtime = 600
maxretry = 4
destemail = root@bic.mni.mcgill.ca

[ssh]
enabled = true
port = ssh
filter = sshd
logpath = /var/log/auth.log
action = iptables[name=SSH, port=ssh, protocol=tcp]
         mail-whois[name=SSH, dest=bicadmin@bic.mni.mcgill.ca]
         mail[dest=bicadmin@bic.mni.mcgill.ca]
maxretry = 4
DRBD I/O Stack
The packaged DRBD userland tools (utils) are not at the same version as the drbd kernel module included in the 3.8.13 kernel (8.4.2). Grab the source and build a deb package:
$ git clone git://git.drbd.org/drbd-8.4.git
$ cd drbd-8.4
$ git checkout drbd-8.4.2
$ dpkg-buildpackage -rfakeroot -b -uc
(Packages that were missing for the build: docbook-xml docbook-xsl dpatch xsltproc)
Install (as root now) the drbd-utils_8.4.2-0_amd64.deb package.
OK, the kernel module and the drbd tools are at the same revision, time to start creating drbd replicated block devices!
Two physical disks, partitioned as /dev/sd[ab]1, form a Linux md software raid mirror /dev/md2. Create a LVM physical volume on it and a volume group xen_vg. This volume group will hold all the logical volumes xen_lvX, X=0,1… needed to host the Xen guests. Sandwiched in between is a DRBD replication device in active/passive mode so that all the data above it in the stack is replicated to the secondary node. The Pacemaker stuff will be added later.
The result is a set of nested LVM volumes, with the DRBD block device /dev/drbd0 used as a physical volume. The DRBD backing device /dev/xen_vg/xen_lv0 is a LVM logical volume. The Xen disk and swap partition will live in the volume group vg0.
phys dev [sda1, sdb1]
  -> md raid1 [/dev/md2]
    -> VG xen_vg [/dev/xen_vg]
      -> LV xen_lv0 [/dev/xen_vg/xen_lv0]
        -> DRBD r0 [/dev/drbd0]
          -> VG vg0 [/dev/vg0]
            -> LV xen0-disk/swap [/dev/vg0/xen0-disk,xen0-swap]
      -> LV xen_lv1 [/dev/xen_vg/xen_lv1]
        -> DRBD r1 [/dev/drbd1]
          -> VG vg1 [/dev/vg1]
            -> LV xen1-disk/swap [/dev/vg1/xen1-disk,xen1-swap]
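The commands used to build the lower half of this stack (md mirror, xen_vg and the xen_lvX volumes) are not recorded in these notes. The sketch below is my reconstruction, not a transcript. It is a dry run that only prints what it would execute. The device names come from the diagram, the 100G size is taken from the lvs output later in these notes, and run and build_lower_stack are made-up helper names.

```shell
#!/bin/sh
# Dry-run sketch: print each command instead of executing it.
run() {
    echo "+ $*"
}

# Build the md mirror, the xen_vg volume group and one 100G logical
# volume per index given as argument (0 -> xen_lv0, 1 -> xen_lv1, ...).
build_lower_stack() {
    run mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    run pvcreate /dev/md2
    run vgcreate xen_vg /dev/md2
    for n in "$@"; do
        run lvcreate --size 100G --name "xen_lv${n}" xen_vg
    done
}

build_lower_stack 0 1
```

To actually run the commands, replace the body of run with "$@". Do that only after checking every device name against the real machine, since mdadm --create and pvcreate overwrite whatever is on the target devices.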
Network stack
The network scripts shipped with Xen 4.1 in /etc/xen/scripts are broken/unsupported. Build the network stack manually in /etc/network/interfaces:
# Xen network bridging now done using native tools
auto br0
iface br0 inet static
    address 132.206.178.100
    netmask 255.255.255.0
    network 132.206.178.0
    broadcast 132.206.178.255
    gateway 132.206.178.1
    bridge_ports eth0
    bridge_stp on
    bridge_maxwait 0

# eth1 (onboard) - drbd pt2pt replication link
auto eth1
iface eth1 inet static
    address 10.0.0.2
    netmask 255.0.0.0
    broadcast 10.0.0.255
    pointopoint 10.0.0.1

# eth2 (pci 32bit) - corosync ring
auto eth2
iface eth2 inet static
    address 192.168.1.4
    netmask 255.255.255.0
    broadcast 192.168.1.1
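The broadcast values in stanzas like these are easy to get wrong by hand. As a cross-check, the conventional broadcast address is the address with all host bits set, which can be computed from the address and netmask. A small sketch follows (broadcast_of is a made-up helper name). For the eth1 and eth2 entries above it gives different values than the ones written in the file, so those two lines are probably worth a second look.

```shell
#!/bin/sh
# Hypothetical helper: print the broadcast address for a dotted-quad
# IPv4 address ($1) and netmask ($2). The ( ... ) body runs in a
# subshell so the IFS change does not leak into the caller.
broadcast_of() (
    IFS=.
    # Split both arguments on dots: $1..$4 = address, $5..$8 = netmask.
    set -- $1 $2
    printf '%d.%d.%d.%d\n' \
        $(( ($1 & $5) | (255 - $5) )) \
        $(( ($2 & $6) | (255 - $6) )) \
        $(( ($3 & $7) | (255 - $7) )) \
        $(( ($4 & $8) | (255 - $8) ))
)

broadcast_of 132.206.178.100 255.255.255.0   # prints: 132.206.178.255
broadcast_of 10.0.0.2 255.0.0.0              # prints: 10.255.255.255
broadcast_of 192.168.1.4 255.255.255.0       # prints: 192.168.1.255
```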
DRBD Configuration and Initialization
Config for resource r0.
- The backend device for the drbd block device is a LVM logical volume /dev/xen_vg/xen_lv0 that belongs to VG xen_vg.
- I have lowered the values for degr-wfc-timeout and wfc-timeout to get rid of some warnings when drbd starts.
(:source:)
~# cat /etc/drbd.d/r0.res
resource r0 {
    device /dev/drbd0;
    disk /dev/xen_vg/xen_lv0;
    meta-disk internal;
    startup {
        degr-wfc-timeout 25;
        wfc-timeout 25;
    }
    net {
        cram-hmac-alg sha1;
        shared-secret "lucid";
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }
    disk {
        fencing resource-only;
        on-io-error detach;
    }
    handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        outdate-peer "/usr/lib/drbd/outdate-peer.sh";
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh root";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh root";
        local-io-error "/usr/lib/drbd/notify-io-error.sh root";
        before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh";
        after-resync-target "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh";
    }
    syncer {
        rate 30M;
        csums-alg sha1;
        al-extents 907;
        verify-alg sha1;
    }
    on curtis {
        address 10.0.0.1:7788;
    }
    on walter {
        address 10.0.0.2:7788;
    }
}
(:sourceend:)
Note the last 2 handlers, before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh"; and after-resync-target "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh";. See http://www.drbd.org/users-guide/s-lvm-snapshots.html for an explanation. This is only useful during initial synchronization: if a problem arises at this stage, it is possible to end up with the primary node (the one with the good data) dead and the secondary in the Inconsistent state.
After the resource configuration is done and exactly duplicated on both nodes it's time to initialize the resource. The next steps have to be done on both nodes. This is copied straight from http://www.drbd.org/users-guide/s-first-time-up.html.
- Create device metadata. This step must be completed only on initial device creation. It initializes DRBD’s metadata:
drbdadm create-md r1
md_offset 107374178304
al_offset 107374145536
bm_offset 107370868736

Found some data
 ==> This might destroy existing data! <==

Do you want to proceed?
[need to type 'yes' to confirm] yes

Writing meta data...
initializing activity log
NOT initializing bitmap
New drbd meta data block successfully created.
- Enable the resource. This step associates the resource with its backing device (or devices, in case of a multi-volume resource), sets replication parameters, and connects the resource to its peer:
drbdadm up r1
- Observe /proc/drbd. DRBD’s virtual status file in the /proc filesystem, /proc/drbd, should now contain information similar to the following:
walter:~# cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
srcversion: D4E87CE96AA95060B684559
 1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:104854364
The Inconsistent/Inconsistent disk state is expected at this point.
- Initial device synchronization
Select an initial sync source. If you are dealing with newly-initialized, empty disk, this choice is entirely arbitrary. If one of your nodes already has valuable data that you need to preserve, however, it is of crucial importance that you select that node as your synchronization source. If you do initial device synchronization in the wrong direction, you will lose that data. Exercise caution.
Start the initial full synchronization. This step must be performed on only one node, only on initial resource configuration, and only on the node you selected as the synchronization source. To perform this step, issue this command:
drbdadm primary --force r1
On the primary:
curtis:~# cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
srcversion: D4E87CE96AA95060B684559
 1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:71680 nr:0 dw:0 dr:3530420 al:0 bm:214 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:101324636
    [>....................] sync'ed: 3.4% (98948/102396)Mfinish: 0:41:03 speed: 41,128 (41,524) K/sec
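When checking these states repeatedly (on both nodes, before and after forcing the primary), it helps to pull just the cs/ro/ds fields out of /proc/drbd. A minimal sketch, assuming the 8.4 layout shown in the listings above. drbd_state is a made-up helper name, and the sample input is the walter output from these notes.

```shell
#!/bin/sh
# Hypothetical helper: print the connection state (cs), roles (ro) and
# disk states (ds) for one DRBD minor. Reads /proc/drbd-style text on
# stdin; $1 is the minor number.
drbd_state() {
    awk -v minor="$1" '
        $1 == (minor ":") {
            sep = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^(cs|ro|ds):/) {
                    printf "%s%s", sep, $i
                    sep = " "
                }
            }
            print ""
        }'
}

drbd_state 1 <<'EOF'
version: 8.4.2 (api:1/proto:86-101)
srcversion: D4E87CE96AA95060B684559
 1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:104854364
EOF
# prints: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent
```

On a live node the call would be drbd_state 1 < /proc/drbd.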
See the LVM snapshots handlers in action for the initial synchronization:
walter:~# lvs
  LV                    VG     Attr     LSize    Pool Origin  Data%  Move Log Copy%  Convert
  _exp                  bic_pv -wi-ao--   29.56g
  _root                 bic_pv -wi-ao--   23.44g
  _tmp                  bic_pv -wi-ao--    7.81g
  _vartmp               bic_pv -wi-ao-- 1000.00m
  xen_lv0               xen_vg -wi-a---  100.00g
  xen_lv1               xen_vg owi-aos-  100.00g
  xen_lv1-before-resync xen_vg swi-a-s-  110.01g      xen_lv1   0.03
  xen_lv2               xen_vg -wi-a---  100.00g
On the primary, create a physical volume on the drbd device /dev/drbd1 and then a new volume group vg1:
curtis:~# pvcreate /dev/drbd1
  Writing physical volume data to disk "/dev/drbd1"
  Physical volume "/dev/drbd1" successfully created
curtis:~# vgcreate vg1 /dev/drbd1
  Volume group "vg1" successfully created
curtis:~# pvs
  PV         VG     Fmt  Attr PSize   PFree
  /dev/drbd1 vg1    lvm2 a--   99.99g  99.99g
  /dev/md1   bic_pv lvm2 a--  692.55g   6.80g
  /dev/md2   xen_vg lvm2 a--  931.38g 631.38g
curtis:~# vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  bic_pv   1   4   0 wz--n- 692.55g   6.80g
  vg1      1   0   0 wz--n-  99.99g  99.99g
  xen_vg   1   3   0 wz--n- 931.38g 631.38g