High Availability Cluster with Pacemaker, HAProxy and Chronyd

Introduction

This guide documents the setup of a 4-node high availability cluster using:

  • Pacemaker/Corosync for cluster management
  • HAProxy for load balancing
  • Chronyd for time synchronization
  • Automated DNS updates during failover scenarios

Prerequisites

Hardware Requirements

  • 4 identical Debian servers (≥ 2GB RAM, ≥ 2 CPU cores recommended)
  • Network interfaces:
    • Primary: 1Gbps (for client traffic)
    • Secondary: 100Mbps minimum (for heartbeat)

Software Requirements

  • Debian 10/11 (tested on both)
  • Root access to all nodes
  • DNS server supporting dynamic updates

Network Requirements

  • Static IP assignments for all nodes
  • Fully qualified domain names for each node
  • Unrestricted communication on ports:
    • 2224 (pcsd)
    • 5404-5405 (corosync, UDP)
    • 80/443 (HAProxy)
    • 1936 (HAProxy statistics page)

Initial Node Setup

1. System Preparation

On all nodes (node1-node4):

sudo apt update && sudo apt upgrade -y

sudo apt install -y \
    pacemaker \
    pcs \
    corosync \
    crmsh \
    haproxy \
    chrony \
    bind9utils \
    dnsutils \
    mailutils \
    vim

2. Hosts File Configuration

Add to /etc/hosts on every node:

192.168.1.101 node1.cluster.local node1
192.168.1.102 node2.cluster.local node2
192.168.1.103 node3.cluster.local node3
192.168.1.104 node4.cluster.local node4
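
A quick way to confirm that every name resolves consistently on each node is a small loop such as the following (a minimal check; adjust the hostnames if yours differ):

for n in node1 node2 node3 node4; do
    getent hosts ${n}.cluster.local
done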

3. Firewall Configuration

sudo apt install -y ufw
sudo ufw allow from 192.168.1.0/24
sudo ufw enable
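
The rule above trusts the whole 192.168.1.0/24 subnet. If you prefer to open only the ports listed under Network Requirements, per-port rules along these lines can be used instead (a sketch; adjust the source network to your environment):

sudo ufw allow from 192.168.1.0/24 to any port 2224 proto tcp
sudo ufw allow from 192.168.1.0/24 to any port 5404:5405 proto udp
sudo ufw allow from 192.168.1.0/24 to any port 1936 proto tcp
sudo ufw allow 80,443/tcp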

Cluster Configuration

1. Initialize Cluster Services

On all nodes, enable and start the pcs daemon. Corosync and Pacemaker are started and enabled by pcs later (steps 4-5), so they do not need to be enabled here:

sudo systemctl enable --now pcsd

2. Set hacluster Password

On all nodes, give the hacluster user the same password (replace the example with a strong password of your own):

echo "hacluster:SecureP@ssw0rd123" | sudo chpasswd

3. Authenticate Nodes (from node1)

Debian 10 and 11 ship pcs 0.10, which uses "pcs host auth" (the older "pcs cluster auth ... --force" syntax belongs to pcs 0.9):

sudo pcs host auth \
    node1.cluster.local \
    node2.cluster.local \
    node3.cluster.local \
    node4.cluster.local \
    -u hacluster \
    -p SecureP@ssw0rd123

4. Create Cluster

With pcs 0.10 the cluster name is given as the first argument (there is no --name option):

sudo pcs cluster setup haproxy_cluster \
    node1.cluster.local \
    node2.cluster.local \
    node3.cluster.local \
    node4.cluster.local \
    --force

5. Start Cluster

sudo pcs cluster start --all
sudo pcs cluster enable --all

6. Verify Status

sudo pcs status

Expected output:

Cluster name: haproxy_cluster
Stack: corosync
Current DC: node1.cluster.local (version x.x.x-x) 
4 nodes configured
0 resources configured

HAProxy Setup

1. Install HAProxy on All Nodes

HAProxy was already installed during system preparation; re-running the install simply confirms it is present on every node:

sudo apt install -y haproxy

2. Configuration Template (/etc/haproxy/haproxy.cfg)

global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    maxconn 4000
    tune.ssl.default-dh-param 2048

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5s
    timeout client 50s
    timeout server 50s
    option forwardfor
    option http-server-close

frontend http_front
    bind *:80
    bind *:443 ssl crt /etc/ssl/private/example.com.pem
    http-request set-header X-Forwarded-Port %[dst_port]
    http-request add-header X-Forwarded-Proto https if { ssl_fc }
    default_backend http_back

backend http_back
    balance roundrobin
    cookie SERVERID insert indirect nocache
    server webserver1 192.168.1.201:80 check cookie s1
    server webserver2 192.168.1.202:80 check cookie s2

listen stats
    bind *:1936
    stats enable
    stats uri /
    stats hide-version
    stats auth admin:SecureStatsP@ss
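
Because Pacemaker will manage the HAProxy service, it should not also be started by systemd at boot. On each node, validate the configuration and then hand control over to the cluster (a suggested sequence):

sudo haproxy -c -f /etc/haproxy/haproxy.cfg
sudo systemctl disable --now haproxy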

3. Add as Cluster Resource

sudo pcs resource create haproxy systemd:haproxy \
    op monitor interval=10s timeout=20s \
    op start interval=0s timeout=30s \
    op stop interval=0s timeout=30s \
    --group haproxy_group

Time Synchronization

1. Configure Chronyd (/etc/chrony/chrony.conf)

pool 0.debian.pool.ntp.org iburst
pool 1.debian.pool.ntp.org iburst
pool 2.debian.pool.ntp.org iburst
pool 3.debian.pool.ntp.org iburst

allow 192.168.1.0/24
leapsectz right/UTC
makestep 1.0 3

2. Verify Time Sync

sudo systemctl restart chronyd
sudo chronyc tracking
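
To also see which upstream NTP servers are in use and whether they are reachable:

sudo chronyc sources -v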

Failover Automation

1. DNS Update Script (/usr/local/bin/update_dns.sh)

#!/bin/bash

CLUSTER_NAME="haproxy_cluster"
DNS_RECORD="haproxy.example.com"
DNS_SERVER="ns1.example.com"
DNS_KEY="/etc/bind/cluster.key"
TTL=300

# List your cluster node IPs here for reference (optional, not used in script logic)
NODE_IPS=("192.168.1.101" "192.168.1.102" "192.168.1.103" "192.168.1.104")

# Detect the currently active node. The Current DC is used here as the active
# node; if HAProxy may run on a different node, the output of
# "crm_resource --resource haproxy --locate" could be parsed instead.
# crm_mon is readable by members of the haclient group, so no sudo is needed.
ACTIVE_NODE=$(crm_mon -1 | awk '/Current DC:/ {for (i = 1; i <= NF; i++) if ($i == "DC:") {print $(i + 1); exit}}')
ACTIVE_NODE_IP=$(getent hosts "$ACTIVE_NODE" | awk '{ print $1 }')

logger -t haproxy-cluster "Failover detected. Active node: $ACTIVE_NODE ($ACTIVE_NODE_IP)"

nsupdate -k $DNS_KEY <<EOF
server $DNS_SERVER
update delete $DNS_RECORD A
update add $DNS_RECORD $TTL A $ACTIVE_NODE_IP
send
EOF

# Confirm the record now resolves to the active node
dig +short "$DNS_RECORD" "@$DNS_SERVER"

2. Make Script Executable

sudo chmod 750 /usr/local/bin/update_dns.sh
sudo chown hacluster:haclient /usr/local/bin/update_dns.sh
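
Before wiring the script into Pacemaker, it can be run once by hand to confirm the update path works end to end (this assumes the hacluster user can read the TSIG key file):

sudo -u hacluster /usr/local/bin/update_dns.sh
sudo journalctl -t haproxy-cluster -n 5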

3. Configure Pacemaker Resource

ClusterMon runs crm_mon in the background and, through the -E (external agent) option, executes the script whenever cluster events occur:

sudo pcs resource create dns_update ocf:pacemaker:ClusterMon \
    user=hacluster \
    extra_options="-E /usr/local/bin/update_dns.sh" \
    op monitor interval=15s timeout=30s \
    meta target-role=Started

sudo pcs constraint colocation add dns_update with haproxy_group INFINITY
sudo pcs constraint order haproxy_group then dns_update
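
Newer Pacemaker releases also provide alert agents as a native way to run a script on cluster events; the same update script can be hooked in as an alert instead of (or alongside) ClusterMon. A minimal sketch, using the path above and an alert id chosen for this example:

sudo pcs alert create path=/usr/local/bin/update_dns.sh id=dns_update_alert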

Testing Procedures

1. Manual Failover Test

sudo pcs resource move haproxy_group node2.cluster.local
sudo pcs status | grep -A5 "haproxy_group"
sudo pcs resource clear haproxy_group

2. Simulate Node Failure

On the node currently hosting haproxy_group, stop the cluster stack:

sudo systemctl stop corosync

On one of the surviving nodes, watch the resources migrate:

sudo pcs status
watch -n 1 sudo pcs status
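
Once the resources have settled on a surviving node, rejoin the failed node by starting its cluster services again:

sudo pcs cluster start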

3. Verify DNS Updates

dig +short haproxy.example.com
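
Querying the authoritative server directly avoids being misled by a cached answer from a local resolver:

dig +short haproxy.example.com @ns1.example.com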

Maintenance Operations

1. Node Maintenance

sudo pcs node standby node3.cluster.local
sudo pcs node unstandby node3.cluster.local

2. Cluster Maintenance

sudo pcs node standby --all
sudo pcs node unstandby --all

3. Adding/Removing Nodes
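
Before a node can be added it must have the packages installed, the hacluster password set, and be authenticated from an existing member, for example:

sudo pcs host auth node5.cluster.local -u hacluster -p SecureP@ssw0rd123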

sudo pcs cluster node add node5.cluster.local
sudo pcs cluster node remove node4.cluster.local

Troubleshooting

Common Issues

Corosync fails to start

sudo corosync-cfgtool -s
sudo corosync-cmapctl | grep members

Split-brain scenario

sudo pcs property set stonith-enabled=true
sudo pcs stonith create fence_ipmi fence_ipmilan ...

Resource failures

sudo pcs resource debug-start haproxy
sudo pcs resource failcount show
sudo pcs resource cleanup haproxy

Log Locations

  • Corosync: journalctl -u corosync (file logging to /var/log/corosync/corosync.log only if configured)
  • Pacemaker: journalctl -u pacemaker
  • HAProxy: /var/log/haproxy.log (via rsyslog, see the note below)
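
HAProxy itself logs through syslog (facility local0 in the configuration above). If /var/log/haproxy.log does not appear, a minimal rsyslog rule such as the following (filename assumed) can be placed in /etc/rsyslog.d/49-haproxy.conf, followed by a restart of rsyslog:

local0.*    /var/log/haproxy.log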

Appendix

Sample DNS Key File (/etc/bind/cluster.key)

key "cluster-key" {
    algorithm hmac-sha256;
    secret "Base64EncodedKeyHere==";
};
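
A key of this form can be generated with tsig-keygen (assumed to be available from the BIND utilities installed earlier; ddns-confgen produces an equivalent key if it is not):

tsig-keygen -a hmac-sha256 cluster-key | sudo tee /etc/bind/cluster.key

The DNS server must also be configured to accept dynamic updates signed with this key, e.g. allow-update { key "cluster-key"; }; in the zone definition.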

Useful Commands

sudo pcs config show
sudo pcs stonith show
sudo pcs resource config haproxy

This guide has covered step-by-step setup instructions, configuration examples, automated failover, maintenance and troubleshooting procedures, and a reference appendix.