High Availability Cluster with Pacemaker, Chronyd, and HAProxy

A Comprehensive Guide for Debian Systems


Introduction

This guide documents the setup of a 4-node high availability cluster using:

  • Pacemaker/Corosync for cluster management
  • HAProxy for load balancing
  • Chronyd for time synchronization
  • Automated DNS updates during failover scenarios

Prerequisites

Hardware Requirements

  • 4 identical Debian servers (≥ 2GB RAM, ≥ 2 CPU cores recommended)
  • Network interfaces:
    • Primary: 1Gbps (for client traffic)
    • Secondary: 100Mbps minimum (for heartbeat)

Software Requirements

  • Debian 10/11 (tested on both)
  • Root access to all nodes
  • DNS server supporting dynamic updates

Network Requirements

  • Static IP assignments for all nodes
  • Fully qualified domain names for each node
  • Unrestricted communication on ports:
    • 2224 (pcsd)
    • 5404-5405 (corosync)
    • 80/443 (HAProxy)

Initial Node Setup

1. System Preparation

On all nodes (node1-node4):

<syntaxhighlight lang="bash">

  1. Update system

sudo apt update && sudo apt upgrade -y

  1. Install base packages

sudo apt install -y \

   pacemaker \
   pcs \
   corosync \
   crmsh \
   haproxy \
   chrony \
   bind9utils \
   mailutils \
   vim

</syntaxhighlight>

2. Hosts File Configuration

Add to /etc/hosts on every node:

192.168.1.101 node1.cluster.local node1
192.168.1.102 node2.cluster.local node2
192.168.1.103 node3.cluster.local node3
192.168.1.104 node4.cluster.local node4
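
A quick sanity check that every short name resolves and answers from the node you are on (a minimal sketch; adjust the names if yours differ):

<syntaxhighlight lang="bash">
# Confirm each cluster node resolves and responds to ping
for n in node1 node2 node3 node4; do
    ping -c1 -W2 "$n" >/dev/null && echo "$n reachable" || echo "$n FAILED"
done
</syntaxhighlight>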

3. Firewall Configuration

<syntaxhighlight lang="bash"> sudo apt install -y ufw sudo ufw allow from 192.168.1.0/24 sudo ufw enable </syntaxhighlight>

Cluster Configuration

1. Initialize Cluster Services

<syntaxhighlight lang="bash"> sudo systemctl enable --now pcsd corosync pacemaker </syntaxhighlight>

2. Set hacluster Password

<syntaxhighlight lang="bash"> echo "hacluster:SecureP@ssw0rd123" | sudo chpasswd </syntaxhighlight>

3. Authenticate Nodes (from node1)

<syntaxhighlight lang="bash"> sudo pcs cluster auth \

   node1.cluster.local \
   node2.cluster.local \
   node3.cluster.local \
   node4.cluster.local \
   -u hacluster \
   -p SecureP@ssw0rd123 \
   --force

</syntaxhighlight>

4. Create Cluster

<syntaxhighlight lang="bash"> sudo pcs cluster setup \

   --name haproxy_cluster \
   node1.cluster.local \
   node2.cluster.local \
   node3.cluster.local \
   node4.cluster.local \
   --force

</syntaxhighlight>

5. Start Cluster

<syntaxhighlight lang="bash"> sudo pcs cluster start --all sudo pcs cluster enable --all </syntaxhighlight>

6. Verify Status

<syntaxhighlight lang="bash"> sudo pcs status </syntaxhighlight> Expected output:

Cluster name: haproxy_cluster
Stack: corosync
Current DC: node1.cluster.local (version x.x.x-x) 
4 nodes configured
0 resources configured
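
With four nodes (an even number), it is worth confirming that the cluster formed with quorum; corosync-quorumtool summarizes membership and quorum state:

<syntaxhighlight lang="bash">
sudo corosync-quorumtool -s
</syntaxhighlight>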

HAProxy Setup

1. Install HAProxy on All Nodes

<syntaxhighlight lang="bash"> sudo apt install -y haproxy </syntaxhighlight>

2. Configuration Template (/etc/haproxy/haproxy.cfg)

<syntaxhighlight lang="conf"> global

   log /dev/log local0
   log /dev/log local1 notice
   chroot /var/lib/haproxy
   stats socket /run/haproxy/admin.sock mode 660 level admin
   stats timeout 30s
   user haproxy
   group haproxy
   daemon
   maxconn 4000
   tune.ssl.default-dh-param 2048

defaults

   log global
   mode http
   option httplog
   option dontlognull
   timeout connect 5s
   timeout client 50s
   timeout server 50s
   option forwardfor
   option http-server-close

frontend http_front

   bind *:80
   bind *:443 ssl crt /etc/ssl/private/example.com.pem
   http-request set-header X-Forwarded-Port %[dst_port]
   http-request add-header X-Forwarded-Proto https if { ssl_fc }
   default_backend http_back

backend http_back

   balance roundrobin
   cookie SERVERID insert indirect nocache
   server webserver1 192.168.1.201:80 check cookie s1
   server webserver2 192.168.1.202:80 check cookie s2

listen stats

   bind *:1936
   stats enable
   stats uri /
   stats hide-version
   stats auth admin:SecureStatsP@ss

</syntaxhighlight>
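
Before handing the service over to Pacemaker, validate the configuration syntax on each node:

<syntaxhighlight lang="bash">
sudo haproxy -c -f /etc/haproxy/haproxy.cfg
</syntaxhighlight>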

3. Add as Cluster Resource

<syntaxhighlight lang="bash"> sudo pcs resource create haproxy systemd:haproxy \

   op monitor interval=10s timeout=20s \
   op start interval=0s timeout=30s \
   op stop interval=0s timeout=30s \
   --group haproxy_group

</syntaxhighlight>
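
The DNS failover script later in this guide points haproxy.example.com at a floating VIP (192.168.1.100), but the guide never creates that address as a cluster resource. A minimal sketch using the standard ocf:heartbeat:IPaddr2 agent (the resource name cluster_vip is an arbitrary choice):

<syntaxhighlight lang="bash">
# Floating VIP that fails over together with haproxy_group
sudo pcs resource create cluster_vip ocf:heartbeat:IPaddr2 \
    ip=192.168.1.100 cidr_netmask=24 \
    op monitor interval=10s \
    --group haproxy_group
</syntaxhighlight>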

Time Synchronization

1. Configure Chronyd (/etc/chrony/chrony.conf)

<syntaxhighlight lang="conf"> pool 0.debian.pool.ntp.org iburst pool 1.debian.pool.ntp.org iburst pool 2.debian.pool.ntp.org iburst pool 3.debian.pool.ntp.org iburst

allow 192.168.1.0/24 leapsectz right/UTC makestep 1.0 3 </syntaxhighlight>

2. Verify Time Sync

<syntaxhighlight lang="bash"> sudo systemctl restart chronyd sudo chronyc tracking </syntaxhighlight>

Failover Automation

1. DNS Update Script (/usr/local/bin/update_dns.sh)

<syntaxhighlight lang="bash">

  1. !/bin/bash

CLUSTER_NAME="haproxy_cluster" VIP="192.168.1.100" DNS_RECORD="haproxy.example.com" DNS_SERVER="ns1.example.com" DNS_KEY="/etc/bind/cluster.key" TTL=300

ACTIVE_NODE=$(sudo pcs status | grep "Current DC:" | awk '{print $4}' logger -t haproxy-cluster "Failover detected. Active node: $ACTIVE_NODE"

nsupdate -k $DNS_KEY <<EOF server $DNS_SERVER update delete $DNS_RECORD A update add $DNS_RECORD $TTL A $VIP send EOF

dig +short $DNS_RECORD @$DNS_SERVER </syntaxhighlight>

2. Make Script Executable

<syntaxhighlight lang="bash"> sudo chmod 750 /usr/local/bin/update_dns.sh sudo chown hacluster:haclient /usr/local/bin/update_dns.sh </syntaxhighlight>

3. Configure Pacemaker Resource

<syntaxhighlight lang="bash"> sudo pcs resource create dns_update ocf:pacemaker:ClusterMon \

   user=hacluster \
   extra_options="-e /usr/local/bin/update_dns.sh" \
   op monitor interval=15s timeout=30s \
   meta target-role=Started

sudo pcs constraint colocation add dns_update with haproxy_group INFINITY sudo pcs constraint order haproxy_group then dns_update </syntaxhighlight>
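
Verify that both the colocation and the ordering rule were recorded:

<syntaxhighlight lang="bash">
sudo pcs constraint
</syntaxhighlight>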

Testing Procedures

1. Manual Failover Test

<syntaxhighlight lang="bash"> sudo pcs resource move haproxy_group node2.cluster.local sudo pcs status | grep -A5 "haproxy_group" sudo pcs resource clear haproxy_group </syntaxhighlight>

2. Simulate Node Failure

<syntaxhighlight lang="bash"> sudo systemctl stop corosync sudo pcs status watch -n 1 sudo pcs status </syntaxhighlight>

3. Verify DNS Updates

<syntaxhighlight lang="bash"> dig +short haproxy.example.com </syntaxhighlight>

Maintenance Operations

1. Node Maintenance

<syntaxhighlight lang="bash"> sudo pcs node standby node3.cluster.local sudo pcs node unstandby node3.cluster.local </syntaxhighlight>

2. Cluster Maintenance

<syntaxhighlight lang="bash"> sudo pcs cluster standby --all sudo pcs cluster unstandby --all </syntaxhighlight>

3. Adding/Removing Nodes

<syntaxhighlight lang="bash"> sudo pcs cluster node add node5.cluster.local sudo pcs cluster node remove node4.cluster.local </syntaxhighlight>

Troubleshooting

Common Issues

Corosync fails to start:

<syntaxhighlight lang="bash">
sudo corosync-cfgtool -s
sudo corosync-cmapctl | grep members
</syntaxhighlight>

Split-brain scenario (enable fencing):

<syntaxhighlight lang="bash">
sudo pcs property set stonith-enabled=true
sudo pcs stonith create fence_ipmi fence_ipmilan ...
</syntaxhighlight>

Resource failures:

<syntaxhighlight lang="bash">
sudo pcs resource debug-start haproxy
sudo pcs resource failcount show
sudo pcs resource cleanup haproxy
</syntaxhighlight>

Log Locations

  • Corosync: /var/log/corosync/corosync.log
  • Pacemaker: journalctl -u pacemaker
  • HAProxy: /var/log/haproxy.log

Appendix

Sample DNS Key File (/etc/bind/cluster.key)

<syntaxhighlight lang="conf"> key "cluster-key" {

   algorithm hmac-sha256;
   secret "Base64EncodedKeyHere==";

}; </syntaxhighlight>
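
A real key can be generated with tsig-keygen from the BIND tools; the key name cluster-key matches the file above:

<syntaxhighlight lang="bash">
sudo tsig-keygen -a hmac-sha256 cluster-key | sudo tee /etc/bind/cluster.key
sudo chmod 640 /etc/bind/cluster.key
</syntaxhighlight>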

Useful Commands

<syntaxhighlight lang="bash"> sudo pcs config show sudo pcs stonith show sudo pcs resource operations haproxy </syntaxhighlight>

This comprehensive guide includes: step-by-step instructions, configuration examples, automated failover procedures, maintenance and troubleshooting, and a reference appendix.