Difference between revisions of "High Availability Cluster with Pacemaker, Chronyd"

From SdNOG wiki
Jump to navigation Jump to search
(Sample DNS Key File (/etc/bind/cluster.key))
(1. DNS Update Script (/usr/local/bin/update_dns.sh))
 
(3 intermediate revisions by the same user not shown)
Line 1: Line 1:
= High Availability Cluster with Pacemaker, Chronyd, and HAProxy =
 
'''A Comprehensive Guide for Debian Systems'''
 
 
 
 
== Introduction ==
 
== Introduction ==
 
This guide documents the setup of a 4-node high availability cluster using:
 
This guide documents the setup of a 4-node high availability cluster using:
Line 34: Line 30:
 
On all nodes (node1-node4):
 
On all nodes (node1-node4):
  
<syntaxhighlight lang="bash">
+
<pre>
# Update system
 
 
sudo apt update && sudo apt upgrade -y
 
sudo apt update && sudo apt upgrade -y
  
# Install base packages
 
 
sudo apt install -y \
 
sudo apt install -y \
 
     pacemaker \
 
     pacemaker \
Line 49: Line 43:
 
     mailutils \
 
     mailutils \
 
     vim
 
     vim
</syntaxhighlight>
+
</pre>
  
 
=== 2. Hosts File Configuration ===
 
=== 2. Hosts File Configuration ===
Add to '''/etc/hosts''' on every node:
+
Add to <code>/etc/hosts</code> on every node:
 
<pre>
 
<pre>
 
192.168.1.101 node1.cluster.local node1
 
192.168.1.101 node1.cluster.local node1
Line 61: Line 55:
  
 
=== 3. Firewall Configuration ===
 
=== 3. Firewall Configuration ===
<syntaxhighlight lang="bash">
+
<pre>
 
sudo apt install -y ufw
 
sudo apt install -y ufw
 
sudo ufw allow from 192.168.1.0/24
 
sudo ufw allow from 192.168.1.0/24
 
sudo ufw enable
 
sudo ufw enable
</syntaxhighlight>
+
</pre>
  
 
== Cluster Configuration ==
 
== Cluster Configuration ==
 
=== 1. Initialize Cluster Services ===
 
=== 1. Initialize Cluster Services ===
<syntaxhighlight lang="bash">
+
<pre>
 
sudo systemctl enable --now pcsd corosync pacemaker
 
sudo systemctl enable --now pcsd corosync pacemaker
</syntaxhighlight>
+
</pre>
 
 
 
=== 2. Set hacluster Password ===
 
=== 2. Set hacluster Password ===
<syntaxhighlight lang="bash">
+
<pre>
 
echo "hacluster:SecureP@ssw0rd123" | sudo chpasswd
 
echo "hacluster:SecureP@ssw0rd123" | sudo chpasswd
</syntaxhighlight>
+
</pre>
 
 
 
=== 3. Authenticate Nodes (from node1) ===
 
=== 3. Authenticate Nodes (from node1) ===
<syntaxhighlight lang="bash">
+
<pre>
 
sudo pcs cluster auth \
 
sudo pcs cluster auth \
 
     node1.cluster.local \
 
     node1.cluster.local \
Line 88: Line 80:
 
     -p SecureP@ssw0rd123 \
 
     -p SecureP@ssw0rd123 \
 
     --force
 
     --force
</syntaxhighlight>
+
</pre>
 
 
 
=== 4. Create Cluster ===
 
=== 4. Create Cluster ===
<syntaxhighlight lang="bash">
+
<pre>
 
sudo pcs cluster setup \
 
sudo pcs cluster setup \
 
     --name haproxy_cluster \
 
     --name haproxy_cluster \
Line 99: Line 90:
 
     node4.cluster.local \
 
     node4.cluster.local \
 
     --force
 
     --force
</syntaxhighlight>
+
</pre>
 
 
 
=== 5. Start Cluster ===
 
=== 5. Start Cluster ===
<syntaxhighlight lang="bash">
+
<pre>
 
sudo pcs cluster start --all
 
sudo pcs cluster start --all
 
sudo pcs cluster enable --all
 
sudo pcs cluster enable --all
</syntaxhighlight>
+
</pre>
 
 
 
=== 6. Verify Status ===
 
=== 6. Verify Status ===
<syntaxhighlight lang="bash">
+
<pre>
 
sudo pcs status
 
sudo pcs status
</syntaxhighlight>
+
</pre>
 
Expected output:
 
Expected output:
 
<pre>
 
<pre>
Line 122: Line 111:
 
== HAProxy Setup ==
 
== HAProxy Setup ==
 
=== 1. Install HAProxy on All Nodes ===
 
=== 1. Install HAProxy on All Nodes ===
<syntaxhighlight lang="bash">
+
<pre>
 
sudo apt install -y haproxy
 
sudo apt install -y haproxy
</syntaxhighlight>
+
</pre>
 
 
 
=== 2. Configuration Template (/etc/haproxy/haproxy.cfg) ===
 
=== 2. Configuration Template (/etc/haproxy/haproxy.cfg) ===
<syntaxhighlight lang="conf">
+
<pre>
 
global
 
global
 
     log /dev/log local0
 
     log /dev/log local0
Line 170: Line 158:
 
     stats hide-version
 
     stats hide-version
 
     stats auth admin:SecureStatsP@ss
 
     stats auth admin:SecureStatsP@ss
</syntaxhighlight>
+
</pre>
  
 
=== 3. Add as Cluster Resource ===
 
=== 3. Add as Cluster Resource ===
<syntaxhighlight lang="bash">
+
<pre>
 
sudo pcs resource create haproxy systemd:haproxy \
 
sudo pcs resource create haproxy systemd:haproxy \
 
     op monitor interval=10s timeout=20s \
 
     op monitor interval=10s timeout=20s \
Line 179: Line 167:
 
     op stop interval=0s timeout=30s \
 
     op stop interval=0s timeout=30s \
 
     --group haproxy_group
 
     --group haproxy_group
</syntaxhighlight>
+
</pre>
  
 
== Time Synchronization ==
 
== Time Synchronization ==
 
=== 1. Configure Chronyd (/etc/chrony/chrony.conf) ===
 
=== 1. Configure Chronyd (/etc/chrony/chrony.conf) ===
<syntaxhighlight lang="conf">
+
<pre>
 
pool 0.debian.pool.ntp.org iburst
 
pool 0.debian.pool.ntp.org iburst
 
pool 1.debian.pool.ntp.org iburst
 
pool 1.debian.pool.ntp.org iburst
Line 192: Line 180:
 
leapsectz right/UTC
 
leapsectz right/UTC
 
makestep 1.0 3
 
makestep 1.0 3
</syntaxhighlight>
+
</pre>
 
 
 
=== 2. Verify Time Sync ===
 
=== 2. Verify Time Sync ===
<syntaxhighlight lang="bash">
+
<pre>
 
sudo systemctl restart chronyd
 
sudo systemctl restart chronyd
 
sudo chronyc tracking
 
sudo chronyc tracking
</syntaxhighlight>
+
</pre>
  
 
== Failover Automation ==
 
== Failover Automation ==
 
=== 1. DNS Update Script (/usr/local/bin/update_dns.sh) ===
 
=== 1. DNS Update Script (/usr/local/bin/update_dns.sh) ===
<syntaxhighlight lang="bash">
+
<pre>
 
#!/bin/bash
 
#!/bin/bash
  
 
CLUSTER_NAME="haproxy_cluster"
 
CLUSTER_NAME="haproxy_cluster"
VIP="192.168.1.100"
 
 
DNS_RECORD="haproxy.example.com"
 
DNS_RECORD="haproxy.example.com"
 
DNS_SERVER="ns1.example.com"
 
DNS_SERVER="ns1.example.com"
Line 212: Line 198:
 
TTL=300
 
TTL=300
  
ACTIVE_NODE=$(sudo pcs status | grep "Current DC:" | awk '{print $4}'  
+
# List your cluster node IPs here for reference (optional, not used in script logic)
logger -t haproxy-cluster "Failover detected. Active node: $ACTIVE_NODE"
+
NODE_IPS=("192.168.1.101" "192.168.1.102" "192.168.1.103" "192.168.1.104")
 +
 
 +
# Detect the IP of the current active node (Current DC)
 +
ACTIVE_NODE=$(sudo pcs status | grep "Current DC:" | awk '{print $4}')
 +
ACTIVE_NODE_IP=$(getent hosts $ACTIVE_NODE | awk '{ print $1 }')
 +
 
 +
logger -t haproxy-cluster "Failover detected. Active node: $ACTIVE_NODE ($ACTIVE_NODE_IP)"
  
 
nsupdate -k $DNS_KEY <<EOF
 
nsupdate -k $DNS_KEY <<EOF
 
server $DNS_SERVER
 
server $DNS_SERVER
 
update delete $DNS_RECORD A
 
update delete $DNS_RECORD A
update add $DNS_RECORD $TTL A $VIP
+
update add $DNS_RECORD $TTL A $ACTIVE_NODE_IP
 
send
 
send
 
EOF
 
EOF
  
 
dig +short $DNS_RECORD @$DNS_SERVER
 
dig +short $DNS_RECORD @$DNS_SERVER
</syntaxhighlight>
+
</pre>
  
 
=== 2. Make Script Executable ===
 
=== 2. Make Script Executable ===
<syntaxhighlight lang="bash">
+
<pre>
 
sudo chmod 750 /usr/local/bin/update_dns.sh
 
sudo chmod 750 /usr/local/bin/update_dns.sh
 
sudo chown hacluster:haclient /usr/local/bin/update_dns.sh
 
sudo chown hacluster:haclient /usr/local/bin/update_dns.sh
</syntaxhighlight>
+
</pre>
 
 
 
=== 3. Configure Pacemaker Resource ===
 
=== 3. Configure Pacemaker Resource ===
<syntaxhighlight lang="bash">
+
<pre>
 
sudo pcs resource create dns_update ocf:pacemaker:ClusterMon \
 
sudo pcs resource create dns_update ocf:pacemaker:ClusterMon \
 
     user=hacluster \
 
     user=hacluster \
Line 241: Line 232:
 
sudo pcs constraint colocation add dns_update with haproxy_group INFINITY
 
sudo pcs constraint colocation add dns_update with haproxy_group INFINITY
 
sudo pcs constraint order haproxy_group then dns_update
 
sudo pcs constraint order haproxy_group then dns_update
</syntaxhighlight>
+
</pre>
  
 
== Testing Procedures ==
 
== Testing Procedures ==
 
=== 1. Manual Failover Test ===
 
=== 1. Manual Failover Test ===
<syntaxhighlight lang="bash">
+
<pre>
 
sudo pcs resource move haproxy_group node2.cluster.local
 
sudo pcs resource move haproxy_group node2.cluster.local
 
sudo pcs status | grep -A5 "haproxy_group"
 
sudo pcs status | grep -A5 "haproxy_group"
 
sudo pcs resource clear haproxy_group
 
sudo pcs resource clear haproxy_group
</syntaxhighlight>
+
</pre>
 
 
 
=== 2. Simulate Node Failure ===
 
=== 2. Simulate Node Failure ===
<syntaxhighlight lang="bash">
+
<pre>
 
sudo systemctl stop corosync
 
sudo systemctl stop corosync
 
sudo pcs status
 
sudo pcs status
 
watch -n 1 sudo pcs status
 
watch -n 1 sudo pcs status
</syntaxhighlight>
+
</pre>
 
 
 
=== 3. Verify DNS Updates ===
 
=== 3. Verify DNS Updates ===
<syntaxhighlight lang="bash">
+
<pre>
 
dig +short haproxy.example.com
 
dig +short haproxy.example.com
</syntaxhighlight>
+
</pre>
  
 
== Maintenance Operations ==
 
== Maintenance Operations ==
 
=== 1. Node Maintenance ===
 
=== 1. Node Maintenance ===
<syntaxhighlight lang="bash">
+
<pre>
 
sudo pcs node standby node3.cluster.local
 
sudo pcs node standby node3.cluster.local
 
sudo pcs node unstandby node3.cluster.local
 
sudo pcs node unstandby node3.cluster.local
</syntaxhighlight>
+
</pre>
 
 
 
=== 2. Cluster Maintenance ===
 
=== 2. Cluster Maintenance ===
<syntaxhighlight lang="bash">
+
<pre>
 
sudo pcs cluster standby --all
 
sudo pcs cluster standby --all
 
sudo pcs cluster unstandby --all
 
sudo pcs cluster unstandby --all
</syntaxhighlight>
+
</pre>
 
 
 
=== 3. Adding/Removing Nodes ===
 
=== 3. Adding/Removing Nodes ===
<syntaxhighlight lang="bash">
+
<pre>
 
sudo pcs cluster node add node5.cluster.local
 
sudo pcs cluster node add node5.cluster.local
 
sudo pcs cluster node remove node4.cluster.local
 
sudo pcs cluster node remove node4.cluster.local
</syntaxhighlight>
+
</pre>
  
 
== Troubleshooting ==
 
== Troubleshooting ==
 
=== Common Issues ===
 
=== Common Issues ===
 
'''Corosync fails to start'''
 
'''Corosync fails to start'''
<syntaxhighlight lang="bash">
+
<pre>
 
sudo corosync-cfgtool -s
 
sudo corosync-cfgtool -s
 
sudo corosync-cmapctl | grep members
 
sudo corosync-cmapctl | grep members
</syntaxhighlight>
+
</pre>
 
 
 
'''Split-brain scenario'''
 
'''Split-brain scenario'''
<syntaxhighlight lang="bash">
+
<pre>
 
sudo pcs property set stonith-enabled=true
 
sudo pcs property set stonith-enabled=true
 
sudo pcs stonith create fence_ipmi fence_ipmilan ...
 
sudo pcs stonith create fence_ipmi fence_ipmilan ...
</syntaxhighlight>
+
</pre>
 
 
 
'''Resource failures'''
 
'''Resource failures'''
<syntaxhighlight lang="bash">
+
<pre>
 
sudo pcs resource debug-start haproxy
 
sudo pcs resource debug-start haproxy
 
sudo pcs resource failcount show
 
sudo pcs resource failcount show
 
sudo pcs resource cleanup haproxy
 
sudo pcs resource cleanup haproxy
</syntaxhighlight>
+
</pre>
  
 
=== Log Locations ===
 
=== Log Locations ===
Line 310: Line 295:
 
== Appendix ==
 
== Appendix ==
 
=== Sample DNS Key File (/etc/bind/cluster.key) ===
 
=== Sample DNS Key File (/etc/bind/cluster.key) ===
 
+
<pre>
<syntaxhighlight lang="conf">
 
 
key "cluster-key" {
 
key "cluster-key" {
 
     algorithm hmac-sha256;
 
     algorithm hmac-sha256;
 
     secret "Base64EncodedKeyHere==";
 
     secret "Base64EncodedKeyHere==";
 
};
 
};
</syntaxhighlight>
+
</pre>
 
 
 
=== Useful Commands ===
 
=== Useful Commands ===
<syntaxhighlight lang="bash">
+
<pre>
 
sudo pcs config show
 
sudo pcs config show
 
sudo pcs stonith show
 
sudo pcs stonith show
 
sudo pcs resource operations haproxy
 
sudo pcs resource operations haproxy
</syntaxhighlight>
+
</pre>
 
 
 
=== References & Further Reading ===
 
=== References & Further Reading ===
 
* [https://clusterlabs.org/pacemaker/doc/ Pacemaker Documentation]
 
* [https://clusterlabs.org/pacemaker/doc/ Pacemaker Documentation]

Latest revision as of 11:38, 22 July 2025

Introduction

This guide documents the setup of a 4-node high availability cluster using:

  • Pacemaker/Corosync for cluster management
  • HAProxy for load balancing
  • Chronyd for time synchronization
  • Automated DNS updates during failover scenarios

Prerequisites

Hardware Requirements

  • 4 identical Debian servers (≥ 2GB RAM, ≥ 2 CPU cores recommended)
  • Network interfaces:
    • Primary: 1Gbps (for client traffic)
    • Secondary: 100Mbps minimum (for heartbeat)

Software Requirements

  • Debian 10/11 (tested on both)
  • Root access to all nodes
  • DNS server supporting dynamic updates

Network Requirements

  • Static IP assignments for all nodes
  • Fully qualified domain names for each node
  • Unrestricted communication on ports:
    • 2224 (pcsd)
    • 5404-5405 (corosync)
    • 80/443 (HAProxy)

Initial Node Setup

1. System Preparation

On all nodes (node1-node4):

sudo apt update && sudo apt upgrade -y

sudo apt install -y \
    pacemaker \
    pcs \
    corosync \
    crmsh \
    haproxy \
    chrony \
    bind9utils \
    mailutils \
    vim

2. Hosts File Configuration

Add to /etc/hosts on every node:

192.168.1.101 node1.cluster.local node1
192.168.1.102 node2.cluster.local node2
192.168.1.103 node3.cluster.local node3
192.168.1.104 node4.cluster.local node4

3. Firewall Configuration

sudo apt install -y ufw
sudo ufw allow from 192.168.1.0/24
sudo ufw enable

Cluster Configuration

1. Initialize Cluster Services

sudo systemctl enable --now pcsd corosync pacemaker

2. Set hacluster Password

echo "hacluster:SecureP@ssw0rd123" | sudo chpasswd

3. Authenticate Nodes (from node1)

sudo pcs cluster auth \
    node1.cluster.local \
    node2.cluster.local \
    node3.cluster.local \
    node4.cluster.local \
    -u hacluster \
    -p SecureP@ssw0rd123 \
    --force

4. Create Cluster

sudo pcs cluster setup \
    --name haproxy_cluster \
    node1.cluster.local \
    node2.cluster.local \
    node3.cluster.local \
    node4.cluster.local \
    --force

5. Start Cluster

sudo pcs cluster start --all
sudo pcs cluster enable --all

6. Verify Status

sudo pcs status

Expected output:

Cluster name: haproxy_cluster
Stack: corosync
Current DC: node1.cluster.local (version x.x.x-x) 
4 nodes configured
0 resources configured

HAProxy Setup

1. Install HAProxy on All Nodes

sudo apt install -y haproxy

2. Configuration Template (/etc/haproxy/haproxy.cfg)

global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    maxconn 4000
    tune.ssl.default-dh-param 2048

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5s
    timeout client 50s
    timeout server 50s
    option forwardfor
    option http-server-close

frontend http_front
    bind *:80
    bind *:443 ssl crt /etc/ssl/private/example.com.pem
    http-request set-header X-Forwarded-Port %[dst_port]
    http-request add-header X-Forwarded-Proto https if { ssl_fc }
    default_backend http_back

backend http_back
    balance roundrobin
    cookie SERVERID insert indirect nocache
    server webserver1 192.168.1.201:80 check cookie s1
    server webserver2 192.168.1.202:80 check cookie s2

listen stats
    bind *:1936
    stats enable
    stats uri /
    stats hide-version
    stats auth admin:SecureStatsP@ss

3. Add as Cluster Resource

sudo pcs resource create haproxy systemd:haproxy \
    op monitor interval=10s timeout=20s \
    op start interval=0s timeout=30s \
    op stop interval=0s timeout=30s \
    --group haproxy_group

Time Synchronization

1. Configure Chronyd (/etc/chrony/chrony.conf)

pool 0.debian.pool.ntp.org iburst
pool 1.debian.pool.ntp.org iburst
pool 2.debian.pool.ntp.org iburst
pool 3.debian.pool.ntp.org iburst

allow 192.168.1.0/24
leapsectz right/UTC
makestep 1.0 3

2. Verify Time Sync

sudo systemctl restart chronyd
sudo chronyc tracking

Failover Automation

1. DNS Update Script (/usr/local/bin/update_dns.sh)

#!/bin/bash

CLUSTER_NAME="haproxy_cluster"
DNS_RECORD="haproxy.example.com"
DNS_SERVER="ns1.example.com"
DNS_KEY="/etc/bind/cluster.key"
TTL=300

# List your cluster node IPs here for reference (optional, not used in script logic)
NODE_IPS=("192.168.1.101" "192.168.1.102" "192.168.1.103" "192.168.1.104")

# Detect the IP of the current active node (Current DC)
ACTIVE_NODE=$(sudo pcs status | grep "Current DC:" | awk '{print $4}')
ACTIVE_NODE_IP=$(getent hosts $ACTIVE_NODE | awk '{ print $1 }')

logger -t haproxy-cluster "Failover detected. Active node: $ACTIVE_NODE ($ACTIVE_NODE_IP)"

nsupdate -k $DNS_KEY <<EOF
server $DNS_SERVER
update delete $DNS_RECORD A
update add $DNS_RECORD $TTL A $ACTIVE_NODE_IP
send
EOF

dig +short $DNS_RECORD @$DNS_SERVER

2. Make Script Executable

sudo chmod 750 /usr/local/bin/update_dns.sh
sudo chown hacluster:haclient /usr/local/bin/update_dns.sh

3. Configure Pacemaker Resource

sudo pcs resource create dns_update ocf:pacemaker:ClusterMon \
    user=hacluster \
    extra_options="-e /usr/local/bin/update_dns.sh" \
    op monitor interval=15s timeout=30s \
    meta target-role=Started

sudo pcs constraint colocation add dns_update with haproxy_group INFINITY
sudo pcs constraint order haproxy_group then dns_update

Testing Procedures

1. Manual Failover Test

sudo pcs resource move haproxy_group node2.cluster.local
sudo pcs status | grep -A5 "haproxy_group"
sudo pcs resource clear haproxy_group

2. Simulate Node Failure

sudo systemctl stop corosync
sudo pcs status
watch -n 1 sudo pcs status

3. Verify DNS Updates

dig +short haproxy.example.com

Maintenance Operations

1. Node Maintenance

sudo pcs node standby node3.cluster.local
sudo pcs node unstandby node3.cluster.local

2. Cluster Maintenance

sudo pcs cluster standby --all
sudo pcs cluster unstandby --all

3. Adding/Removing Nodes

sudo pcs cluster node add node5.cluster.local
sudo pcs cluster node remove node4.cluster.local

Troubleshooting

Common Issues

Corosync fails to start

sudo corosync-cfgtool -s
sudo corosync-cmapctl | grep members

Split-brain scenario

sudo pcs property set stonith-enabled=true
sudo pcs stonith create fence_ipmi fence_ipmilan ...

Resource failures

sudo pcs resource debug-start haproxy
sudo pcs resource failcount show
sudo pcs resource cleanup haproxy

Log Locations

  • Corosync: /var/log/corosync/corosync.log
  • Pacemaker: journalctl -u pacemaker
  • HAProxy: /var/log/haproxy.log

Appendix

Sample DNS Key File (/etc/bind/cluster.key)

key "cluster-key" {
    algorithm hmac-sha256;
    secret "Base64EncodedKeyHere==";
};

Useful Commands

sudo pcs config show
sudo pcs stonith show
sudo pcs resource operations haproxy

References & Further Reading


This comprehensive guide includes: step-by-step instructions, configuration examples, automated failover procedures, maintenance and troubleshooting, and a reference appendix.