= High Availability Cluster with Pacemaker, Chronyd, and HAProxy =
'''A Comprehensive Guide for Debian Systems'''
== Introduction ==
This guide documents the setup of a 4-node high availability cluster using:
* Pacemaker/Corosync for cluster management
* HAProxy for load balancing
* Chronyd for time synchronization
* Automated DNS updates during failover scenarios
== Prerequisites ==
=== Hardware Requirements ===
* 4 identical Debian servers (≥ 2GB RAM, ≥ 2 CPU cores recommended)
* Network interfaces:
** Primary: 1 Gbps (for client traffic)
** Secondary: 100 Mbps minimum (for heartbeat)
=== Software Requirements ===
* Debian 10/11 (tested on both)
* Root access to all nodes
* DNS server supporting dynamic updates
=== Network Requirements ===
* Static IP assignments for all nodes
* Fully qualified domain names for each node
* Unrestricted communication between nodes on ports:
** 2224 tcp (pcsd)
** 5404-5405 udp (corosync)
** 80/443 (HAProxy)
== Initial Node Setup ==
=== 1. System Preparation ===
On all nodes (node1-node4):
<pre>
sudo apt update && sudo apt upgrade -y

sudo apt install -y \
  pacemaker \
  pcs \
  corosync \
  crmsh \
  haproxy \
  chrony \
  dnsutils \
  mailutils \
  vim
</pre>
=== 2. Hosts File Configuration ===
Add to <code>/etc/hosts</code> on every node:
<pre>
192.168.1.101 node1.cluster.local node1
192.168.1.102 node2.cluster.local node2
192.168.1.103 node3.cluster.local node3
192.168.1.104 node4.cluster.local node4
</pre>
=== 3. Firewall Configuration ===
<pre>
sudo apt install -y ufw
sudo ufw allow from 192.168.1.0/24
sudo ufw enable
</pre>
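If you prefer not to open the entire cluster subnet, a tighter rule set covering only the ports listed under Network Requirements might look like the following sketch (adjust the subnet and the stats port to your environment):
<pre>
# Allow only cluster-related traffic from the cluster subnet
sudo ufw allow from 192.168.1.0/24 to any port 2224 proto tcp       # pcsd
sudo ufw allow from 192.168.1.0/24 to any port 5404:5405 proto udp  # corosync
sudo ufw allow from 192.168.1.0/24 to any port 1936 proto tcp       # HAProxy stats page
sudo ufw allow 80/tcp    # HAProxy HTTP
sudo ufw allow 443/tcp   # HAProxy HTTPS
sudo ufw enable
</pre>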
== Cluster Configuration ==
=== 1. Initialize Cluster Services ===
<pre>
sudo systemctl enable --now pcsd corosync pacemaker
</pre>
=== 2. Set hacluster Password ===
Run on every node, using the same password everywhere:
<pre>
echo "hacluster:SecureP@ssw0rd123" | sudo chpasswd
</pre>
=== 3. Authenticate Nodes (from node1) ===
<pre>
sudo pcs cluster auth \
  node1.cluster.local \
  node2.cluster.local \
  node3.cluster.local \
  node4.cluster.local \
  -u hacluster \
  -p SecureP@ssw0rd123 \
  --force
</pre>
=== 4. Create Cluster ===
<pre>
sudo pcs cluster setup \
  --name haproxy_cluster \
  node1.cluster.local \
  node2.cluster.local \
  node3.cluster.local \
  node4.cluster.local \
  --force
</pre>
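The syntax above matches pcs 0.9.x as shipped with Debian 10. On Debian 11 (pcs 0.10.x) authentication is done per host and <code>--name</code> was dropped; a sketch of the newer form:
<pre>
# pcs 0.10+ (Debian 11) equivalents of the auth and setup steps above
sudo pcs host auth node1.cluster.local node2.cluster.local \
  node3.cluster.local node4.cluster.local -u hacluster -p SecureP@ssw0rd123

sudo pcs cluster setup haproxy_cluster \
  node1.cluster.local node2.cluster.local \
  node3.cluster.local node4.cluster.local --force
</pre>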
=== 5. Start Cluster ===
<pre>
sudo pcs cluster start --all
sudo pcs cluster enable --all
</pre>
=== 6. Verify Status ===
<pre>
sudo pcs status
</pre>
Expected output:
<pre>
Cluster name: haproxy_cluster
Stack: corosync
Current DC: node1.cluster.local (version x.x.x-x)
4 nodes configured
0 resources configured
</pre>
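To confirm that all four nodes have joined and that the cluster has quorum, you can also query corosync directly (a quick sanity check, not strictly required):
<pre>
sudo corosync-quorumtool -s
</pre>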
== HAProxy Setup ==
=== 1. Install HAProxy on All Nodes ===
<pre>
sudo apt install -y haproxy
</pre>
=== 2. Configuration Template (/etc/haproxy/haproxy.cfg) ===
<pre>
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    maxconn 4000
    tune.ssl.default-dh-param 2048

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    timeout connect 5s
    timeout client  50s
    timeout server  50s
    option  forwardfor
    option  http-server-close

frontend http_front
    bind *:80
    bind *:443 ssl crt /etc/ssl/private/example.com.pem
    http-request set-header X-Forwarded-Port %[dst_port]
    http-request add-header X-Forwarded-Proto https if { ssl_fc }
    default_backend http_back

backend http_back
    balance roundrobin
    cookie SERVERID insert indirect nocache
    server webserver1 192.168.1.201:80 check cookie s1
    server webserver2 192.168.1.202:80 check cookie s2

listen stats
    bind *:1936
    stats enable
    stats uri /
    stats hide-version
    stats auth admin:SecureStatsP@ss
</pre>
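Before handing the service over to the cluster, it is worth validating the configuration on each node (the paths and certificate are the ones assumed above):
<pre>
# Check the configuration file for syntax errors without starting the service
sudo haproxy -c -f /etc/haproxy/haproxy.cfg
</pre>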
=== 3. Add as Cluster Resource ===
<pre>
sudo pcs resource create haproxy systemd:haproxy \
  op monitor interval=10s timeout=20s \
  op start interval=0s timeout=30s \
  op stop interval=0s timeout=30s \
  --group haproxy_group
</pre>
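Because Pacemaker now decides where HAProxy runs, the systemd unit should normally not start on its own at boot; disabling it on all nodes keeps the service entirely under the cluster's control:
<pre>
sudo systemctl disable --now haproxy
</pre>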
== Time Synchronization ==
=== 1. Configure Chronyd (/etc/chrony/chrony.conf) ===
<pre>
pool 0.debian.pool.ntp.org iburst
pool 1.debian.pool.ntp.org iburst
pool 2.debian.pool.ntp.org iburst
pool 3.debian.pool.ntp.org iburst
allow 192.168.1.0/24
leapsectz right/UTC
makestep 1.0 3
</pre>
=== 2. Verify Time Sync ===
<pre>
sudo systemctl restart chronyd
sudo chronyc tracking
</pre>
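To see which upstream servers are actually being used (and to compare the view from each node), you can additionally list the sources:
<pre>
chronyc sources -v
</pre>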
== Failover Automation ==
=== 1. DNS Update Script (/usr/local/bin/update_dns.sh) ===
<pre>
#!/bin/bash
CLUSTER_NAME="haproxy_cluster"
DNS_RECORD="haproxy.example.com"
DNS_SERVER="ns1.example.com"
DNS_KEY="/etc/bind/cluster.key"
TTL=300

# List your cluster node IPs here for reference (optional, not used in script logic)
NODE_IPS=("192.168.1.101" "192.168.1.102" "192.168.1.103" "192.168.1.104")

# Detect the hostname of the current active node (Current DC) and resolve its IP
ACTIVE_NODE=$(sudo pcs status | grep "Current DC:" | awk '{print $3}')
ACTIVE_NODE_IP=$(getent hosts "$ACTIVE_NODE" | awk '{print $1}')

logger -t haproxy-cluster "Failover detected. Active node: $ACTIVE_NODE ($ACTIVE_NODE_IP)"

# Replace the A record for the service name with the active node's address
nsupdate -k "$DNS_KEY" <<EOF
server $DNS_SERVER
update delete $DNS_RECORD A
update add $DNS_RECORD $TTL A $ACTIVE_NODE_IP
send
EOF

# Show the record as the DNS server now sees it
dig +short "$DNS_RECORD" @"$DNS_SERVER"
</pre>
=== 2. Make Script Executable ===
<pre>
sudo chmod 750 /usr/local/bin/update_dns.sh
sudo chown hacluster:haclient /usr/local/bin/update_dns.sh
</pre>
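Note that the script calls <code>sudo pcs status</code> while running as the <code>hacluster</code> user. For that to work non-interactively, <code>hacluster</code> needs a passwordless sudo rule for that command; a minimal sketch (the exact pcs path may differ on your system):
<pre>
# /etc/sudoers.d/hacluster-pcs  (edit with: sudo visudo -f /etc/sudoers.d/hacluster-pcs)
hacluster ALL=(root) NOPASSWD: /usr/sbin/pcs status
</pre>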
=== 3. Configure Pacemaker Resource ===
<pre>
sudo pcs resource create dns_update ocf:pacemaker:ClusterMon \
  user=hacluster \
  extra_options="-E /usr/local/bin/update_dns.sh" \
  op monitor interval=15s timeout=30s \
  meta target-role=Started

sudo pcs constraint colocation add dns_update with haproxy_group INFINITY
sudo pcs constraint order haproxy_group then dns_update
</pre>
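On newer Pacemaker releases the ClusterMon external-agent mechanism is deprecated in favour of cluster alerts, so an alternative worth considering (a sketch, assuming the same script is adapted to read the CRM_alert_* environment variables of the alert interface) is:
<pre>
sudo pcs alert create path=/usr/local/bin/update_dns.sh id=dns_update_alert
sudo pcs alert recipient add dns_update_alert value=haproxy.example.com
</pre>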
== Testing Procedures ==
=== 1. Manual Failover Test ===
<pre>
sudo pcs resource move haproxy_group node2.cluster.local
sudo pcs status | grep -A5 "haproxy_group"
sudo pcs resource clear haproxy_group
</pre>
=== 2. Simulate Node Failure ===
On the node you want to fail:
<pre>
sudo systemctl stop corosync
</pre>
On another node, watch the cluster react:
<pre>
sudo pcs status
watch -n 1 sudo pcs status
</pre>
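Once the failover has been observed, bring the failed node back into the cluster:
<pre>
# On the failed node: restart the cluster services and confirm it rejoins
sudo pcs cluster start
sudo pcs status
</pre>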
=== 3. Verify DNS Updates ===
<pre>
dig +short haproxy.example.com
</pre>
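During a failover test it can be useful to query the authoritative server directly and keep an eye on the record while the cluster moves resources:
<pre>
watch -n 2 dig +short haproxy.example.com @ns1.example.com
</pre>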
== Maintenance Operations ==
=== 1. Node Maintenance ===
Put a node into standby before performing maintenance on it; resources move off the node and return once it leaves standby:
<pre>
sudo pcs node standby node3.cluster.local
sudo pcs node unstandby node3.cluster.local
</pre>
=== 2. Cluster Maintenance ===
<pre>
sudo pcs cluster standby --all
sudo pcs cluster unstandby --all
</pre>
=== 3. Adding/Removing Nodes ===
<pre>
sudo pcs cluster node add node5.cluster.local
sudo pcs cluster node remove node4.cluster.local
</pre>
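A new node must be prepared like the existing ones (packages, <code>/etc/hosts</code>, hacluster password) and authenticated before it can be added; after the add, its cluster services still need to be started. Something along these lines, assuming node5 is configured like the others:
<pre>
# Run before "pcs cluster node add": authenticate the new node from node1
sudo pcs cluster auth node5.cluster.local -u hacluster -p SecureP@ssw0rd123

# Run after the add: bring the new node's cluster services up
sudo pcs cluster start node5.cluster.local
sudo pcs cluster enable node5.cluster.local
</pre>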
== Troubleshooting ==
=== Common Issues ===
'''Corosync fails to start'''
<pre>
sudo corosync-cfgtool -s
sudo corosync-cmapctl | grep members
</pre>
'''Split-brain scenario'''
<pre>
sudo pcs property set stonith-enabled=true
sudo pcs stonith create fence_ipmi fence_ipmilan ...
</pre>
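The fencing parameters depend entirely on your hardware. Purely as an illustration, a fence_ipmilan definition for one node might look like the following (the BMC address and credentials are hypothetical placeholders):
<pre>
sudo pcs stonith create fence_ipmi_node1 fence_ipmilan \
  pcmk_host_list="node1.cluster.local" \
  ip="192.168.2.101" username="admin" password="BmcP@ss" lanplus=1 \
  op monitor interval=60s
</pre>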
'''Resource failures'''
<pre>
sudo pcs resource debug-start haproxy
sudo pcs resource failcount show
sudo pcs resource cleanup haproxy
</pre>
=== Log Locations ===
* Corosync: <code>/var/log/corosync/corosync.log</code>
* Pacemaker: <code>journalctl -u pacemaker</code>
* HAProxy: <code>/var/log/haproxy.log</code>
== Appendix ==
=== Sample DNS Key File (/etc/bind/cluster.key) ===
<pre>
key "cluster-key" {
    algorithm hmac-sha256;
    secret "Base64EncodedKeyHere==";
};
</pre>
=== Useful Commands ===
<pre>
sudo pcs config show
sudo pcs stonith show
sudo pcs resource operations haproxy
</pre>
== References & Further Reading ==
* [https://clusterlabs.org/pacemaker/doc/ Pacemaker Documentation]
* HAProxy Configuration Manual
* Debian Cluster Suite
* Corosync Documentation
* Chrony Documentation
This guide has covered step-by-step installation, cluster and HAProxy configuration, time synchronization, automated DNS failover, routine maintenance, troubleshooting, and a reference appendix.