Architecture overview

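  • Keepalived provides active/standby failover between two nodes and manages two VIPs (192.168.253.222 / 192.168.252.222)
  • HAProxy acts as a Layer 4 (TCP) load balancer for MySQL, Redis, Kong, ES, Zabbix and other services
  • Suited to production environments where many services share one highly available entry point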

I. Slave node configuration

1. keepalived.conf (backup node)

# cat keepalived.conf
! Configuration File for keepalived

global_defs {
    notification_email {
        root@localhost
    }
    notification_email_from keepalived@localhost
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id keepalived-haproxy2
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens192
    virtual_router_id 6
    priority 80
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 571f97b2
    }
    unicast_src_ip 192.168.253.119
    unicast_peer {
        192.168.253.19
    }
    virtual_ipaddress {
        192.168.253.222
    }
}

vrrp_instance VI_2 {
    state BACKUP
    interface ens160
    virtual_router_id 28
    priority 80
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 571f97b2
    }
    unicast_src_ip 192.168.252.119
    unicast_peer {
        192.168.252.19
    }
    virtual_ipaddress {
        192.168.252.222
    }
}

2. haproxy.cfg (identical on Slave and Master)

# cat haproxy.cfg
global
    log /dev/log    local0
    log /dev/log    local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

    # Default SSL material locations
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private

    # Default ciphers to use on SSL-enabled listening sockets.
    # For more information, see ciphers(1SSL). This list is from:
    # https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
    ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS
    ssl-default-bind-options no-sslv3

defaults
    log     global
    mode    http
    option  tcplog
    # option httplog
    option  redispatch
    option  dontlognull
    timeout connect 5000
    timeout client  50000
    timeout server  50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http
    retries 3
    maxconn 5000
    timeout http-keep-alive 1800s

listen mysql
    bind 0.0.0.0:3306
    mode    tcp
    option  tcpka
    balance roundrobin
    server mysql-master-253-17 192.168.253.17:3306 check weight 1 inter 3s rise 5 fall 1
    # server mysql-master-253-117 192.168.253.117:3306 check weight 1 inter 3s rise 5 fall 1

listen redis
    bind 0.0.0.0:6379
    mode    tcp
    option  tcpka
    balance roundrobin
    server redis-admin-stats-weixin1-253-13 192.168.253.13:6379 check weight 1 inter 3s rise 5 fall 1
    # server redis-admin-stats-weixin2-253-113 192.168.253.113:6379 check weight 1 inter 3s rise 5 fall 1

listen frontdfs
    bind 0.0.0.0:22222
    mode    tcp
    option  tcpka
    balance roundrobin
    server front-1 192.168.2.87:22222 check weight 1 inter 1s rise 5 fall 1
    server front-2 192.168.2.88:22222 check weight 1 inter 1s rise 5 fall 1

listen cloud-dfs
    bind 0.0.0.0:8000
    mode    tcp
    option  tcpka
    balance roundrobin
    server dfs-253-15 192.168.253.15:8000 check weight 1 inter 3s rise 5 fall 1
    server dfs-253-115 192.168.253.115:8000 check weight 1 inter 3s rise 5 fall 1

listen kong
    bind 0.0.0.0:80
    mode    tcp
    option  tcpka
    balance roundrobin
    server kong1-253-11 192.168.253.11:80 check weight 1 inter 1s rise 5 fall 1
    server kong1-253-111 192.168.253.111:80 check weight 1 inter 1s rise 5 fall 1

listen kong-admin
    bind 0.0.0.0:8080
    mode    tcp
    option  tcpka
    balance roundrobin
    server kong1-253-11 192.168.253.11:8080 check weight 1 inter 1s rise 5 fall 1
    server kong1-253-111 192.168.253.111:8080 check weight 1 inter 1s rise 5 fall 1

listen cloud-film-es
    bind 0.0.0.0:9292
    mode    tcp
    option  tcpka
    balance roundrobin
    server zk1-mongo1-253-16 192.168.253.16:9292 check weight 1 inter 1s rise 5 fall 1
    server zk1-mongo1-253-116 192.168.253.116:9292 check weight 1 inter 1s rise 5 fall 1
    server zk1-mongo1-253-122 192.168.253.122:9292 check weight 1 inter 1s rise 5 fall 1

listen zabbix
    bind 0.0.0.0:9999
    mode    tcp
    option  tcpka
    balance roundrobin
    server zabbix 192.168.253.20:80 check weight 1 inter 1s rise 5 fall 1

listen elk
    bind 0.0.0.0:5601
    mode    tcp
    option  tcpka
    balance roundrobin
    server eLk3 192.168.253.121:5601 check weight 1 inter 1s rise 5 fall 1

listen konga
    bind 0.0.0.0:1337
    mode    tcp
    option  tcpka
    balance roundrobin
    server kong1-253-11 192.168.253.11:1337 check weight 1 inter 1s rise 5 fall 1
    # server kong1-253-111 192.168.253.111:1337 check weight 1 inter 1s rise 5 fall 1
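HAProxy only probes servers whose line carries the check keyword. A quick way to audit a config for server lines without it is sketched below in Bash; the function name servers_without_check is illustrative, and the sketch assumes one directive per line and skips commented-out servers.

```bash
# Sketch: list HAProxy "server" lines that lack the "check" keyword.
# Without "check", HAProxy never probes the server, so inter/rise/fall are ignored.
# Assumes one directive per line; commented-out servers ("# server ...") are skipped.
servers_without_check() {
  local cfg=$1
  awk '
    # Remember the current section name (e.g. "mysql" for "listen mysql").
    $1 ~ /^(listen|backend|frontend|defaults|global)$/ { section = $2 }
    # Active server directive: scan its options for "check".
    $1 == "server" {
      has_check = 0
      for (i = 3; i <= NF; i++) if ($i == "check") has_check = 1
      if (!has_check) print section ": " $2
    }
  ' "$cfg"
}
```

For example, servers_without_check /etc/haproxy/haproxy.cfg prints one "section: server" pair per unchecked server; haproxy -c -f /etc/haproxy/haproxy.cfg can then validate the edited file before a reload.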

II. Master node configuration

1. keepalived.conf (master node)

# cat keepalived.conf
! Configuration File for keepalived

global_defs {
    notification_email {
        root@localhost
    }
    notification_email_from keepalived@localhost
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id keepalived-haproxy1
}

vrrp_instance VI_1 {
    state MASTER
    interface ens192
    virtual_router_id 6
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 571f97b2
    }
    unicast_src_ip 192.168.253.19
    unicast_peer {
        192.168.253.119
    }
    virtual_ipaddress {
        192.168.253.222
    }
}

vrrp_instance VI_2 {
    state MASTER
    interface ens160
    virtual_router_id 15
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 571f97b2
    }
    unicast_src_ip 192.168.252.19
    unicast_peer {
        192.168.252.119
    }
    virtual_ipaddress {
        192.168.252.222
    }
}

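⚠️ Note

  • The virtual_router_id of VI_1 and of VI_2 must be identical on the Master and the Slave (currently 6 for VI_1, and 28/15 for VI_2)
  • In the current configs, VI_2 uses virtual_router_id 28 on the Slave but 15 on the Master. This mismatch must be fixed by setting both to the same value (15 or 28). Otherwise neither node recognises the other's advertisements as belonging to its own instance, so the Slave also promotes itself to MASTER and both nodes typically claim 192.168.252.222 at once (split brain) instead of failing over cleanly.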
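Mismatches like this are easy to miss by eye. The Bash sketch below (function names are hypothetical) extracts the virtual_router_id of each vrrp_instance and compares two copies of keepalived.conf, for example one fetched from each node. It assumes one directive per line, as in the configs above.

```bash
# Sketch: print "INSTANCE VRID" for every vrrp_instance in a keepalived.conf.
# Assumes one directive per line, as in the configs above.
vrids() {
  awk '
    $1 == "vrrp_instance"                   { inst = $2 }
    $1 == "virtual_router_id" && inst != "" { print inst, $2 }
  ' "$1"
}

# Compare two configs (e.g. copies taken from Master and Slave).
# Prints "INSTANCE master=X slave=Y" for each mismatch and returns 1 if any exist.
compare_vrids() {
  local master=$1 slave=$2 rc=0 inst id other
  while read -r inst id; do
    other=$(vrids "$slave" | awk -v name="$inst" '$1 == name { print $2 }')
    if [ "$id" != "$other" ]; then
      echo "$inst master=$id slave=${other:-missing}"
      rc=1
    fi
  done < <(vrids "$master")
  return "$rc"
}
```

Run against the two configs shown in this article, compare_vrids would print VI_2 master=15 slave=28 and return 1; after the fix it prints nothing and returns 0.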

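III. Key points

  • High-availability mechanism

    • The Master (priority=100) holds the VIPs; if it fails, the Slave (priority=80) takes over
    • Unicast is used instead of multicast, which suits cloud environments and networks where multicast is disabled
  • HAProxy characteristics

    • Every service runs in TCP mode with tcpka (TCP keepalive)
    • Health checks: inter 1s to 3s, rise 5 (5 consecutive successful checks mark a server UP), fall 1 (a single failed check marks it DOWN). HAProxy only performs these checks on server lines that carry the check keyword; without it, inter, rise and fall are ignored
    • Load-balancing algorithm: roundrobin
  • Security hardening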
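The health-check parameters translate into rough detection and recovery times. As a Bash sketch (it ignores check timeouts and the fastinter/downinter options, which default to inter):

```bash
# Sketch: approximate HAProxy health-check timings from inter/rise/fall.
# Ignores check timeouts and fastinter/downinter (both default to inter).

# Seconds until a failing server is marked DOWN: "fall" consecutive failed checks.
secs_to_down() { local inter=$1 fall=$2; echo $(( inter * fall )); }

# Seconds until a recovered server is marked UP again: "rise" consecutive successes.
secs_to_up() { local inter=$1 rise=$2; echo $(( inter * rise )); }

# The two inter values used in haproxy.cfg above, with rise 5 and fall 1.
for inter in 1 3; do
  echo "inter ${inter}s: DOWN after ~$(secs_to_down "$inter" 1)s, UP after ~$(secs_to_up "$inter" 5)s"
done
```

With fall 1 a single lost probe removes a server; with inter 3s and rise 5 it then stays out for about 15 s even if it recovers immediately. For single-server sections such as mysql and redis, that means no usable server at all during that window.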


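The following two additions build on the existing architecture and configuration without replacing them:

  1. Keepalived and HAProxy health-check integration (an application-level failure triggers VIP failover)
  2. Monitoring and email alerting for VIP failover events

Everything uses the existing stack (Bash, systemd, email alerts), so it can be integrated directly.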


Keepalived + HAProxy high-availability enhancements: health-check integration and failover monitoring

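Use case: when HAProxy itself is running but critical backend services (such as MySQL or Kong) are all down, trigger VIP failover automatically to improve fault tolerance.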


I. Keepalived + HAProxy health-check integration

1. How it works

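  • By default Keepalived only tracks VRRP advertisements and the state of the interfaces it runs on; it cannot tell whether HAProxy or the services behind it are working
  • A custom vrrp_script periodically checks HAProxy (its stats or its key ports). After consecutive failures it lowers the local priority, which triggers VIP failover
  • Caveat: a TCP connection to a HAProxy frontend port succeeds whenever HAProxy is listening, even if every server behind that port is DOWN, so a port probe only proves the listeners are up. Also, both nodes run the same haproxy.cfg against the same backends, so moving the VIP helps only when the fault is local to this node (its HAProxy process, or its network path to the backends), not when the backends themselves are down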
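For a check that reflects backend state rather than listener state, one option is to read HAProxy's statistics over the admin socket already declared in haproxy.cfg (stats socket /run/haproxy/admin.sock). The Bash sketch below assumes socat is installed to talk to that socket. It parses the CSV returned by "show stat", finds each proxy's summary row (svname BACKEND) and lists those whose status is not UP; critical_ok is a hypothetical helper that fails if any proxy named on its command line is in that list.

```bash
# Sketch: list proxies whose BACKEND summary row is not UP, reading HAProxy
# "show stat" CSV on stdin. Live use (assumes socat and the stats socket
# path declared in haproxy.cfg):
#   echo "show stat" | socat stdio /run/haproxy/admin.sock | down_backends
down_backends() {
  awk -F, '
    # Header: "# pxname,svname,...,status,..."; map column names to positions.
    NR == 1 { sub(/^# */, ""); for (i = 1; i <= NF; i++) col[$i] = i; next }
    $(col["svname"]) == "BACKEND" && $(col["status"]) != "UP" { print $(col["pxname"]) }
  '
}

# Hypothetical helper for a vrrp_script: fail (return 1) if any proxy named
# in the arguments (e.g. mysql kong) is reported down; succeed otherwise.
critical_ok() {
  local down name
  down=$(down_backends)
  for name in "$@"; do
    if grep -qxF "$name" <<< "$down"; then return 1; fi
  done
  return 0
}
```

In a check script, echo "show stat" | socat stdio /run/haproxy/admin.sock | critical_ok mysql kong || exit 1 would mark the node unhealthy when either proxy has no server UP. This relies on the check keyword on server lines: servers without it are always counted as up, so their backend never reports DOWN.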

2. Configuration steps (using the Master node as the example)

(1) Create the health-check script /etc/keepalived/check_haproxy.sh
#!/bin/bash
# Check whether HAProxy's key services are reachable (healthy if any one responds)

LOGFILE="/var/log/keepalived-haproxy-check.log"
DATE=$(date '+%Y-%m-%d %H:%M:%S')

# Check the local HAProxy service (systemd unit)
if ! systemctl is-active --quiet haproxy; then
  echo "[$DATE] HAProxy process DOWN" >> "$LOGFILE"
  exit 1
fi

# Check key ports (OK if at least one accepts a connection)
PORTS=(3306 6379 80 8080 9292 9999)
for PORT in "${PORTS[@]}"; do
  if timeout 2 bash -c "echo >/dev/tcp/127.0.0.1/$PORT" 2>/dev/null; then
    echo "[$DATE] Port $PORT OK" >> "$LOGFILE"
    exit 0
  fi
done

echo "[$DATE] All critical ports DOWN" >> "$LOGFILE"
exit 1

Make it executable:

chmod +x /etc/keepalived/check_haproxy.sh

(2) Edit keepalived.conf (Master node)
# Add after global_defs
vrrp_script chk_haproxy {
    script "/etc/keepalived/check_haproxy.sh"
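    interval 2          # run every 2 seconds
    weight -30          # on failure, lower priority by 30 (100→70 < 80, triggering failover)
    fall 3              # apply the weight only after 3 consecutive failures
    rise 2              # restore the priority after 2 consecutive successes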
}

# Reference it in vrrp_instance VI_1 and VI_2
vrrp_instance VI_1 {
    ...
    track_script {
        chk_haproxy
    }
}

vrrp_instance VI_2 {
    ...
    track_script {
        chk_haproxy
    }
}
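After editing, it is worth confirming that every instance actually tracks the script. A minimal Bash sketch; instances_missing_track is a hypothetical name, and it assumes one directive per line with the script name on its own line inside track_script.

```bash
# Sketch: print each vrrp_instance that does not reference the given
# vrrp_script (e.g. chk_haproxy). Assumes one directive per line and that
# the script name appears on its own line inside track_script { ... }.
instances_missing_track() {
  local cfg=$1 script=$2
  awk -v s="$script" '
    # A new top-level block ends the previous instance: report it if untracked.
    $1 == "vrrp_instance" || $1 == "vrrp_script" || $1 == "global_defs" {
      if (inst != "" && !found) print inst
      inst = ($1 == "vrrp_instance") ? $2 : ""
      found = 0
      next
    }
    inst != "" && $1 == s { found = 1 }
    END { if (inst != "" && !found) print inst }
  ' "$cfg"
}
```

instances_missing_track /etc/keepalived/keepalived.conf chk_haproxy prints the name of every vrrp_instance that does not reference chk_haproxy; empty output means all instances track it.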

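Result

  • When the HAProxy service is down or none of the key ports accepts a connection, priority drops to 70 and the Slave (priority=80) takes over the VIPs
  • Once the check passes again, the Master reclaims the VIPs automatically (its priority returns to 100, and Keepalived preempts by default)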
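The arithmetic behind this behaviour, as a Bash sketch. It assumes the values used above (base priority 100, weight -30, interval 2, fall 3, advert_int 1, Slave priority 80) and the VRRPv2 master-down interval from RFC 3768, 3 × advert_int + (256 - priority) / 256 seconds. The total is a rough estimate, not a measured failover time.

```bash
# Sketch: failover arithmetic for the vrrp_script above.
# Assumed inputs: Master priority 100, weight -30, interval 2, fall 3,
# advert_int 1, Slave priority 80. Times in milliseconds (integer math).

# Effective priority: a failing script with negative weight lowers the base priority.
effective_priority() {
  local base=$1 weight=$2 state=$3   # state: ok | fail
  if [ "$state" = fail ]; then echo $(( base + weight )); else echo "$base"; fi
}

# Roughly how long before the script counts as failed: fall checks, interval seconds apart.
script_fail_ms() { local interval=$1 fall=$2; echo $(( interval * fall * 1000 )); }

# VRRPv2 master-down interval (RFC 3768): 3 * advert_int + (256 - priority) / 256 s.
master_down_ms() {
  local advert_int=$1 prio=$2
  echo $(( 3 * advert_int * 1000 + (256 - prio) * 1000 / 256 ))
}

p=$(effective_priority 100 -30 fail)
echo "Master priority while failing: $p (Slave: 80)"   # 70 < 80, so the Slave wins
echo "Rough takeover estimate: $(( $(script_fail_ms 2 3) + $(master_down_ms 1 80) )) ms"   # 9687 ms
```

Shortening interval or fall speeds up detection at the cost of more false failovers; the VRRP part is governed by advert_int.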

II. VIP failover log monitoring and email alerts

1. Enable detailed Keepalived logging

Edit /etc/sysconfig/keepalived (RHEL/CentOS location; -D enables detailed log messages):

KEEPALIVED_OPTIONS="-D"

Restart the service:

systemctl restart keepalived

Log location: /var/log/messages on RHEL/CentOS (/var/log/syslog on Debian/Ubuntu), or use journalctl -u keepalived
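To pull only the state transitions out of those logs, a small filter helps. The Bash sketch below assumes Keepalived's usual "Entering ... STATE" wording, in either the 1.x form VRRP_Instance(VI_1) Entering MASTER STATE or the 2.x form (VI_1) Entering MASTER STATE; the function name vrrp_transitions is hypothetical.

```bash
# Sketch: extract "INSTANCE STATE" from Keepalived log lines on stdin.
# Matches "VRRP_Instance(VI_1) Entering MASTER STATE" (1.x) and
# "(VI_1) Entering MASTER STATE" (2.x); other lines are dropped.
# Live use, e.g.: journalctl -u keepalived --no-pager | vrrp_transitions
vrrp_transitions() {
  sed -nE 's/.*\(([^)]+)\) Entering (MASTER|BACKUP|FAULT) STATE.*/\1 \2/p'
}
```

Each output line is one transition, such as VI_2 MASTER, which makes it easy to count or timestamp failovers per instance.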

2. Failover notification script /usr/local/bin/vip-notify.sh

#!/bin/bash
# Called by Keepalived (notify_*) on VRRP state changes; sends an email alert

STATE=$1          # MASTER/BACKUP/FAULT
GROUP=$2          # VI_1 / VI_2
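# Look up the instance's first VIP: the line after "virtual_ipaddress {" inside "vrrp_instance $GROUP"
# (grep -A5 stops before virtual_ipaddress, which sits ~14 lines below vrrp_instance)
VIP=$(awk -v inst="$GROUP" '
  $1 == "vrrp_instance"                { in_inst = ($2 == inst) }
  in_inst && $1 == "virtual_ipaddress" { getline; print $1; exit }
' /etc/keepalived/keepalived.conf)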

HOSTNAME=$(hostname)
DATE=$(date '+%Y-%m-%d %H:%M:%S')

case $STATE in
  "MASTER")
    SUBJECT="[VIP failover alert] $HOSTNAME became MASTER"
    MSG="Host $HOSTNAME became MASTER of $GROUP at $DATE; VIP $VIP is now active."
    ;;
  "BACKUP")
    SUBJECT="[VIP failover notice] $HOSTNAME switched to BACKUP"
    MSG="Host $HOSTNAME switched to BACKUP of $GROUP at $DATE; VIP $VIP has been released."
    ;;
  "FAULT")
    SUBJECT="[VIP fault alert] $HOSTNAME entered FAULT state"
    MSG="Host $HOSTNAME entered FAULT state for $GROUP at $DATE; VIP $VIP may have moved to the peer."
    ;;
  *)
    exit 0
    ;;
esac

# Send the email (reuses the existing mailx setup)
echo -e "$MSG\n\nDetails:\n- Node: $HOSTNAME\n- VRRP instance: $GROUP\n- VIP: $VIP\n- Time: $DATE" \
| mailx -s "$SUBJECT" ops@yourcompany.com 2>/dev/null

Make it executable:

chmod +x /usr/local/bin/vip-notify.sh

3. Hook the notification script into keepalived.conf

vrrp_instance VI_1 {
    ...
    notify_master "/usr/local/bin/vip-notify.sh MASTER VI_1"
    notify_backup "/usr/local/bin/vip-notify.sh BACKUP VI_1"
    notify_fault "/usr/local/bin/vip-notify.sh FAULT VI_1"
}

vrrp_instance VI_2 {
    ...
    notify_master "/usr/local/bin/vip-notify.sh MASTER VI_2"
    notify_backup "/usr/local/bin/vip-notify.sh BACKUP VI_2"
    notify_fault "/usr/local/bin/vip-notify.sh FAULT VI_2"
}
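As an alternative to three notify_* lines per instance, Keepalived also offers a generic notify hook; according to the keepalived.conf man page it appends four arguments: INSTANCE or GROUP, the instance or group name, the target state and the priority. A hedged Bash sketch of formatting an alert from those arguments; format_alert is a hypothetical helper and the subject wording is illustrative.

```bash
# Sketch: build an alert subject from the arguments Keepalived's generic
# "notify" hook appends (per keepalived.conf(5)): TYPE NAME STATE PRIORITY,
# where TYPE is INSTANCE or GROUP. An optional 5th argument overrides the
# hostname (used here only to make the output predictable).
# Returns 1 for any other state.
format_alert() {
  local type=$1 name=$2 state=$3 prio=$4 host=${5:-$(hostname)}
  case $state in
    MASTER) echo "[VIP failover alert] $host became MASTER of $name ($type, priority $prio)" ;;
    BACKUP) echo "[VIP failover notice] $host switched to BACKUP of $name ($type, priority $prio)" ;;
    FAULT)  echo "[VIP fault alert] $host entered FAULT for $name ($type, priority $prio)" ;;
    *)      return 1 ;;
  esac
}
```

A handler script could call SUBJECT=$(format_alert "$@") before mailx, with notify pointing at that handler in each vrrp_instance; the handler path is up to you. One hook then covers every state, and the priority argument shows whether the chk_haproxy weight was applied.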

Result

  • The ops team is emailed automatically whenever a VIP moves
  • Each alert includes the hostname, VIP, time and new state

III. Testing and maintenance

  1. Test failover manually

    # Temporarily stop HAProxy on the Master node
    systemctl stop haproxy
    # Check whether the Slave has taken over the VIPs (ip a show)
    
  2. Check the logs

    tail -f /var/log/messages | grep Keepalived
    tail -f /var/log/keepalived-haproxy-check.log
    
  3. Restore the service

    systemctl start haproxy  # once the check passes again, the Master reclaims the VIPs
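During the test, spotting the VIP by eye in ip a show output is error-prone. A small Bash sketch (has_vip is a hypothetical name) that reads ip -o -4 addr show, which prints one line per address with the interface in field 2 and address/prefix in field 4, and reports whether a given VIP is present:

```bash
# Sketch: check whether a VIP is configured locally, reading the output of
# "ip -o -4 addr show" on stdin (field 2 = interface, field 4 = address/prefix).
# Prints the interface and returns 0 if the VIP is present, returns 1 otherwise.
has_vip() {
  awk -v vip="$1" '
    $3 == "inet" { split($4, a, "/"); if (a[1] == vip) { print $2; found = 1 } }
    END { exit !found }
  '
}
```

On each node, ip -o -4 addr show | has_vip 192.168.253.222 prints the interface and exits 0 on the node that holds the VIP, and exits 1 on the other.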
    
