Keepalived Configuration

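References: this continues the previous chapter on Keepalived internals. See also: the official Keepalived download page, configuration file parameter reference, mysql (temporary), configuration file walkthrough, health checks, lvs+keepalived+nginx, and implementing LVS (DR mode) with Keepalived.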

1. Environment

  • Machines

    Role               IP address        OS         Services                               Priority
    Master node        192.168.189.128   CentOS 7   Keepalived + tomcat                    150 (higher than the backup)
    Backup node        192.168.189.135   CentOS 7   Keepalived + tomcat                    100 (lower than the master)
    Virtual IP (VIP)   192.168.189.110   -          Single service IP exposed to clients   -
  • Install with yum

    # Run on both nodes to install the packages
    yum -y install keepalived libnl3-devel ipset-devel -v
    # Start the service and enable it at boot
    systemctl enable --now keepalived
    # Check the status
    systemctl status keepalived
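
The configuration in the next section uses unicast VRRP between the two nodes, and section 3 lists a blocked VRRP protocol (IP protocol 112) as one cause of split-brain. The following is a minimal sketch of opening that protocol, assuming firewalld is the active firewall on CentOS 7. The helper names `vrrp_rich_rule` and `allow_vrrp_from` are hypothetical and not part of keepalived or firewalld.

```shell
#!/bin/bash
# Build a firewalld rich rule that accepts VRRP (IP protocol 112) from one peer.
vrrp_rich_rule() {
    local peer_ip="$1"
    printf 'rule family="ipv4" source address="%s" protocol value="vrrp" accept\n' "$peer_ip"
}

# Apply the rule permanently and reload firewalld.
# Assumption: run as root on a host where firewalld is running.
allow_vrrp_from() {
    firewall-cmd --permanent --add-rich-rule="$(vrrp_rich_rule "$1")" &&
        firewall-cmd --reload
}

# Example, run on the master (the peer is the backup node):
#   allow_vrrp_from 192.168.189.135
```

Each node would allow the other node's address, so the backup runs the same call with 192.168.189.128.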
    

2. Configuration example

  • Master node

    ! Configuration File for keepalived
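    # Global settings
    global_defs {
        router_id LVS_MASTER
        smtp_server 192.168.1.1
        notification_email {
            sysadmin@example.com
        }
    }
    
    # Health-check script (the name must match the track_script entry)
    vrrp_script check_tomcat {
        script "/etc/keepalived/check_tomcat.sh"
        interval 2       # run the check every 2 seconds
        weight -60       # lower the priority by 60 while the check is failing
        fall 3           # 3 consecutive failures mark the service as down
        rise 2           # 2 consecutive successes mark it as recovered
    }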
    
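    # VRRP instance
    vrrp_instance VI_1 {
        state BACKUP     # both nodes start as BACKUP and elect the master by priority
        interface ens33
        virtual_router_id 51
        priority 150     # the master's priority is higher than the backup's
        advert_int 1     # advertisement interval: 1 second
        unicast_src_ip 192.168.189.128   # this node's IP address (the master, per the environment table)
        unicast_peer {
            192.168.189.135              # peer IP address (the backup node)
        }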
    
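        authentication {
            auth_type PASS
            auth_pass 11112222           # replace with a stronger password (only the first 8 characters are used)
        }
    
        virtual_ipaddress {
            192.168.189.110/24          # floating VIP
        }
    
        track_script {
            check_tomcat                 # attach the health-check script
        }
    
        # Optional: scripts to run on state transitions
        # notify_master /opt/master.sh
        # notify_backup /opt/backup.sh
    }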
    
    # Virtual server (load balancing)
    virtual_server 192.168.189.110 8090 {
        delay_loop 6                    # interval between real-server health checks (seconds)
        lb_algo rr                      # scheduling algorithm: round robin
        lb_kind DR                      # direct routing mode
        protocol TCP
    
        real_server 192.168.189.128 8090 {
            weight 1
            HTTP_GET {
                url {
                    path /
                }
                connect_timeout 3       # connection timeout (seconds)
                nb_get_retry 3          # maximum number of retries
                delay_before_retry 3    # wait before each retry (seconds)
            }
        }
    
        real_server 192.168.189.135 8090 {
            weight 1
            HTTP_GET {
                url {
                    path /
                }
                connect_timeout 3
                nb_get_retry 3
                delay_before_retry 3
            }
        }
    }
    
    • /etc/keepalived/check_tomcat.sh

      #!/bin/bash
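      # Check whether port 8090 is listening with ss (ships with the base system, so nc is not needed)
      ss -tuln | grep -q ":8090 "  # -q: quiet mode, only the exit status matters
      if [ $? -ne 0 ]; then
          # Port is not listening: exit 1 (keepalived lowers the priority)
          exit 1
      else
          # Port is listening: exit 0
          exit 0
      fi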
      
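    • Permissions

      # Keep the script writable by root only: mode 777 is what triggers the
      # "Unsafe permissions found for script" warning in the startup log below
      chown root:root /etc/keepalived/check_tomcat.sh
      chmod 700 /etc/keepalived/check_tomcat.sh
      chmod 644 /etc/keepalived/keepalived.conf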
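
The startup log in section 2.1 contains the line "Unsafe permissions found for script". As a rough pre-flight check, the sketch below flags a script that is not owned by root or that is writable by group or others. This is an approximation written for illustration, not a reproduction of keepalived's own check, and the function names are hypothetical. It relies on GNU `stat`, which is available on CentOS 7.

```shell
#!/bin/bash
# Succeed (exit 0) if an octal mode grants write access to group or others.
is_group_or_world_writable() {
    local mode="$1"
    # The leading 0 makes the shell read the mode as octal; 022 masks the g+w and o+w bits.
    [ $(( 0$mode & 022 )) -ne 0 ]
}

# Report whether a script file looks safe: owned by root and not group/world-writable.
check_script_perms() {
    local file="$1" mode owner
    mode=$(stat -c '%a' "$file") || return 2
    owner=$(stat -c '%U' "$file") || return 2
    if [ "$owner" != "root" ] || is_group_or_world_writable "$mode"; then
        echo "UNSAFE: $file (owner=$owner mode=$mode)"
        return 1
    fi
    echo "OK: $file (owner=$owner mode=$mode)"
}

# Example:
#   check_script_perms /etc/keepalived/check_tomcat.sh
```

The "SECURITY VIOLATION" line in the same log refers to keepalived's script security setting, which this sketch does not cover.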
      
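  • Backup node

    # Same configuration as the master; only these lines differ
        state BACKUP    # <-- both nodes are BACKUP so they negotiate by priority; with both set to MASTER, both claim the VIP at startup (split-brain)
        priority 100    # lower than the master, per the environment table (the logs in 2.1 were captured with both nodes at 150)
        unicast_src_ip 192.168.189.135   # this node's IP address (the backup, per the environment table)
        unicast_peer {
            192.168.189.128              # peer IP address (the master node)
        }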
    

2.1 Verification

  • Keepalived startup log

    Nov 23 21:47:08 -- systemd: Started LVS and VRRP High Availability Monitor.
    Nov 23 21:47:08 -- Keepalived[32967]: Starting VRRP child process, pid=32969
    Nov 23 21:47:08 -- Keepalived_healthcheckers[32968]: Opening file '/etc/keepalived/keepalived.conf'.
    Nov 23 21:47:08 -- Keepalived_vrrp[32969]: Registering Kernel netlink reflector
    Nov 23 21:47:08 -- Keepalived_vrrp[32969]: Registering Kernel netlink command channel
    Nov 23 21:47:08 -- Keepalived_vrrp[32969]: Registering gratuitous ARP shared channel
    Nov 23 21:47:08 -- Keepalived_vrrp[32969]: Opening file '/etc/keepalived/keepalived.conf'.
    Nov 23 21:47:08 -- Keepalived_vrrp[32969]: WARNING - default user 'keepalived_script' for script execution does not exist - please create.
    Nov 23 21:47:08 -- Keepalived_vrrp[32969]: Unsafe permissions found for script '/etc/keepalived/check_tomcat.sh'.
    Nov 23 21:47:08 -- Keepalived_vrrp[32969]: SECURITY VIOLATION - scripts are being executed but script_security not enabled. There are insecure scripts.
    Nov 23 21:47:08 -- Keepalived_vrrp[32969]: VRRP_Instance(VI_1) removing protocol VIPs.
    Nov 23 21:47:08 -- Keepalived_vrrp[32969]: Using LinkWatch kernel netlink reflector...
    Nov 23 21:47:08 -- Keepalived_vrrp[32969]: VRRP_Instance(VI_1) Entering BACKUP STATE
    Nov 23 21:47:08 -- Keepalived_vrrp[32969]: VRRP sockpool: [ifindex(2), proto(112), unicast(1), fd(10,11)]
    Nov 23 21:47:08 -- Keepalived_healthcheckers[32968]: Activating healthchecker for service [192.168.189.110]:8090
    Nov 23 21:47:08 -- Keepalived_healthcheckers[32968]: Activating healthchecker for service [192.168.189.110]:8090
    Nov 23 21:47:08 -- Keepalived_vrrp[32969]: VRRP_Script(check_tomcat) succeeded
    Nov 23 21:47:11 -- Keepalived_vrrp[32969]: VRRP_Instance(VI_1) Transition to MASTER STATE
    Nov 23 21:47:12 -- Keepalived_vrrp[32969]: VRRP_Instance(VI_1) Entering MASTER STATE
    Nov 23 21:47:12 -- Keepalived_vrrp[32969]: VRRP_Instance(VI_1) setting protocol VIPs.
    Nov 23 21:47:12 -- Keepalived_vrrp[32969]: Sending gratuitous ARP on ens33 for 192.168.189.110
    Nov 23 21:47:12 -- Keepalived_vrrp[32969]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens33 for 192.168.189.110
    
  • Logs during failover

    • Master demoted to backup

      Nov 23 21:49:56 --- systemd: Unit tomcat.service entered failed state.
      Nov 23 21:49:56 --- systemd: tomcat.service failed.
      Nov 23 21:49:57 --- Keepalived_vrrp[33369]: /etc/keepalived/check_tomcat.sh exited with status 1
      Nov 23 21:49:59 --- Keepalived_vrrp[33369]: /etc/keepalived/check_tomcat.sh exited with status 1
      Nov 23 21:50:01 --- Keepalived_vrrp[33369]: /etc/keepalived/check_tomcat.sh exited with status 1
      Nov 23 21:50:01 --- Keepalived_vrrp[33369]: VRRP_Script(check_tomcat) failed
      Nov 23 21:50:01 --- Keepalived_vrrp[33369]: VRRP_Instance(VI_1) Changing effective priority from 150 to 90
      Nov 23 21:50:01 --- Keepalived_healthcheckers[33368]: Error connecting server [192.168.189.128]:8090.
      Nov 23 21:50:01 --- Keepalived_vrrp[33369]: VRRP_Instance(VI_1) Received advert with higher priority 150, ours 90
      Nov 23 21:50:01 --- Keepalived_vrrp[33369]: VRRP_Instance(VI_1) Entering BACKUP STATE
      
    • Backup promoted to master

      Nov 23 21:50:01 --- Keepalived_vrrp[31532]: VRRP_Instance(VI_1) forcing a new MASTER election
      Nov 23 21:50:02 --- Keepalived_healthcheckers[31531]: Error connecting server [192.168.189.128]:8090.
      Nov 23 21:50:02 --- Keepalived_vrrp[31532]: VRRP_Instance(VI_1) Transition to MASTER STATE
      Nov 23 21:50:03 --- Keepalived_vrrp[31532]: VRRP_Instance(VI_1) Entering MASTER STATE
      Nov 23 21:50:03 --- Keepalived_vrrp[31532]: VRRP_Instance(VI_1) setting protocol VIPs.
      Nov 23 21:50:03 --- Keepalived_vrrp[31532]: Sending gratuitous ARP on ens33 for 192.168.189.110
      Nov 23 21:50:03 --- Keepalived_vrrp[31532]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens33 for 192.168.189.110
      Nov 23 21:50:03 --- Keepalived_vrrp[31532]: Sending gratuitous ARP on ens33 for 192.168.189.110
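
The log line "Changing effective priority from 150 to 90" follows directly from the vrrp_script settings: a negative weight is added to the base priority while the check is failing, and the three failed runs two seconds apart correspond to interval 2 and fall 3. The sketch below reproduces that arithmetic. The function names are hypothetical, and the clamp to 1..254 reflects my understanding of how keepalived bounds the effective priority.

```shell
#!/bin/bash
# Effective VRRP priority after applying one tracked script's weight.
#   negative weight: applied while the script is failing
#   positive weight: applied while the script is succeeding
effective_priority() {
    local base="$1" weight="$2" state="$3" prio   # state: ok | fail
    prio="$base"
    if [ "$weight" -lt 0 ] && [ "$state" = "fail" ]; then
        prio=$(( $base + $weight ))
    elif [ "$weight" -gt 0 ] && [ "$state" = "ok" ]; then
        prio=$(( $base + $weight ))
    fi
    # Assumption: the effective priority stays within 1..254.
    if [ "$prio" -lt 1 ]; then prio=1; fi
    if [ "$prio" -gt 254 ]; then prio=254; fi
    echo "$prio"
}

# Upper bound, in seconds, before a failing check is declared failed.
detection_window() {
    local interval="$1" fall="$2"
    echo $(( $interval * $fall ))
}

effective_priority 150 -60 fail   # prints 90
detection_window 2 3              # prints 6
```

A result of 90 is below the peer's priority (100 in the environment table, 150 in the log above), so the peer wins the next election.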
      
  • Capturing VRRPv2 packets

    • Install: yum -y install tcpdump

    • Capture the heartbeat advertisements

      ~]# tcpdump -i any ip proto 112  -nn
      # After the master is demoted: prio 90
      21:55:39.167742 IP 192.168.189.128 > 192.168.189.135: VRRPv2, Advertisement, vrid 51, prio 90, authtype simple, intvl 1s, length 20
      21:55:39.169418 IP 192.168.189.135 > 192.168.189.128: VRRPv2, Advertisement, vrid 51, prio 150, authtype simple, intvl 1s, length 20
      
      # When the current master goes down, it advertises prio 0
      21:56:37.586511 IP 192.168.189.135 > 192.168.189.128: VRRPv2, Advertisement, vrid 51, prio 0, authtype simple, intvl 1s, length 20
      21:56:38.002746 IP 192.168.189.128 > 192.168.189.135: VRRPv2, Advertisement, vrid 51, prio 150, authtype simple, intvl 1s, length 20
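
When watching a failover live, it can help to reduce each advertisement to the sender and its priority. The following is a minimal sketch that assumes the tcpdump output format shown above; `vrrp_priorities` is a hypothetical helper name.

```shell
#!/bin/bash
# Print "source-IP priority" for every VRRPv2 advertisement read from stdin.
vrrp_priorities() {
    awk '/VRRPv2, Advertisement/ {
        src = ""; prio = ""
        for (i = 1; i < NF; i++) {
            if ($i == "IP")   src = $(i + 1)            # sender follows the "IP" token
            if ($i == "prio") { prio = $(i + 1); sub(/,$/, "", prio) }
        }
        if (src != "" && prio != "") print src, prio
    }'
}

# Example (-l makes tcpdump line-buffered so lines appear immediately):
#   tcpdump -l -i any -nn ip proto 112 | vrrp_priorities
```

Applied to the first captured line above, it prints `192.168.189.128 90`.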
      

2.2 Email notifications

  • Use sendEmail (download link, extraction code: 1234)

    # Alternatively, use the standalone binary from the link
    yum install -y sendemail perl-Net-SSLeay perl-IO-Socket-SSL
    
  • /etc/keepalived/notify_sendemail.sh

    #!/bin/bash
    # Core settings (adjust to your environment!)
    VIP="192.168.189.110"          # virtual IP
    INSTANCE="VI_1"                 # VRRP instance name
    TO="user1@example.com,user2@example.com"  # recipients (comma-separated)
    FROM="keepalived_alarm@example.com"       # sender address
    SMTP_SERVER="smtp.example.com"  # SMTP server (e.g. QQ Mail: smtp.qq.com, Alibaba Cloud: smtp.aliyun.com)
    SMTP_PORT="587"                 # SMTP port (587 for STARTTLS, 465 for implicit SSL, 25 for plain SMTP)
    SMTP_USER="keepalived_alarm@example.com"  # sender account
    SMTP_PASS="your_auth_code"      # mailbox authorization code (not the login password!)
    
    # Node information
    NODE_NAME=$(hostname)
    NODE_IP=$(hostname -I | awk '{print $1}')
    TIME=$(date '+%F %H:%M:%S')
    
    # Send one alert email
    send_alarm() {
        local subject="$1"
        local body="$2"
        # Call sendEmail (tls=auto negotiates TLS when the server offers it)
        sendemail -f "$FROM" -t "$TO" -s "$SMTP_SERVER:$SMTP_PORT" \
        -u "$subject" -m "$body" -o tls=auto \
        -xu "$SMTP_USER" -xp "$SMTP_PASS"
    }
    
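    # State handling (keepalived appends $1=type, $2=instance name, $3=state
    # when the script is registered with the generic notify directive)
    case "$3" in
        MASTER)
            SUBJECT="[Keepalived failover] $NODE_NAME is now the $3 node (VIP: $VIP)"
            BODY="===== Keepalived state transition =====
    Time: $TIME
    Node name: $NODE_NAME
    Node IP: $NODE_IP
    Instance: $2
    New state: $3
    Virtual IP: $VIP
    Action: start the backend service on port 8090
    ======================================"
            send_alarm "$SUBJECT" "$BODY"
            systemctl start your-8090-service   # placeholder: e.g. tomcat or a custom jar service
            ;;
        BACKUP)
            SUBJECT="[Keepalived failover] $NODE_NAME is now the $3 node (VIP: $VIP)"
            BODY="===== Keepalived state transition =====
    Time: $TIME
    Node name: $NODE_NAME
    Node IP: $NODE_IP
    Instance: $2
    New state: $3
    Virtual IP: $VIP
    Action: stop the backend service on port 8090 (to avoid conflicts)
    ======================================"
            send_alarm "$SUBJECT" "$BODY"
            systemctl stop your-8090-service    # placeholder service name
            ;;
        FAULT)
            SUBJECT="[Keepalived alert] $NODE_NAME entered the $3 state (VIP: $VIP)"
            BODY="===== Keepalived fault alert =====
    Time: $TIME
    Node name: $NODE_NAME
    Node IP: $NODE_IP
    Instance: $2
    Fault state: $3
    Virtual IP: $VIP
    Urgent: check the node's network, the service on port 8090, and the Keepalived configuration!
    ======================================"
            send_alarm "$SUBJECT" "$BODY"
            systemctl stop your-8090-service    # placeholder service name
            ;;
        *)
            echo "Unknown state: $3"
            exit 1
            ;;
    esac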
    
    exit 0
    
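  • Script permissions and configuration

    # Make the script executable
    chmod +x /etc/keepalived/notify_sendemail.sh
    
    vrrp_instance VI_1 {
        ...  # existing settings (track_script, virtual_ipaddress, etc.)
        # Register the sendEmail notification script with the generic notify
        # directive: keepalived appends the type, instance name and new state to it.
        # notify_master/notify_backup/notify_fault run the script without those
        # arguments, so the script would fall into its default (unknown state) branch.
        notify "/etc/keepalived/notify_sendemail.sh"
    }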
    
  • Testing the script manually

    /etc/keepalived/notify_sendemail.sh INSTANCE VI_1 MASTER
    

3. Split-brain

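When the two hosts can no longer detect each other, each assumes that the master has failed and takes over the VIP and other resources through its own failover mechanism. Both hosts then believe they are the master. They may write to the database at the same moment and cause deadlocks, or contend for other resources. Worse, with the same IP address claimed twice on one LAN, neither host may be reachable, so the service becomes unavailable.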

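  • Causes of split-brain

    1. Heartbeat link failure: the heartbeat line (direct cable or network link) is loose, broken or aged, or a switch port in between has failed.
    2. Network misconfiguration: wrong unicast or multicast settings, mismatched subnet masks, unreachable routes, or a multicast group that was never joined.
    3. Intermediate device failure: a router, switch or gateway fails or is misconfigured and blocks heartbeat traffic.
    4. Arbitration failure: a third-party arbiter (such as ZooKeeper) fails or loses connectivity, so the nodes cannot reach a joint decision.
    5. Firewall restrictions: the VRRP protocol (IP protocol 112), the multicast address (224.0.0.18), or traffic between the nodes is not allowed.
    6. Hardware or software faults: a damaged heartbeat NIC, a bug in the Keepalived version in use, or badly tuned kernel parameters.
    7. Inconsistent configuration: conflicting virtual_router_id, authentication settings (auth_type/auth_pass), or priorities.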
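  • Ways to handle split-brain

    • Redundant heartbeat links (the low-cost first choice): run two independent links in parallel (network cable or serial cable) so that a single link failure does not isolate the nodes.
    • Hardware fencing: use a STONITH-style device to reboot or power off the faulty node automatically once split-brain is detected.
    • Script plus arbitration: a monitoring script detects that the VIP is held twice, then raises an alert or takes the faulty node offline automatically, with manual intervention when necessary.
  • Email notifications (see 3.1)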
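
The "script plus arbitration" approach above can be sketched as a small check that asks both nodes whether they hold the VIP and classifies the result. This is an illustration under stated assumptions: the function names are hypothetical, and querying the peer (for example over SSH on a separate management link) is an assumption, not something the setup above provides. If the peer cannot be queried at all, the result is reported as unknown rather than guessed.

```shell
#!/bin/bash
# Succeed if the `ip -o -4 addr show` output on stdin contains the given VIP.
holds_vip() {
    grep -qF "inet $1/"
}

# Classify the cluster from two observations, each one of: yes | no | unknown.
classify_vip_state() {
    local here="$1" peer="$2"
    case "$here:$peer" in
        yes:yes)       echo "split-brain" ;;   # both nodes hold the VIP
        yes:no|no:yes) echo "ok" ;;            # exactly one master
        no:no)         echo "no-master" ;;     # nobody serves the VIP
        *)             echo "unknown" ;;       # a node could not be queried
    esac
}

# Example (PEER and the SSH access are assumptions):
#   VIP=192.168.189.110; PEER=192.168.189.135
#   here=no; ip -o -4 addr show | holds_vip "$VIP" && here=yes
#   peer=unknown
#   if out=$(ssh "root@$PEER" ip -o -4 addr show); then
#       peer=no; printf '%s\n' "$out" | holds_vip "$VIP" && peer=yes
#   fi
#   classify_vip_state "$here" "$peer"
```

A "split-brain" result would then trigger the alert or the automatic shutdown of one node, as described in the list above.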

3.1 Email notifications

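notify_master: runs when the node becomes the master; used here to send an email
notify_backup: runs when the node becomes the backup; also sends an email
notify_fault: runs when the node enters the fault state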

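A script registered with notify receives three arguments:
$1: GROUP or INSTANCE, indicating whether the next argument names a group or an instance.
$2: the group name or instance name.
$3: the target state of the transition: MASTER, BACKUP or FAULT.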
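
To see these three arguments in practice, a notify script can simply log them before doing anything else. The following is a minimal sketch; `describe_transition` is a hypothetical helper name, and sending the line to syslog with `logger` is one possible choice, not a keepalived requirement.

```shell
#!/bin/bash
# Turn the arguments keepalived appends to a notify script into one log line.
describe_transition() {
    local kind="$1" name="$2" state="$3"
    case "$state" in
        MASTER|BACKUP|FAULT)
            echo "$kind $name -> $state"
            ;;
        *)
            echo "unexpected state: ${state:-<none>}"
            return 1
            ;;
    esac
}

# Example, as the first line of a script registered with notify:
#   describe_transition "$@" | logger -t keepalived-notify
```

Called as `describe_transition INSTANCE VI_1 MASTER`, it prints `INSTANCE VI_1 -> MASTER`.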

#!/bin/bash
#
# description: An example of notify script
#
vip=172.16.100.1
contact='root@localhost'

# Mail the new state ($1) to the contact address
notify() {
  local mailsubject mailbody
  mailsubject="$(hostname) to be $1: $vip floating"
  mailbody="$(date '+%F %H:%M:%S'): vrrp transition, $(hostname) changed to be $1"
  echo "$mailbody" | mail -s "$mailsubject" "$contact"
}

case "$1" in
  master)
    notify master
    /etc/rc.d/init.d/nginx start
    exit 0
  ;;
  backup)
    notify backup
    /etc/rc.d/init.d/nginx stop
    exit 0
  ;;
  fault)
    notify fault
    /etc/rc.d/init.d/nginx stop
    exit 0
  ;;
  *)
    # Double quotes so that the script name is actually expanded
    echo "Usage: $(basename "$0") {master|backup|fault}"
    exit 1
  ;;
esac

# Add the notify lines below track_script, for example:
    track_script {
        chk_http_port
    }

  notify_master "/etc/keepalived/notify.sh master"  
  notify_backup "/etc/keepalived/notify.sh backup"  
  notify_fault "/etc/keepalived/notify.sh fault"  