Linux Services: Keepalived Configuration (Part 2)
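Keepalived Configuration
References: continued from the previous chapter on Keepalived principles; official download page; Keepalived configuration file parameters explained; mysql (temporary); configuration file explained; health checks; lvs+keepalived+nginx; implementing LVS (DR mode) with Keepalived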
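1. Environment
Machine preparation
| Node role | IP address | OS | Core services | Priority |
| Master node | 192.168.189.128 | CentOS 7 | Keepalived + tomcat | 150 (higher than the backup node) |
| Backup node | 192.168.189.135 | CentOS 7 | Keepalived + tomcat | 100 (lower than the master node) |
| Virtual node (VIP) | 192.168.189.110 | - | Single IP through which the service is exposed | - |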
Install with yum
# Run the following on both nodes to install
yum -y install keepalived libnl3-devel ipset-devel -v
# Start the service and enable it at boot
systemctl enable --now keepalived
# Check the status
systemctl status keepalived
2. Configuration Example
Master node
! Configuration File for keepalived

# Global settings
global_defs {
    router_id LVS_MASTER
    smtp_server 192.168.1.1
    notification_email {
        sysadmin@example.com
    }
}

# Health check script (the name matches the track_script entry below)
vrrp_script check_tomcat {
    script "/etc/keepalived/check_tomcat.sh"
    interval 2    # run the check every 2 seconds
    weight -60    # lower the priority by 60 while the check is failing
    fall 3        # 3 consecutive failures mark the service as down
    rise 2        # 2 consecutive successes mark it as recovered
}

# VRRP instance
vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 51
    priority 150                      # the master's priority is higher than the backup's
    advert_int 1                      # advertisement interval: 1 second
    unicast_src_ip 192.168.189.135    # this node's own IP; must match the host this file is deployed on
    unicast_peer {
        192.168.189.128               # the peer node's IP
    }
    authentication {
        auth_type PASS
        auth_pass 11112222            # replace with a stronger password
    }
    virtual_ipaddress {
        192.168.189.110/24            # floating VIP
    }
    track_script {
        check_tomcat                  # attach the health check script
    }
    # Optional: scripts to run on state transitions
    # notify_master /opt/master.sh
    # notify_backup /opt/backup.sh
}

# Virtual server (load balancing)
virtual_server 192.168.189.110 8090 {
    delay_loop 6    # interval between real-server health checks
    lb_algo rr      # scheduling algorithm: round robin
    lb_kind DR      # direct routing mode
    protocol TCP

    real_server 192.168.189.128 8090 {
        weight 1
        HTTP_GET {
            url {
                path /
            }
            connect_timeout 3       # connection timeout
            nb_get_retry 3          # maximum number of retries
            delay_before_retry 3    # wait time before a retry
        }
    }

    real_server 192.168.189.135 8090 {
        weight 1
        HTTP_GET {
            url {
                path /
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
}
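/etc/keepalived/check_tomcat.sh
#!/bin/bash
# Use ss to check whether port 8090 is listening (ss ships with the system and replaces nc).
# grep -q is silent and only sets the exit code.
if ! ss -tuln | grep -q ":8090 "; then
    # Port is not listening: return 1 (triggers the priority reduction)
    exit 1
else
    # Port is listening: return 0
    exit 0
fi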
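Permissions
# The script only needs to be executable. Avoid 777: a world-writable script is
# what Keepalived reports as "Unsafe permissions" in its startup log.
chmod 755 /etc/keepalived/check_tomcat.sh
chmod 644 /etc/keepalived/keepalived.conf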
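The check script greps the whole `ss` listing. A slightly stricter variant, sketched below, looks only at the local-address column of sockets in the LISTEN state. The function name `port_listening` is hypothetical, and it reads the listing from stdin so that it can be tried on sample text without a running Tomcat.

```shell
#!/bin/bash
# port_listening PORT: reads `ss -tln` output on stdin and succeeds only if
# a socket in LISTEN state has a local address ending in :PORT.
# (Hypothetical helper; column 4 is "Local Address:Port" in `ss -tln` output.)
port_listening() {
    local port="$1"
    awk -v p="$port" '
        $1 == "LISTEN" && $4 ~ (":" p "$") { found = 1 }
        END { exit !found }
    '
}

# Demo on sample text shaped like `ss -tln` output
sample='State   Recv-Q Send-Q Local Address:Port  Peer Address:Port
LISTEN  0      128    *:22                *:*
LISTEN  0      100    :::8090             :::*'

if printf '%s\n' "$sample" | port_listening 8090; then
    echo "8090 is listening"
else
    echo "8090 is not listening"
fi
```

Inside a real check script the last command would be `ss -tln | port_listening 8090`, so that its exit status becomes the script's result.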
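Backup node
# Same configuration with the addresses swapped
state BACKUP                      # <-- both nodes are BACKUP so that they negotiate by priority; setting both to MASTER leads straight to split-brain
unicast_src_ip 192.168.189.128    # this node's own IP
unicast_peer {
    192.168.189.135               # the peer node's IP
}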
2.1 Verification
Keepalived startup log
Nov 23 21:47:08 -- systemd: Started LVS and VRRP High Availability Monitor.
Nov 23 21:47:08 -- Keepalived[32967]: Starting VRRP child process, pid=32969
Nov 23 21:47:08 -- Keepalived_healthcheckers[32968]: Opening file '/etc/keepalived/keepalived.conf'.
Nov 23 21:47:08 -- Keepalived_vrrp[32969]: Registering Kernel netlink reflector
Nov 23 21:47:08 -- Keepalived_vrrp[32969]: Registering Kernel netlink command channel
Nov 23 21:47:08 -- Keepalived_vrrp[32969]: Registering gratuitous ARP shared channel
Nov 23 21:47:08 -- Keepalived_vrrp[32969]: Opening file '/etc/keepalived/keepalived.conf'.
Nov 23 21:47:08 -- Keepalived_vrrp[32969]: WARNING - default user 'keepalived_script' for script execution does not exist - please create.
Nov 23 21:47:08 -- Keepalived_vrrp[32969]: Unsafe permissions found for script '/etc/keepalived/check_tomcat.sh'.
Nov 23 21:47:08 -- Keepalived_vrrp[32969]: SECURITY VIOLATION - scripts are being executed but script_security not enabled. There are insecure scripts.
Nov 23 21:47:08 -- Keepalived_vrrp[32969]: VRRP_Instance(VI_1) removing protocol VIPs.
Nov 23 21:47:08 -- Keepalived_vrrp[32969]: Using LinkWatch kernel netlink reflector...
Nov 23 21:47:08 -- Keepalived_vrrp[32969]: VRRP_Instance(VI_1) Entering BACKUP STATE
Nov 23 21:47:08 -- Keepalived_vrrp[32969]: VRRP sockpool: [ifindex(2), proto(112), unicast(1), fd(10,11)]
Nov 23 21:47:08 -- Keepalived_healthcheckers[32968]: Activating healthchecker for service [192.168.189.110]:8090
Nov 23 21:47:08 -- Keepalived_healthcheckers[32968]: Activating healthchecker for service [192.168.189.110]:8090
Nov 23 21:47:08 -- Keepalived_vrrp[32969]: VRRP_Script(check_tomcat) succeeded
Nov 23 21:47:11 -- Keepalived_vrrp[32969]: VRRP_Instance(VI_1) Transition to MASTER STATE
Nov 23 21:47:12 -- Keepalived_vrrp[32969]: VRRP_Instance(VI_1) Entering MASTER STATE
Nov 23 21:47:12 -- Keepalived_vrrp[32969]: VRRP_Instance(VI_1) setting protocol VIPs.
Nov 23 21:47:12 -- Keepalived_vrrp[32969]: Sending gratuitous ARP on ens33 for 192.168.189.110
Nov 23 21:47:12 -- Keepalived_vrrp[32969]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens33 for 192.168.189.110
Logs during a failover
Master switching to backup
Nov 23 21:49:56 --- systemd: Unit tomcat.service entered failed state.
Nov 23 21:49:56 --- systemd: tomcat.service failed.
Nov 23 21:49:57 --- Keepalived_vrrp[33369]: /etc/keepalived/check_tomcat.sh exited with status 1
Nov 23 21:49:59 --- Keepalived_vrrp[33369]: /etc/keepalived/check_tomcat.sh exited with status 1
Nov 23 21:50:01 --- Keepalived_vrrp[33369]: /etc/keepalived/check_tomcat.sh exited with status 1
Nov 23 21:50:01 --- Keepalived_vrrp[33369]: VRRP_Script(check_tomcat) failed
Nov 23 21:50:01 --- Keepalived_vrrp[33369]: VRRP_Instance(VI_1) Changing effective priority from 150 to 90
Nov 23 21:50:01 --- Keepalived_healthcheckers[33368]: Error connecting server [192.168.189.128]:8090.
Nov 23 21:50:01 --- Keepalived_vrrp[33369]: VRRP_Instance(VI_1) Received advert with higher priority 150, ours 90
Nov 23 21:50:01 --- Keepalived_vrrp[33369]: VRRP_Instance(VI_1) Entering BACKUP STATE
Backup switching to master
Nov 23 21:50:01 --- Keepalived_vrrp[31532]: VRRP_Instance(VI_1) forcing a new MASTER election
Nov 23 21:50:02 --- Keepalived_healthcheckers[31531]: Error connecting server [192.168.189.128]:8090.
Nov 23 21:50:02 --- Keepalived_vrrp[31532]: VRRP_Instance(VI_1) Transition to MASTER STATE
Nov 23 21:50:03 --- Keepalived_vrrp[31532]: VRRP_Instance(VI_1) Entering MASTER STATE
Nov 23 21:50:03 --- Keepalived_vrrp[31532]: VRRP_Instance(VI_1) setting protocol VIPs.
Nov 23 21:50:03 --- Keepalived_vrrp[31532]: Sending gratuitous ARP on ens33 for 192.168.189.110
Nov 23 21:50:03 --- Keepalived_vrrp[31532]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens33 for 192.168.189.110
Nov 23 21:50:03 --- Keepalived_vrrp[31532]: Sending gratuitous ARP on ens33 for 192.168.189.110
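The "Changing effective priority from 150 to 90" line in the log above follows from the `weight -60` setting of the tracked script. The arithmetic can be sketched as below; `effective_priority` is a hypothetical helper that models only the single-script case, not Keepalived's full behaviour.

```shell
#!/bin/bash
# effective_priority BASE WEIGHT SCRIPT_OK
#   BASE       configured priority of the vrrp_instance
#   WEIGHT     weight of the tracked vrrp_script
#   SCRIPT_OK  1 if the script currently succeeds, 0 if it is failing
# A negative weight is applied while the script fails; a positive weight is
# applied while it succeeds. (Hypothetical helper for illustration.)
effective_priority() {
    local base="$1" weight="$2" script_ok="$3"
    if (( weight < 0 && script_ok == 0 )); then
        echo $(( base + weight ))
    elif (( weight > 0 && script_ok == 1 )); then
        echo $(( base + weight ))
    else
        echo "$base"
    fi
}

echo "tomcat up:   $(effective_priority 150 -60 1)"   # prints 150
echo "tomcat down: $(effective_priority 150 -60 0)"   # prints 90
```

The failing node gives up the VIP once its effective priority falls below what the peer advertises, which is why the weight has to be larger than the gap between the two configured priorities.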
Capturing VRRPv2 packets
Install:
yum -y install tcpdump
Capturing the heartbeat packets
~]# tcpdump -i any ip proto 112 -nn
# When the master drops to backup: prio 90
21:55:39.167742 IP 192.168.189.128 > 192.168.189.135: VRRPv2, Advertisement, vrid 51, prio 90, authtype simple, intvl 1s, length 20
21:55:39.169418 IP 192.168.189.135 > 192.168.189.128: VRRPv2, Advertisement, vrid 51, prio 150, authtype simple, intvl 1s, length 20
# When the master node goes down: prio 0
21:56:37.586511 IP 192.168.189.135 > 192.168.189.128: VRRPv2, Advertisement, vrid 51, prio 0, authtype simple, intvl 1s, length 20
21:56:38.002746 IP 192.168.189.128 > 192.168.189.135: VRRPv2, Advertisement, vrid 51, prio 150, authtype simple, intvl 1s, length 20
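To follow priority changes without reading the full capture, the advertisement lines can be reduced to "source priority" pairs. This is a minimal sketch: `vrrp_prios` is a hypothetical helper, and it assumes the tcpdump line format shown above.

```shell
#!/bin/bash
# vrrp_prios: reads tcpdump output on stdin and prints "<source IP> <priority>"
# for every VRRPv2 advertisement. (Hypothetical helper; it locates the fields
# by the "IP" and "prio" keywords rather than by fixed column positions.)
vrrp_prios() {
    awk '/VRRPv2, Advertisement/ {
        src = ""; prio = ""
        for (i = 1; i < NF; i++) {
            if ($i == "IP")   src = $(i + 1)
            if ($i == "prio") { prio = $(i + 1); sub(/,$/, "", prio) }
        }
        if (src != "" && prio != "") print src, prio
    }'
}

# Demo on two lines taken from the capture above
vrrp_prios <<'EOF'
21:55:39.167742 IP 192.168.189.128 > 192.168.189.135: VRRPv2, Advertisement, vrid 51, prio 90, authtype simple, intvl 1s, length 20
21:55:39.169418 IP 192.168.189.135 > 192.168.189.128: VRRPv2, Advertisement, vrid 51, prio 150, authtype simple, intvl 1s, length 20
EOF
```

On a live system the same function could be fed with `tcpdump -l -i any ip proto 112 -nn | vrrp_prios`, where `-l` makes tcpdump line-buffer its output.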
2.2 Email Notification
Uses sendEmail (download link; extraction code: 1234)
# Alternatively, use the binary directly (available from the link)
yum install -y sendemail perl-Net-SSLeay perl-IO-Socket-SSL
/etc/keepalived/notify_sendemail.sh
#!/bin/bash
# Core settings (adjust to your environment!)
VIP="192.168.189.110"                       # virtual IP
INSTANCE="VI_1"                             # VRRP instance name
TO="user1@example.com,user2@example.com"    # recipients (comma-separated)
FROM="keepalived_alarm@example.com"         # sender address
SMTP_SERVER="smtp.example.com"              # SMTP server (e.g. QQ Mail: smtp.qq.com, Aliyun: smtp.aliyun.com)
SMTP_PORT="587"                             # SMTP port (587 with STARTTLS, 25 without encryption)
SMTP_USER="keepalived_alarm@example.com"    # sender account
SMTP_PASS="your_auth_code"                  # mailbox authorization code (not the login password!)

# Node information
NODE_NAME=$(hostname)
NODE_IP=$(hostname -I | awk '{print $1}')
TIME=$(date '+%F %H:%M:%S')

# Mail sending function
send_alarm() {
    local subject="$1"
    local body="$2"
    # Send through sendEmail (supports TLS, works with most mail providers)
    sendemail -f "$FROM" -t "$TO" -s "$SMTP_SERVER:$SMTP_PORT" \
        -u "$subject" -m "$body" -o tls=auto \
        -xu "$SMTP_USER" -xp "$SMTP_PASS"
}

# State logic; expects $1 = type, $2 = instance name, $3 = target state
case "$3" in
    MASTER)
        SUBJECT="[Keepalived failover] $NODE_NAME is now the $3 node (VIP: $VIP)"
        BODY="===== Keepalived state transition notice =====
Transition time: $TIME
Node name: $NODE_NAME
Node IP: $NODE_IP
Instance name: $2
Target state: $3
Virtual IP: $VIP
Service action: start the backend service on port 8090
======================================"
        send_alarm "$SUBJECT" "$BODY"
        systemctl start your-8090-service    # e.g. tomcat or a custom jar service
        ;;
    BACKUP)
        SUBJECT="[Keepalived failover] $NODE_NAME is now the $3 node (VIP: $VIP)"
        BODY="===== Keepalived state transition notice =====
Transition time: $TIME
Node name: $NODE_NAME
Node IP: $NODE_IP
Instance name: $2
Target state: $3
Virtual IP: $VIP
Service action: stop the backend service on port 8090 (to avoid conflicts)
======================================"
        send_alarm "$SUBJECT" "$BODY"
        systemctl stop your-8090-service
        ;;
    FAULT)
        SUBJECT="[Keepalived alert] $NODE_NAME entered the $3 state (VIP: $VIP)"
        BODY="===== Keepalived fault alert =====
Alert time: $TIME
Node name: $NODE_NAME
Node IP: $NODE_IP
Instance name: $2
Fault state: $3
Virtual IP: $VIP
Urgent action: check the node's network, the service on port 8090, or the Keepalived configuration!
======================================"
        send_alarm "$SUBJECT" "$BODY"
        systemctl stop your-8090-service
        ;;
    *)
        echo "Unknown state: $3"
        exit 1
        ;;
esac
exit 0
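Script & configuration
# Make the script executable
chmod +x /etc/keepalived/notify_sendemail.sh

vrrp_instance VI_1 {
    ...    # existing settings (track_script, virtual_ipaddress, etc.)

    # Hook up the sendEmail notification script.
    # The arguments are spelled out because Keepalived runs notify_master,
    # notify_backup and notify_fault exactly as written; only the generic
    # "notify" hook gets the type, name and state appended automatically.
    notify_master "/etc/keepalived/notify_sendemail.sh INSTANCE VI_1 MASTER"
    notify_backup "/etc/keepalived/notify_sendemail.sh INSTANCE VI_1 BACKUP"
    notify_fault "/etc/keepalived/notify_sendemail.sh INSTANCE VI_1 FAULT"
}
Testing the script manually
/etc/keepalived/notify_sendemail.sh INSTANCE VI_1 MASTER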
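3. Split-Brain
When the two hosts can no longer detect each other, each assumes the master is down and takes over the VIP and the other resources. With both hosts believing they are the master, they may write to the database at the same moment and cause deadlocks, or contend for other resources. Worse, two hosts holding the same IP on one LAN conflict with each other, so possibly neither can reach the network or serve requests.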
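Causes of split-brain
- Heartbeat link failure: the heartbeat cable (direct cable or network link) is loose, broken or worn out, or a switch port in between has failed.
- Network misconfiguration: wrong unicast/multicast settings, mismatched subnet masks, unreachable routes, or the multicast group not joined.
- Intermediate device failure: a router, switch or gateway fails or is misconfigured and blocks heartbeat traffic.
- Arbitration failure: the third-party arbiter (for example ZooKeeper) fails or loses connectivity, so the nodes cannot reach a joint decision.
- Firewall restrictions: the VRRP protocol (IP protocol number 112), the multicast address (224.0.0.18) or node-to-node traffic is not allowed through.
- Hardware or software faults: a damaged heartbeat NIC, a Keepalived version bug, or badly tuned kernel parameters.
- Inconsistent configuration: virtual_router_id, authentication settings (auth_type/auth_pass) or priorities conflict between the nodes.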
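For the firewall cause in the list above, the rule that lets VRRP through can be sketched as follows. This assumes firewalld is the active firewall (the CentOS 7 default); `vrrp_firewall_cmds` is a hypothetical helper that only prints the commands, so nothing changes until the output is reviewed and run as root.

```shell
#!/bin/bash
# vrrp_firewall_cmds [PEER_IP]: prints firewalld commands that accept VRRP
# (IP protocol 112). With PEER_IP the rule is limited to that source, which
# suits the unicast setup; without it VRRP is accepted from any source.
# (Hypothetical dry-run helper; it does not touch the firewall itself.)
vrrp_firewall_cmds() {
    local peer="$1"
    if [[ -n "$peer" ]]; then
        echo "firewall-cmd --permanent --add-rich-rule='rule family=\"ipv4\" source address=\"$peer\" protocol value=\"vrrp\" accept'"
    else
        echo "firewall-cmd --permanent --add-rich-rule='rule protocol value=\"vrrp\" accept'"
    fi
    echo "firewall-cmd --reload"
}

# On the 192.168.189.128 node the peer is 192.168.189.135
vrrp_firewall_cmds 192.168.189.135
```

After checking the printed commands, they can be applied with `vrrp_firewall_cmds 192.168.189.135 | sh`, and the mirrored rule goes on the other node.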
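Remedies for split-brain
- Redundant heartbeats (the low-cost first choice): run two independent links in parallel (network cable / serial cable) so that a single link failure does not cut the heartbeat.
- Hardware fencing: use STONITH-style devices to reboot or power off the faulty node automatically once split-brain is detected.
- Script plus arbitration: a monitoring script detects a duplicated VIP and raises an alert or takes the faulty node offline automatically, with manual intervention when necessary.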
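The "script plus arbitration" remedy can be sketched as a small decision function. Everything here is an assumption for illustration: the helper names, the use of a gateway ping as the arbiter, the placeholder `GATEWAY` address, and the three actions. How a node learns whether its peer also holds the VIP is left open.

```shell
#!/bin/bash
# Hypothetical settings: GATEWAY is a placeholder arbiter address, not taken from the article
VIP="192.168.189.110"
GATEWAY="${GATEWAY:-192.0.2.1}"

# Probes for a real deployment (not called in the demo below, since they
# depend on the live system)
has_vip()    { ip -4 -o addr show | grep -qF "inet $VIP/"; }
gateway_ok() { ping -c 2 -W 1 "$GATEWAY" >/dev/null 2>&1; }

# decide_action HAS_VIP GATEWAY_OK PEER_HAS_VIP   (each argument is 1 or 0)
#   release  this node holds the VIP but cannot reach the arbiter: give the VIP up
#   alert    both nodes hold the VIP: split-brain, notify an operator
#   ok       nothing to do
decide_action() {
    local has_vip="$1" gateway_ok="$2" peer_has_vip="$3"
    if (( has_vip == 1 && gateway_ok == 0 )); then
        echo "release"
    elif (( has_vip == 1 && peer_has_vip == 1 )); then
        echo "alert"
    else
        echo "ok"
    fi
}

# Demo: walk through a few situations ($args is left unquoted on purpose
# so that it splits into three arguments)
for args in "1 1 0" "1 0 0" "1 1 1" "0 1 1"; do
    echo "has_vip/gateway_ok/peer_has_vip = $args -> $(decide_action $args)"
done
```

In a cron job or a `vrrp_script`, "release" might map to `systemctl stop keepalived` and "alert" to a notification mail. The isolated-node rule is checked first because a node that cannot reach the gateway cannot serve clients anyway.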
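3.1 Email Notification
notify_master: send an email when the node becomes master
notify_backup: send an email when the node becomes backup
notify_fault: send an email when the node enters the fault state
A notify script can use three parameters:
$1: either GROUP or INSTANCE, indicating whether what follows is a group or an instance.
$2: the name of the group or instance.
$3: the target state after the transition: MASTER, BACKUP or FAULT.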
#!/bin/bash
#
# description: An example of notify script
#
vip=172.16.100.1
contact='root@localhost'

# Mail the contact about the new state passed as $1
notify() {
    local mailsubject="$(hostname) to be $1: $vip floating"
    local mailbody="$(date '+%F %H:%M:%S'): vrrp transition, $(hostname) changed to be $1"
    echo "$mailbody" | mail -s "$mailsubject" "$contact"
}

case "$1" in
    master)
        notify master
        /etc/rc.d/init.d/nginx start
        exit 0
        ;;
    backup)
        notify backup
        /etc/rc.d/init.d/nginx stop
        exit 0
        ;;
    fault)
        notify fault
        /etc/rc.d/init.d/nginx stop
        exit 0
        ;;
    *)
        # Double quotes so that the script name is actually expanded
        echo "Usage: $(basename "$0") {master|backup|fault}"
        exit 1
        ;;
esac
# Add the notify lines below track_script, for example:
track_script {
    chk_http_port
}
notify_master "/etc/keepalived/notify.sh master"
notify_backup "/etc/keepalived/notify.sh backup"
notify_fault "/etc/keepalived/notify.sh fault"