From Single Node to High Availability: How I Built a Production-Grade K8s Cluster Step by Step
This post walks through building a highly available Kubernetes cluster on virtual machines, covering the cluster architecture and a full set of deployment scripts. The cluster uses an HAProxy + 3 Master + 2 Node topology, with containerd as the runtime and Calico as the network plugin. Six scripts are provided: HAProxy deployment (with a startup fix), K8s environment cleanup, node initialization, first-master bootstrap, and master and worker join scripts, helping you stand up a production-ready K8s cluster quickly.
I recently set up an HAProxy + 3 Master + 2 Node highly available K8s cluster on virtual machines. I stepped into plenty of pitfalls along the way and distilled the experience into a set of reusable deployment scripts. This post shares the whole process and the scripts, in the hope that they help anyone with the same need.
I. Cluster Architecture Design
The goal is a production-ready K8s cluster. The core design is:
| Role | Node IP | Notes |
|---|---|---|
| HAProxy | 192.168.56.102 | Load-balancing entry point for the API server, with TCP health checks against each master |
| Master | 192.168.56.111 | First control-plane node; generates the etcd certificates and cluster config |
| Master | 192.168.56.112 | Control-plane node that joins the cluster |
| Master | 192.168.56.113 | Control-plane node that joins the cluster |
| Node | 192.168.56.114 | Worker node |
| Node | 192.168.56.115 | Worker node |
Key configuration:
- Pod CIDR: 192.156.32.0/20
- Service CIDR: 192.156.48.0/24
- Every node is pinned to the IP on its enp0s8 interface
- containerd as the container runtime
- Calico as the network plugin
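Every script below derives the node IP from the enp0s8 interface, so it is worth a quick pre-flight check on each VM before starting. A minimal sketch, assuming the 192.168.56.0/24 host-only network used throughout this article:

```bash
# Pre-flight: confirm enp0s8 exists and carries the expected host-only IP
NODE_IP=$(ip -4 addr show enp0s8 2>/dev/null | grep -oP '(?<=inet\s)\d+(\.\d+){3}' | head -1)
if [[ "${NODE_IP}" == 192.168.56.* ]]; then
    echo "OK: enp0s8 has IP ${NODE_IP}"
else
    echo "WARN: enp0s8 IP is '${NODE_IP:-<none>}', expected 192.168.56.x" >&2
fi
```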
II. The Deployment Scripts
1. 01-deploy-haproxy.sh (fixes the HAProxy startup problem)
```bash
#!/bin/bash
set -euo pipefail

# Configuration
HAPROXY_NODE_IP="192.168.56.102"
MASTER_NODES=("192.168.56.111" "192.168.56.112" "192.168.56.113")
K8S_API_PORT=6443
HAPROXY_PORT=6443

# Install haproxy
apt update -y && apt install -y haproxy net-tools || true

# Back up the existing config
mv /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bak$(date +%Y%m%d%H%M%S) || true

# Generate the haproxy config
cat > /etc/haproxy/haproxy.cfg << EOF
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    maxconn 2000

defaults
    log global
    mode tcp
    option tcplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend k8s-api-frontend
    bind ${HAPROXY_NODE_IP}:${HAPROXY_PORT}
    default_backend k8s-api-backend

backend k8s-api-backend
    mode tcp
    balance roundrobin
    option tcp-check
    # Fix: changed fall 0 rise 0 to fall 3 rise 2 so HAProxy can start
EOF

# Append the master nodes to the backend
for master in "${MASTER_NODES[@]}"; do
cat >> /etc/haproxy/haproxy.cfg << EOF
    server master-${master//./-} ${master}:${K8S_API_PORT} check inter 2000 fall 3 rise 2
EOF
done

# Restart haproxy and enable it on boot
systemctl daemon-reload
systemctl enable --now haproxy
systemctl restart haproxy

# Verify haproxy status
if systemctl is-active --quiet haproxy; then
    echo -e "\033[32mHAProxy deployed; listening on: ${HAPROXY_NODE_IP}:${HAPROXY_PORT}\033[0m"
else
    echo -e "\033[31mHAProxy failed to start\033[0m"
    systemctl status haproxy --no-pager
    exit 1
fi
```
Fix: changed fall 0 rise 0 to fall 3 rise 2; health-check thresholds of 0 were preventing HAProxy from starting.
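To double-check the fix, you can confirm that HAProxy is bound to port 6443 and ask the runtime API for the backend state over the admin socket declared in the config above (socat must be installed for the second command):

```bash
# Is HAProxy listening on the API entry point?
ss -tlnp | grep 6443

# Backend server state via the runtime API (path matches the "stats socket" line)
echo "show servers state k8s-api-backend" | socat stdio /run/haproxy/admin.sock
```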
2. 02-clean-k8s-env.sh (cleans up K8s leftovers)
```bash
#!/bin/bash
set -euo pipefail
echo "Cleaning up leftover K8s cluster resources..."

# Stop services (pkill accepts only one pattern, so kill each separately)
systemctl stop kubelet containerd haproxy || true
pkill -9 kubelet || true
pkill -9 containerd || true

# Unmount K8s-related volumes
echo "Unmounting K8s-related volumes..."
mount | grep -E "/run/containerd|/var/lib/kubelet|/var/lib/containerd|/opt/cni" | awk '{print $3}' | xargs -r umount -lf || true

# Delete directories and files
echo "Deleting K8s-related directories/files..."
rm -rf /run/containerd /var/lib/containerd /etc/containerd \
    /etc/kubernetes /var/lib/kubelet /var/lib/cni /opt/cni/bin \
    /run/kubelet/* /root/.kube /etc/systemd/system/kubelet.service* \
    /etc/default/kubelet /tmp/k8s-bin || true

# Remove binaries
rm -f /usr/bin/kubeadm /usr/bin/kubelet /usr/bin/kubectl /usr/local/bin/crictl || true

# Flush iptables rules
echo "Flushing iptables rules..."
iptables -F && iptables -X && iptables -F -t nat && iptables -X -t nat && iptables -F -t mangle && iptables -X -t mangle || true
ip6tables -F && ip6tables -X && ip6tables -F -t nat && ip6tables -X -t nat && ip6tables -F -t mangle && ip6tables -X -t mangle || true

# Remove CNI network interfaces
echo "Removing CNI network interfaces..."
ip link delete cni0 || true
ip link delete flannel.1 || true
# "ip link delete cali*" does not glob; enumerate the Calico interfaces instead
for iface in $(ip -o link show | awk -F': ' '{print $2}' | cut -d@ -f1 | grep '^cali' || true); do
    ip link delete "${iface}" || true
done
ip link delete kube-ipvs0 || true

# Disable swap
swapoff -a || true
sed -i '/swap/s/^/#/' /etc/fstab || true

# Reload systemd
systemctl daemon-reload
echo -e "\033[32mK8s environment cleanup complete\033[0m"
```
3. 03-init-all-nodes.sh (initializes every node except HAProxy)
```bash
#!/bin/bash
set -euo pipefail

# Core configuration
PROXY_ADDR="http://192.168.56.102:8080"
HARBOR_REGISTRY="192.168.56.102"
HARBOR_FULL_ADDR="http://${HARBOR_REGISTRY}"
HARBOR_USER="admin"
HARBOR_PASS="Harbor12345"
K8S_VERSION="1.33.6"
ARCH="amd64"
DOWNLOAD_DIR="/tmp/k8s-bin"
PAUSE_VERSION="3.8"
PAUSE_TARGET_VERSION="3.10"
SOURCE_REGISTRY="registry.k8s.io"
TARGET_REGISTRY="${HARBOR_REGISTRY}/library"

# Get this node's IP on enp0s8
NODE_IP=$(ip addr show enp0s8 | grep -oP '(?<=inet\s)\d+(\.\d+){3}' | head -1)
if [ -z "${NODE_IP}" ]; then
    echo -e "\033[31mNo IP found on enp0s8; check the interface name\033[0m"
    exit 1
fi
echo "This node's enp0s8 IP: ${NODE_IP}"

# 1. Base configuration
timedatectl set-timezone Asia/Shanghai || true
apt update -y && apt upgrade -y || true

# 2. Install dependencies
apt install -y curl wget iptables apt-transport-https ca-certificates gnupg2 software-properties-common net-tools socat conntrack ipset || true

# 3. Disable swap
swapoff -a
sed -i '/swap/s/^/#/' /etc/fstab

# 4. Load kernel modules
cat <<EOF | tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF
modprobe overlay || true
modprobe br_netfilter || true

# 5. Set sysctl parameters
cat <<EOF | tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.tcp_tw_recycle = 0
vm.swappiness = 0
EOF
sysctl --system || true

# 6. Deploy containerd
# Download containerd
curl -L --proxy ${PROXY_ADDR} --insecure -s https://github.com/containerd/containerd/releases/download/v1.7.30/containerd-1.7.30-linux-${ARCH}.tar.gz -o /tmp/containerd.tar.gz
tar Cxzvf /usr/local /tmp/containerd.tar.gz || true
# Download runc
curl -L --proxy ${PROXY_ADDR} --insecure -s https://github.com/opencontainers/runc/releases/download/v1.1.12/runc.${ARCH} -o /tmp/runc
install -m 755 /tmp/runc /usr/local/sbin/runc || true
# Download the CNI plugins
CNI_VERSION="1.4.0"
curl -L --proxy ${PROXY_ADDR} --insecure -s https://github.com/containernetworking/plugins/releases/download/v${CNI_VERSION}/cni-plugins-linux-${ARCH}-v${CNI_VERSION}.tgz -o /tmp/cni-plugins.tgz
mkdir -p /opt/cni/bin
tar Cxzvf /opt/cni/bin /tmp/cni-plugins.tgz || true

# Configure containerd
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml || true
sed -i 's/SystemdCgroup \= false/SystemdCgroup \= true/' /etc/containerd/config.toml
sed -i 's/config_path = ""/config_path = "\/etc\/containerd\/certs.d"/' /etc/containerd/config.toml
sed -i 's/^\(\s*\)tls_verify = true/\1tls_verify = false/' /etc/containerd/config.toml

# Configure the Harbor registry
mkdir -p /etc/containerd/certs.d/${HARBOR_REGISTRY}
cat > /etc/containerd/certs.d/${HARBOR_REGISTRY}/hosts.toml << EOF
server = "${HARBOR_FULL_ADDR}"

[host."${HARBOR_FULL_ADDR}"]
  capabilities = ["pull", "resolve", "push"]
  skip_verify = true
  allow_insecure = true
  [host."${HARBOR_FULL_ADDR}".auth]
    username = "${HARBOR_USER}"
    password = "${HARBOR_PASS}"
EOF

# Create the containerd systemd unit
cat <<EOF | tee /etc/systemd/system/containerd.service
[Unit]
Description=containerd container runtime
After=network.target local-fs.target

[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/containerd
Type=notify
Restart=always
RestartSec=5
LimitNOFILE=infinity

[Install]
WantedBy=multi-user.target
EOF

# Start containerd
systemctl daemon-reload
systemctl enable --now containerd || true

# 7. Install crictl
CRICTL_VERSION="v1.30.0"
curl -L --proxy ${PROXY_ADDR} --insecure -s https://github.com/kubernetes-sigs/cri-tools/releases/download/${CRICTL_VERSION}/crictl-${CRICTL_VERSION}-linux-amd64.tar.gz | tar zxvf - -C /usr/local/bin || true
cat > /etc/crictl.yaml << EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF

# 8. Configure the pause image (pull from Harbor, re-tag under the upstream name)
crictl pull ${TARGET_REGISTRY}/pause:${PAUSE_TARGET_VERSION} || true
ctr -n k8s.io images tag ${TARGET_REGISTRY}/pause:${PAUSE_TARGET_VERSION} ${SOURCE_REGISTRY}/pause:${PAUSE_VERSION} || true

# 9. Install the K8s components
mkdir -p ${DOWNLOAD_DIR} && cd ${DOWNLOAD_DIR}
for bin in kubeadm kubelet kubectl; do
    curl -L --proxy ${PROXY_ADDR} --insecure -s https://dl.k8s.io/v${K8S_VERSION}/bin/linux/${ARCH}/${bin} -o ${bin}
    chmod +x ${bin} && mv ${bin} /usr/bin/ || true
done

# 10. Configure kubelet
cat <<EOF | tee /etc/systemd/system/kubelet.service
[Unit]
Description=kubelet: The Kubernetes Node Agent
After=network-online.target

[Service]
ExecStart=/usr/bin/kubelet
Restart=always
RestartSec=10
LimitNOFILE=infinity

[Install]
WantedBy=multi-user.target
EOF

mkdir -p /etc/systemd/system/kubelet.service.d
cat <<EOF | tee /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
ExecStart=
ExecStart=/usr/bin/kubelet \$KUBELET_KUBECONFIG_ARGS \$KUBELET_CONFIG_ARGS \$KUBELET_KUBEADM_ARGS --container-runtime-endpoint=unix:///run/containerd/containerd.sock --node-ip=${NODE_IP}
EOF

cat <<EOF | tee /etc/default/kubelet
KUBELET_EXTRA_ARGS="--pod-infra-container-image=${TARGET_REGISTRY}/pause:${PAUSE_TARGET_VERSION} --node-ip=${NODE_IP}"
EOF

systemctl daemon-reload
systemctl enable kubelet || true

# Clean up temporary files
rm -rf ${DOWNLOAD_DIR} /tmp/containerd.tar.gz /tmp/runc /tmp/cni-plugins.tgz || true
echo -e "\033[32mNode ${NODE_IP} initialization complete\033[0m"
```
4. 04-init-first-master.sh (initializes the first master node, 111)
```bash
#!/bin/bash
set -euo pipefail

# Core configuration
HAPROXY_IP="192.168.56.102"
HAPROXY_PORT=6443
K8S_VERSION="1.33.6"
HARBOR_REGISTRY="192.168.56.102"
TARGET_REGISTRY="${HARBOR_REGISTRY}/library"
POD_SUBNET="192.156.32.0/20"
SVC_SUBNET="192.156.48.0/24"
CALICO_VERSION="v3.29.0"
PROXY_ADDR="http://192.168.56.102:8080"

# Get this node's IP on enp0s8
NODE_IP=$(ip addr show enp0s8 | grep -oP '(?<=inet\s)\d+(\.\d+){3}' | head -1)
if [ -z "${NODE_IP}" ]; then
    echo -e "\033[31mNo IP found on enp0s8; check the interface name\033[0m"
    exit 1
fi

# Generate the kubeadm config (etcd cert SANs included, enp0s8 IP pinned)
cat > /root/kubeadm-config.yaml << EOF
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v${K8S_VERSION}
imageRepository: ${TARGET_REGISTRY}
controlPlaneEndpoint: "${HAPROXY_IP}:${HAPROXY_PORT}"
networking:
  podSubnet: ${POD_SUBNET}
  serviceSubnet: ${SVC_SUBNET}
  dnsDomain: cluster.local
etcd:
  local:
    serverCertSANs:
      - "${NODE_IP}"
      - "${HAPROXY_IP}"
      - "127.0.0.1"
    peerCertSANs:
      - "${NODE_IP}"
      - "127.0.0.1"
    dataDir: /var/lib/etcd
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
podSandboxImage: "${TARGET_REGISTRY}/pause:3.10"
staticPodPath: /etc/kubernetes/manifests
clusterDomain: cluster.local
clusterDNS:
  - ${SVC_SUBNET%.*}.10  # x.x.x.10 inside the service CIDR
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: ${NODE_IP}
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
  kubeletExtraArgs:
    node-ip: ${NODE_IP}
EOF

# Initialize the cluster (this also generates the etcd certificates)
echo "Initializing the first master node (${NODE_IP})..."
kubeadm init --config /root/kubeadm-config.yaml --ignore-preflight-errors all --upload-certs -v=5

# Configure kubectl
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

# Deploy the Calico network
curl -L --proxy ${PROXY_ADDR} --insecure -s https://raw.githubusercontent.com/projectcalico/calico/${CALICO_VERSION}/manifests/calico.yaml -o /tmp/calico.yaml
# Point Calico at the custom Pod CIDR
# (note: recent calico.yaml releases ship CALICO_IPV4POOL_CIDR commented out;
# if so, uncomment it as well, otherwise this sed has no effect)
sed -i "s|192.168.0.0/16|${POD_SUBNET}|g" /tmp/calico.yaml
# Pull the Calico images from Harbor instead of docker.io
sed -i "s|docker.io/calico|${TARGET_REGISTRY}/calico|g" /tmp/calico.yaml
kubectl apply -f /tmp/calico.yaml

# Remove the control-plane taint (optional; lets the masters run regular pods)
kubectl taint nodes --all node-role.kubernetes.io/control-plane- || true

# Save the join command (used by the other master/worker nodes)
kubeadm token create --print-join-command > /root/k8s-join-command.sh
chmod +x /root/k8s-join-command.sh

# Save the certificate key (used when masters join)
CERT_KEY=$(kubeadm init phase upload-certs --upload-certs | tail -1)
echo "export CERT_KEY=${CERT_KEY}" > /root/k8s-cert-key.sh
chmod +x /root/k8s-cert-key.sh

echo -e "\033[32mFirst master node (${NODE_IP}) initialized!\033[0m"
echo "Join command saved to: /root/k8s-join-command.sh"
echo "Certificate key saved to: /root/k8s-cert-key.sh"
```
5. 05-join-master-nodes.sh (joins 112/113 to the cluster; fixed dependency parsing + automatic file copy)
```bash
#!/bin/bash
set -euo pipefail

# Core configuration
HAPROXY_IP="192.168.56.102"
HAPROXY_PORT=6443
K8S_VERSION="1.33.6"
HARBOR_REGISTRY="192.168.56.102"
TARGET_REGISTRY="${HARBOR_REGISTRY}/library"
FIRST_MASTER_IP="192.168.56.111"  # first master's IP
FIRST_MASTER_USER="root"          # login user on the first master

# Get this node's IP on enp0s8
NODE_IP=$(ip addr show enp0s8 | grep -oP '(?<=inet\s)\d+(\.\d+){3}' | head -1)
if [ -z "${NODE_IP}" ]; then
    echo -e "\033[31mNo IP found on enp0s8; check the interface name\033[0m"
    exit 1
fi
echo "This node's enp0s8 IP: ${NODE_IP}"

# New: automatically copy the key files from the first master
echo "Copying k8s-join-command.sh and k8s-cert-key.sh from ${FIRST_MASTER_IP}..."
# Try a key-based copy first; on failure, prompt for the password
if ! scp -o StrictHostKeyChecking=no ${FIRST_MASTER_USER}@${FIRST_MASTER_IP}:/root/k8s-join-command.sh /root/; then
    echo -e "\033[33mKey-based copy failed; enter the ${FIRST_MASTER_USER} password for ${FIRST_MASTER_IP}:\033[0m"
    scp ${FIRST_MASTER_USER}@${FIRST_MASTER_IP}:/root/k8s-join-command.sh /root/ || {
        echo -e "\033[31mFailed to copy k8s-join-command.sh; copy it manually and retry!\033[0m"
        exit 1
    }
fi
if ! scp -o StrictHostKeyChecking=no ${FIRST_MASTER_USER}@${FIRST_MASTER_IP}:/root/k8s-cert-key.sh /root/; then
    echo -e "\033[33mKey-based copy failed; enter the ${FIRST_MASTER_USER} password for ${FIRST_MASTER_IP}:\033[0m"
    scp ${FIRST_MASTER_USER}@${FIRST_MASTER_IP}:/root/k8s-cert-key.sh /root/ || {
        echo -e "\033[31mFailed to copy k8s-cert-key.sh; copy it manually and retry!\033[0m"
        exit 1
    }
fi

# Verify the files exist
if [ ! -f /root/k8s-join-command.sh ]; then
    echo -e "\033[31mk8s-join-command.sh not found!\033[0m"
    exit 1
fi
if [ ! -f /root/k8s-cert-key.sh ]; then
    echo -e "\033[31mk8s-cert-key.sh not found!\033[0m"
    exit 1
fi

# Load the certificate key (default expansion keeps set -u happy if it is missing)
source /root/k8s-cert-key.sh
if [ -z "${CERT_KEY:-}" ]; then
    echo -e "\033[31mCERT_KEY not set; check the contents of k8s-cert-key.sh!\033[0m"
    cat /root/k8s-cert-key.sh
    exit 1
fi

# Extract the token and caCertHash from the join command file (fixed parsing;
# "|| true" keeps set -e from exiting before the friendly error below)
JOIN_CMD=$(cat /root/k8s-join-command.sh)
TOKEN=$(echo "${JOIN_CMD}" | grep -oP 'token \K\S+' || true)
CA_CERT_HASH=$(echo "${JOIN_CMD}" | grep -oP 'discovery-token-ca-cert-hash \K\S+' || true)
if [ -z "${TOKEN}" ] || [ -z "${CA_CERT_HASH}" ]; then
    echo -e "\033[31mCould not extract the token or caCertHash from the join command!\033[0m"
    echo "Join command contents: ${JOIN_CMD}"
    exit 1
fi

# Generate the kubeadm join config
cat > /root/kubeadm-join-master.yaml << EOF
apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
discovery:
  bootstrapToken:
    apiServerEndpoint: "${HAPROXY_IP}:${HAPROXY_PORT}"
    token: "${TOKEN}"
    caCertHashes:
      - "${CA_CERT_HASH}"
  timeout: 5m0s
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
  kubeletExtraArgs:
    node-ip: ${NODE_IP}
controlPlane:
  certificateKey: "${CERT_KEY}"
  localAPIEndpoint:
    advertiseAddress: ${NODE_IP}
    bindPort: 6443
EOF

# Join the cluster
echo "Joining node ${NODE_IP} to the cluster (master role)..."
kubeadm join --config /root/kubeadm-join-master.yaml --ignore-preflight-errors all -v=5

# Configure kubectl (optional)
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config || true
chown $(id -u):$(id -g) $HOME/.kube/config || true

# Clean up temporary files
rm -f /root/kubeadm-join-master.yaml
echo -e "\033[32mMaster node ${NODE_IP} joined the cluster\033[0m"
```
6. 06-join-node-nodes.sh (joins 114/115 to the cluster; fixes + automatic file copy)
```bash
#!/bin/bash
set -euo pipefail

# Core configuration
HAPROXY_IP="192.168.56.102"
HAPROXY_PORT=6443
FIRST_MASTER_IP="192.168.56.111"  # first master's IP
FIRST_MASTER_USER="root"          # login user on the first master

# 1. Get this node's IP on enp0s8 (robustness check)
NODE_IP=$(ip addr show enp0s8 | grep -oP '(?<=inet\s)\d+(\.\d+){3}' | head -1)
if [ -z "${NODE_IP}" ]; then
    echo -e "\033[31mError: no IP found on enp0s8; check the interface name!\033[0m"
    exit 1
fi
echo "This node's enp0s8 IP: ${NODE_IP}"

# New: automatically copy the join command file from the first master
echo "Copying k8s-join-command.sh from ${FIRST_MASTER_IP}..."
# Try a key-based copy first; on failure, prompt for the password
if ! scp -o StrictHostKeyChecking=no ${FIRST_MASTER_USER}@${FIRST_MASTER_IP}:/root/k8s-join-command.sh /root/; then
    echo -e "\033[33mKey-based copy failed; enter the ${FIRST_MASTER_USER} password for ${FIRST_MASTER_IP}:\033[0m"
    scp ${FIRST_MASTER_USER}@${FIRST_MASTER_IP}:/root/k8s-join-command.sh /root/ || {
        echo -e "\033[31mFailed to copy k8s-join-command.sh; copy it manually and retry!\033[0m"
        exit 1
    }
fi

# 2. Check that the join command file exists
JOIN_CMD_FILE="/root/k8s-join-command.sh"
if [ ! -f "${JOIN_CMD_FILE}" ]; then
    echo -e "\033[31mError: join command file not found!\033[0m"
    exit 1
fi

# 3. Extract the token and caCertHash from the join command file (fixed parsing;
# "|| true" keeps set -e from exiting before the friendly error below)
JOIN_CMD=$(cat "${JOIN_CMD_FILE}")
TOKEN=$(echo "${JOIN_CMD}" | grep -oP 'token \K\S+' || true)
CA_CERT_HASH=$(echo "${JOIN_CMD}" | grep -oP 'discovery-token-ca-cert-hash \K\S+' || true)

# 4. Validate the extraction (robustness check)
if [ -z "${TOKEN}" ] || [ -z "${CA_CERT_HASH}" ]; then
    echo -e "\033[31mError: could not extract the token or caCertHash from the join command!\033[0m"
    echo "Join command contents: ${JOIN_CMD}"
    exit 1
fi

# 5. Generate the kubeadm join config (sets node-ip correctly)
cat > /root/kubeadm-join-node.yaml << EOF
apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
discovery:
  bootstrapToken:
    apiServerEndpoint: "${HAPROXY_IP}:${HAPROXY_PORT}"
    token: "${TOKEN}"
    caCertHashes:
      - "${CA_CERT_HASH}"
  timeout: 5m0s
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
  # Core fix: pass node-ip through kubeletExtraArgs (not a kubeadm CLI flag)
  kubeletExtraArgs:
    node-ip: ${NODE_IP}
EOF

# 6. Join via the config file (avoids the invalid command-line flag)
echo "Joining node ${NODE_IP} to the cluster (worker role)..."
kubeadm join --config /root/kubeadm-join-node.yaml --ignore-preflight-errors all -v=5

# 7. Clean up the temporary config file
rm -f /root/kubeadm-join-node.yaml
echo -e "\033[32mWorker node ${NODE_IP} joined the cluster!\033[0m"
```
III. Deployment Steps
1. Deploy HAProxy: run 01-deploy-haproxy.sh on 192.168.56.102.
2. Clean the environment: run 02-clean-k8s-env.sh on 111/112/113/114/115 to guarantee a clean starting state (steps 2 and 3 can be fanned out over SSH; see the sketch after this list).
3. Initialize all nodes: run 03-init-all-nodes.sh on 111/112/113/114/115 (installs the dependencies, containerd, and the K8s components).
4. Initialize the first master: run 04-init-first-master.sh on 111 (generates the etcd certificates, initializes the cluster, deploys Calico).
5. Join the other masters: run 05-join-master-nodes.sh on 112/113 (copies the required files automatically; no manual steps).
6. Join the workers: run 06-join-node-nodes.sh on 114/115 (copies the files automatically; fixes the node-ip argument).
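If root SSH is available on the five cluster VMs, steps 2 and 3 can be driven from one machine rather than run by hand on each node. A rough sketch, assuming the scripts sit in the current directory and key-based root login works:

```bash
#!/bin/bash
# Hypothetical fan-out helper for steps 2 and 3 (not part of the script set)
NODES=(192.168.56.111 192.168.56.112 192.168.56.113 192.168.56.114 192.168.56.115)
for node in "${NODES[@]}"; do
    scp 02-clean-k8s-env.sh 03-init-all-nodes.sh root@"${node}":/root/
    ssh root@"${node}" 'bash /root/02-clean-k8s-env.sh && bash /root/03-init-all-nodes.sh'
done
```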
IV. Summary of Key Fixes
- HAProxy failing to start: changed fall 0 rise 0 to fall 3 rise 2; health-check thresholds of 0 were preventing HAProxy from starting.
- Worker node join error: node-ip is now passed via nodeRegistration.kubeletExtraArgs in a JoinConfiguration file instead of being appended to the command line, which fixes the unknown flag: --node-ip error.
- Master join dependency: fixed the parsing of k8s-join-command.sh so the token and caCertHash are extracted automatically (the expected file format is sketched below).
- Automatic file copying: both join scripts now fetch the required files from the first master automatically, supporting both key-based and password SSH login, which cuts down on manual steps.
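For reference, the file written by `kubeadm token create --print-join-command` has a stable shape, which is exactly what the `grep -oP` extraction in scripts 5 and 6 relies on. A sketch with placeholder values:

```bash
# Typical contents of /root/k8s-join-command.sh (token and hash are placeholders):
#   kubeadm join 192.168.56.102:6443 --token abcdef.0123456789abcdef \
#       --discovery-token-ca-cert-hash sha256:1234abcd...
# The scripts pull out the two fields like this:
TOKEN=$(grep -oP 'token \K\S+' /root/k8s-join-command.sh)
CA_CERT_HASH=$(grep -oP 'discovery-token-ca-cert-hash \K\S+' /root/k8s-join-command.sh)
echo "token=${TOKEN} hash=${CA_CERT_HASH}"
```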
V. Verifying the Cluster
Once the deployment finishes, run the following on the first master to verify the cluster state:
```bash
# Check the status of all nodes
kubectl get nodes
# Check the cluster component status
kubectl get cs
# Check the Calico pod status
kubectl get pods -n kube-system
```
If every node reports Ready and all the pods are running normally, the cluster deployment succeeded.
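One caveat: `kubectl get cs` (componentstatuses) has been deprecated since Kubernetes v1.19, so its output can be empty or misleading on a v1.33 cluster. The API server's own health endpoints are a more dependable signal:

```bash
# Component-level health straight from the API server
kubectl get --raw='/readyz?verbose'
kubectl get --raw='/livez?verbose'
```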
VI. Conclusion
With this set of scripts we stood up a highly available K8s cluster. Along the way we hit some of the usual traps, such as the HAProxy health-check configuration and the kubeadm join argument error, and after repeated debugging and tuning we ended up with a robust, reusable deployment procedure.
I hope this post helps anyone who is building a K8s cluster. If you have questions or suggestions, feel free to leave a comment.