RuoYi Microservices Kubernetes Deployment Notes (with Node1 Failure Fix)
1. Environment Info
OS: Ubuntu 22.04 LTS
K8s: v1.28.2
Container runtime: containerd v2.2.1 (systemd cgroup and Aliyun pause image already configured)
Network plugin: Flannel
Node list

| Name | Hostname | IP |
| --- | --- | --- |
| Master | k8s-master | 192.168.216.129 |
| Node1 | ubuntu-virtual-machine | 192.168.216.128 (problem node) |
| Node2 | k8snode2-virtual-machine | 192.168.216.130 |
2. Base Configuration on All Nodes

```shell
# Disable swap
swapoff -a && sed -i '/swap/d' /etc/fstab

# Load kernel modules
modprobe br_netfilter
cat > /etc/modules-load.d/k8s.conf << EOF
br_netfilter
EOF

# sysctl settings
cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system

# Install and configure containerd
apt install -y containerd
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sed -i 's|sandbox_image = ".*"|sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"|' /etc/containerd/config.toml
systemctl restart containerd

# Install kubeadm/kubelet/kubectl
curl -fsSL https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
cat > /etc/apt/sources.list.d/kubernetes.list << EOF
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
apt update
apt install -y kubelet=1.28.2-00 kubectl=1.28.2-00 kubeadm=1.28.2-00
apt-mark hold kubelet kubeadm kubectl
```
3. Master Node Initialization

```shell
kubeadm init \
  --kubernetes-version=v1.28.2 \
  --pod-network-cidr=10.244.0.0/16 \
  --image-repository=registry.aliyuncs.com/google_containers \
  --ignore-preflight-errors=Swap

mkdir -p $HOME/.kube
cp /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

# Install Flannel
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
```
4. Joining Worker Nodes (Including the Final Node1 Fix)
4.1 Node2 joins normally
Generate the join command on the Master:

```shell
kubeadm token create --print-join-command
```

Run the printed command on Node2.
4.2 Fixing and Joining Node1 (ubuntu-virtual-machine)
Root cause: Node1 had a mismatched runc version, a missing CNI config, and a broken kubelet systemd drop-in file, so it could not complete TLS bootstrapping.
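Before tearing the node down, each of these symptoms can be confirmed with a quick check. The commands below are a sketch (run on Node1; the Node2 IP and file paths match this guide's setup):

```shell
# Compare runc versions between the broken node and a healthy one
runc --version                            # on Node1
ssh root@192.168.216.130 runc --version   # same command on Node2 for comparison

# A missing CNI config shows up as an empty directory
ls /etc/cni/net.d/

# TLS bootstrap failures surface in the kubelet journal
journalctl -u kubelet --no-pager -n 50 | grep -Ei 'bootstrap|certificate'
```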
Step 1: Fully clean out the old state
```shell
systemctl stop kubelet containerd
kubeadm reset -f
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
rm -f /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/pki/ca.crt
rm -f /var/lib/kubelet/config.yaml /var/lib/kubelet/kubeadm-flags.env
rm -f /etc/cni/net.d/*.conflist
rm -f /etc/containerd/config.toml
```
Step 2: Replace runc (copied from Node2)
```shell
scp root@192.168.216.130:/usr/bin/runc /root/runc
rm -f /usr/bin/runc /usr/sbin/runc
cp /root/runc /usr/bin/runc && cp /root/runc /usr/sbin/runc
chmod +x /usr/bin/runc /usr/sbin/runc
```
Step 3: Rebuild the containerd config
```shell
containerd config default > /etc/containerd/config.toml
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sed -i 's|sandbox_image = ".*"|sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"|' /etc/containerd/config.toml
systemctl start containerd
```
Step 4: Create a standard kubelet config, CNI config, and systemd drop-in
```shell
mkdir -p /etc/cni/net.d /var/lib/kubelet

# CNI config
cat > /etc/cni/net.d/10-flannel.conflist << 'EOF'
{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {"type": "flannel", "delegate": {"hairpinMode": true, "isDefaultGateway": true}},
    {"type": "portmap", "capabilities": {"portMappings": true}}
  ]
}
EOF

# kubelet config
cat > /var/lib/kubelet/config.yaml << 'EOF'
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
staticPodPath: /etc/kubernetes/manifests
clusterDomain: cluster.local
authentication:
  webhook:
    enabled: true
authorization:
  mode: Webhook
EOF

# kubeadm flags
cat > /var/lib/kubelet/kubeadm-flags.env << 'EOF'
KUBELET_KUBEADM_ARGS="--container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9"
EOF

# systemd drop-in (the key point: it includes --bootstrap-kubeconfig)
mkdir -p /etc/systemd/system/kubelet.service.d
cat > /etc/systemd/system/kubelet.service.d/10-kubeadm.conf << 'EOF'
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_CONFIG_ARGS
EOF

systemctl daemon-reload
systemctl restart containerd
```
Step 5: Run kubeadm join
```shell
kubeadm join 192.168.216.129:6443 \
  --token <your-token> \
  --discovery-token-ca-cert-hash sha256:<your-hash> \
  --ignore-preflight-errors=all
```
If the join hangs at the TLS bootstrap timeout, press Ctrl+C and run:

```shell
systemctl restart kubelet
```

Then verify that kubelet is in the active (running) state.
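If the join still stalls after the restart, it usually means the kubelet has not submitted a certificate signing request yet. These checks (a sketch; paths match the files created in Step 4) show where it is stuck:

```shell
# On the worker: kubelet must be running and the bootstrap kubeconfig must exist
systemctl status kubelet --no-pager
ls -l /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/kubelet.conf

# On the master: a Pending CSR means the bootstrap request reached the API server
kubectl get csr
```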
Step 6: Approve the CSRs (on the Master)

```shell
kubectl get csr | grep Pending | awk '{print $1}' | xargs -r kubectl certificate approve
kubectl get nodes
```

Node1 becomes Ready.
5. Deploying the RuoYi Microservices (run on the Master)
5.1 Deploy the base middleware (MySQL, Redis, Nacos)
Create the ruoyi namespace from your YAML manifests and deploy MySQL, Redis, and Nacos. Make sure the ruoyi-mysql, ruoyi-redis, and ruoyi-nacos Pods are all Running.
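The middleware manifests themselves are not reproduced in these notes. Purely as an illustration, a minimal Redis Deployment plus Service in the ruoyi namespace might look like the following; the image tag and labels are assumptions, not taken from the actual deployment. The Service name matters, since it is what later replaces `localhost` in the config:

```shell
kubectl create namespace ruoyi --dry-run=client -o yaml | kubectl apply -f -

# Hypothetical minimal Redis manifest; the real one may differ
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ruoyi-redis
  namespace: ruoyi
spec:
  replicas: 1
  selector:
    matchLabels: {app: redis}
  template:
    metadata:
      labels: {app: redis}
    spec:
      containers:
      - name: redis
        image: redis:6.2
        ports: [{containerPort: 6379}]
---
apiVersion: v1
kind: Service
metadata:
  name: ruoyi-redis   # the name the app config points at instead of localhost
  namespace: ruoyi
spec:
  selector: {app: redis}
  ports: [{port: 6379}]
EOF
```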
5.2 Import the RuoYi SQL and Patch the Config

```shell
# Import SQL
MYSQL_POD=$(kubectl get pod -n ruoyi -l app=mysql -o jsonpath='{.items[0].metadata.name}')
kubectl exec -i $MYSQL_POD -n ruoyi -- mysql -uroot -p123456 -e "CREATE DATABASE IF NOT EXISTS \`ry-cloud\` DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;"
kubectl exec -i $MYSQL_POD -n ruoyi -- mysql -uroot -p123456 ry-cloud < ry_20260417.sql
kubectl exec -i $MYSQL_POD -n ruoyi -- mysql -uroot -p123456 ry-cloud < quartz.sql
kubectl exec -i $MYSQL_POD -n ruoyi -- mysql -uroot -p123456 < ry_config_20260311.sql
kubectl exec -i $MYSQL_POD -n ruoyi -- mysql -uroot -p123456 -e "USE \`ry-config\`; UPDATE config_info SET content = REPLACE(content, 'localhost', 'ruoyi-redis'); ..."
```
5.3 Fix the Microservices' Nacos Address (the most critical step)

```shell
kubectl set env deployment/ruoyi-gateway -n ruoyi \
  SPRING_CLOUD_NACOS_DISCOVERY_SERVER_ADDR=ruoyi-nacos:8848 \
  SPRING_CLOUD_NACOS_CONFIG_SERVER_ADDR=ruoyi-nacos:8848
kubectl set env deployment/ruoyi-auth -n ruoyi \
  SPRING_CLOUD_NACOS_DISCOVERY_SERVER_ADDR=ruoyi-nacos:8848 \
  SPRING_CLOUD_NACOS_CONFIG_SERVER_ADDR=ruoyi-nacos:8848
kubectl set env deployment/ruoyi-system -n ruoyi \
  SPRING_CLOUD_NACOS_DISCOVERY_SERVER_ADDR=ruoyi-nacos:8848 \
  SPRING_CLOUD_NACOS_CONFIG_SERVER_ADDR=ruoyi-nacos:8848
```
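These env vars work because Spring Boot's relaxed binding maps `SPRING_CLOUD_NACOS_DISCOVERY_SERVER_ADDR` to the `spring.cloud.nacos.discovery.server-addr` property (and likewise for config). After the rollout, the override can be verified like this (a sketch):

```shell
# Wait for the new pods, then confirm the env vars actually landed
kubectl rollout status deployment/ruoyi-gateway -n ruoyi
kubectl exec -n ruoyi deploy/ruoyi-gateway -- env | grep SPRING_CLOUD_NACOS
```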
5.4 Fix the Front-End Nginx Reverse Proxy (fixes captcha / 404)

```shell
UI_POD=$(kubectl get pod -n ruoyi -l app=ruoyi-ui -o jsonpath='{.items[0].metadata.name}')
kubectl exec -i $UI_POD -n ruoyi -- tee /etc/nginx/nginx.conf <<'NEOF'
worker_processes 1;
events { worker_connections 1024; }
http {
    include mime.types;
    default_type application/octet-stream;
    sendfile on;
    keepalive_timeout 65;
    server {
        listen 80;
        location / {
            root /usr/share/nginx/html;
            try_files $uri $uri/ /index.html;
        }
        location /prod-api/ {
            proxy_pass http://ruoyi-gateway:8080/;
            proxy_set_header Host $http_host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}
NEOF
kubectl exec $UI_POD -n ruoyi -- nginx -s reload
```
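Note that a config written with `kubectl exec` lives only inside the running container and is lost when the Pod restarts. A more durable variant mounts the file from a ConfigMap; the commands below are a sketch, and the ConfigMap/volume names and local `./nginx.conf` file are illustrative, not part of the original setup:

```shell
# Persist the nginx.conf from a local copy into a ConfigMap (names are illustrative)
kubectl create configmap ruoyi-ui-nginx -n ruoyi \
  --from-file=nginx.conf=./nginx.conf --dry-run=client -o yaml | kubectl apply -f -

# Mount it over /etc/nginx/nginx.conf in the ruoyi-ui Deployment
kubectl patch deployment ruoyi-ui -n ruoyi --type=json -p='[
  {"op":"add","path":"/spec/template/spec/volumes","value":
    [{"name":"nginx-conf","configMap":{"name":"ruoyi-ui-nginx"}}]},
  {"op":"add","path":"/spec/template/spec/containers/0/volumeMounts","value":
    [{"name":"nginx-conf","mountPath":"/etc/nginx/nginx.conf","subPath":"nginx.conf"}]}
]'
```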
6. Verification
Cluster status:

```shell
kubectl get nodes
```

All three nodes are Ready.

Pod status:

```shell
kubectl get pods -n ruoyi
```

All Pods are Running.

Nacos service list: open http://192.168.216.129:30848/nacos and confirm that ruoyi-gateway, ruoyi-auth, and ruoyi-system all have registered instances.

Front-end login: open http://192.168.216.129:30000; the captcha renders, and admin/admin123 logs in successfully.
7. Key Lessons

| Lesson | Details |
| --- | --- |
| runc versions must match | Do not use the apt-packaged runc on worker nodes; copy the binary from a healthy node |
| The kubelet drop-in must include --bootstrap-kubeconfig | Without it, kubelet restarts endlessly during the join phase |
| The CNI config must exist beforehand | /etc/cni/net.d/10-flannel.conflist must be present for Flannel to work |
| The Nacos address must be overridden | The microservice images ship with localhost; override it via env vars to the K8s Service name |
| The cgroup drivers must agree | containerd and kubelet must both use systemd, or Pod creation fails |
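The last lesson can be checked mechanically: both halves of the runtime must say systemd. The helper below is a sketch (the function name is ours; the file paths are the defaults used throughout this guide):

```shell
# Succeeds (exit 0) only if the containerd config enables the systemd
# cgroup driver AND the kubelet config agrees (cgroupDriver: systemd).
check_cgroup_consistency() {
  local containerd_cfg="$1" kubelet_cfg="$2"
  grep -q 'SystemdCgroup = true' "$containerd_cfg" &&
  grep -q '^cgroupDriver: systemd' "$kubelet_cfg"
}

# Typical usage on a node:
# check_cgroup_consistency /etc/containerd/config.toml /var/lib/kubelet/config.yaml \
#   && echo "cgroup drivers consistent" || echo "MISMATCH: pods will fail to start"
```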