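metrics-server is the core component Kubernetes relies on for resource monitoring, for example `kubectl top` and HPA autoscaling. While deploying it on k8s I ran into the following problems:
- Image pull failures (`k8s.gcr.io` images cannot be reached from mainland China);
- Certificate verification errors (either skip TLS verification or configure correct certificates);
- API Server / kubelet connection problems (pass `--kubelet-insecure-tls` so that metrics-server skips verifying the kubelets' serving certificates).

The deployment steps are as follows.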
1. Step 1: Download the official manifest (and modify it)
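The command in this step downloads through the `latest` alias, so the file you get changes with each release and may not match the v0.7.0 image tag used in step 2. A minimal sketch of a hypothetical helper, `metrics_server_manifest_url`, that builds a pinned-release URL instead; it assumes the release you pick publishes a `components.yaml` asset, as recent metrics-server releases do.

```shell
#!/bin/sh
# Hypothetical helper (not part of metrics-server): build the download URL of
# components.yaml for a pinned release instead of the moving "latest" alias.
metrics_server_manifest_url() {
  version="${1:-latest}"   # e.g. v0.7.0; assumed to be an existing release tag
  if [ "$version" = "latest" ]; then
    echo "https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml"
  else
    echo "https://github.com/kubernetes-sigs/metrics-server/releases/download/${version}/components.yaml"
  fi
}

# Usage (needs network access):
#   wget "$(metrics_server_manifest_url v0.7.0)" -O metrics-server.yaml
```

Pinning keeps the manifest and the mirrored image tag in step 2 on the same version.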
```shell
# Download the official YAML (you can also create it by hand)
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml -O metrics-server.yaml
```

2. Step 2: Modify the key settings in `metrics-server.yaml`
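Before editing, it helps to see which image the downloaded file actually references; newer upstream manifests point at `registry.k8s.io` rather than `k8s.gcr.io`. A minimal sketch with two hypothetical helpers: `list_images` prints every image reference, and `use_mirror_image` rewrites the upstream image to the Aliyun mirror used in this article while keeping the tag. Whether that mirror carries a given tag is an assumption you should verify.

```shell
#!/bin/sh
# Hypothetical helper: print the image references in a manifest.
# Matches lines like "  - image: registry.k8s.io/...:v0.7.0"; leading spaces and a
# "- " list marker are allowed, "imagePullPolicy:" lines are not matched.
list_images() {
  sed -n -E 's/^[[:space:]-]*image:[[:space:]]*//p' "$1"
}

# Hypothetical helper: replace the upstream registry path (k8s.gcr.io or
# registry.k8s.io) with the Aliyun mirror, keeping the original tag.
# Writes to a temp file and moves it back, so it works with GNU and BSD sed.
use_mirror_image() {
  file="$1"
  mirror="registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server"
  sed -E "s#(image:[[:space:]]*)(k8s\.gcr\.io|registry\.k8s\.io)/metrics-server/metrics-server:#\1${mirror}:#" \
    "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Usage:
#   list_images metrics-server.yaml
#   use_mirror_image metrics-server.yaml
```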
Open `metrics-server.yaml` and make the following three core changes. First, replace the image:
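```yaml
# Original image (not reachable from mainland China; newer manifests such as
# v0.7.0 reference registry.k8s.io/metrics-server/metrics-server instead)
# image: k8s.gcr.io/metrics-server/metrics-server:v0.7.0
# Replace it with the Aliyun mirror (matching v0.7.0)
image: registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server:v0.7.0
```

3. Add startup arguments (fixes the certificate / connection problems)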
In the `args` section of the Deployment, add the following parameters (this is the key part!):
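```yaml
spec:
  template:
    spec:
      containers:
      - name: metrics-server
        args:
        - --cert-dir=/tmp
        - --secure-port=4443   # keep the value from your manifest; it must match containerPort
        # Add the following 3 parameters
        - --kubelet-insecure-tls   # skip kubelet TLS verification (fine for test environments; configure certificates in production)
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname   # kubelet address types to try, in order
        - --metric-resolution=15s   # metrics scrape interval
```

4. Optional: adjust resource limits (depending on cluster size)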
```yaml
resources:
  requests:
    cpu: 100m
    memory: 100Mi
  limits:
    cpu: 500m
    memory: 512Mi
```

5. Deploy metrics-server
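Before applying, a quick preflight check can catch a manifest that was only partly edited. A minimal sketch of a hypothetical `check_manifest` helper: it fails if an active (non-comment) `image:` line still points at an upstream registry, or if the args from step 3 are missing. Treating `registry.k8s.io` as unreachable along with `k8s.gcr.io` is an assumption; drop it from the pattern if your nodes can pull from it.

```shell
#!/bin/sh
# Hypothetical preflight check for the edited manifest (not a kubectl feature).
# Returns 0 when the manifest looks ready, 1 otherwise; reasons go to stderr.
check_manifest() {
  file="$1"
  status=0
  # Only active image lines count; "# image: k8s.gcr.io/..." comments are ignored.
  if grep -Eq '^[[:space:]-]*image:[[:space:]]*(k8s\.gcr\.io|registry\.k8s\.io)/' "$file"; then
    echo "image still points at an upstream registry" >&2
    status=1
  fi
  # The args added in step 3, written as YAML list items ("- --flag").
  for arg in --kubelet-insecure-tls --kubelet-preferred-address-types; do
    if ! grep -q -- "- $arg" "$file"; then
      echo "missing arg: $arg" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   check_manifest metrics-server.yaml && kubectl apply -f metrics-server.yaml
```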
```shell
kubectl apply -f metrics-server.yaml
```

6. Verify the deployment
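Right after `kubectl apply` the Pod may still be starting (the full manifest in part II sets `initialDelaySeconds: 20` on the readiness probe), and `kubectl top` can fail until the first metrics have been scraped. A minimal sketch of a hypothetical `retry` helper that re-runs a command until it succeeds; for the Pod itself, `kubectl wait --for=condition=Ready pod -l k8s-app=metrics-server -n kube-system --timeout=120s` is a built-in alternative.

```shell
#!/bin/sh
# Hypothetical helper: retry <attempts> <delay-seconds> <command...>
# Runs the command up to <attempts> times, sleeping between failed attempts.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # No sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage (requires a cluster): try for about two minutes.
#   retry 12 10 kubectl top nodes
```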
```shell
kubectl get pods -n kube-system -l k8s-app=metrics-server
# Normal output (STATUS is Running):
# NAME                              READY   STATUS    RESTARTS   AGE
# metrics-server-7f987d68c4-9x8zl   1/1     Running   0          5m
```

Check the Pod logs (to troubleshoot startup failures)
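The list of common errors in this step can also be turned into a small filter that the logs are piped through. A minimal sketch of a hypothetical `diagnose_metrics_log` function; the match strings are assumptions based on that list (the x509 pattern is widened because real messages vary, e.g. "cannot validate certificate ... doesn't contain any IP SANs"). Image pull problems never reach the container log, so the filter also recognises `ErrImagePull` / `ImagePullBackOff`, which appear in `kubectl describe pod` output.

```shell
#!/bin/sh
# Hypothetical filter: read log (or "kubectl describe pod") lines on stdin and
# print a hint for each line that matches a known problem.
diagnose_metrics_log() {
  while IFS= read -r line; do
    case "$line" in
      *"x509:"*)
        echo "hint: confirm --kubelet-insecure-tls is set (or configure kubelet certificates)" ;;
      *"unable to reach kubelet"*)
        echo "hint: check --kubelet-preferred-address-types" ;;
      *ErrImagePull*|*ImagePullBackOff*)
        echo "hint: check that the image address is reachable" ;;
    esac
  done
}

# Usage (requires a cluster):
#   kubectl logs -n kube-system deploy/metrics-server | diagnose_metrics_log
#   kubectl describe pod -n kube-system -l k8s-app=metrics-server | diagnose_metrics_log
```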
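```shell
kubectl logs -n kube-system $(kubectl get pods -n kube-system -l k8s-app=metrics-server -o name)
# Common errors and fixes:
# - "x509: certificate signed by unknown authority" → make sure --kubelet-insecure-tls is set
# - "unable to reach kubelet" → check the --kubelet-preferred-address-types parameter
# - "image pull failed" → check the image address (this shows up in the Pod status and
#   kubectl describe events as ErrImagePull / ImagePullBackOff, not in the container log)
```

Verify API availability (key step!)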
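Besides `kubectl top`, the API registration itself can be checked with `kubectl get apiservice v1beta1.metrics.k8s.io`, whose AVAILABLE column should read `True`; a value such as `False (MissingEndpoints)` means the API server cannot reach the metrics-server Service. A minimal sketch of a hypothetical `apiservice_available` helper that parses that output; it assumes the default column layout `NAME SERVICE AVAILABLE AGE`.

```shell
#!/bin/sh
# Hypothetical helper: read "kubectl get apiservice <name>" output on stdin,
# print the AVAILABLE value, and succeed only when it is "True".
# Assumes AVAILABLE is the 3rd whitespace-separated field of the first data row.
apiservice_available() {
  awk 'NR == 2 { print $3; found = 1; ok = ($3 == "True") }
       END { exit (found && ok) ? 0 : 1 }'
}

# Usage (requires a cluster):
#   kubectl get apiservice v1beta1.metrics.k8s.io | apiservice_available
```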
metrics-server registers the `metrics.k8s.io` API; check that it works:
```shell
# Node resource usage
kubectl top nodes
# Example output:
# NAME         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
# k8s-master   123m         6%     1200Mi          30%
# k8s-node1    89m          4%     980Mi           25%

# Pod resource usage
kubectl top pods -n kube-system
# The output includes metrics-server's own resource usage
```

II. The modified YAML used in this deployment (can be used as-is)
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
  - configmaps
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 8443
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=8443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls
        - --authorization-always-allow-paths=/livez,/readyz
        image: swr.cn-east-2.myhuaweicloud.com/kuboard-dependency/metrics-server:v0.5.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 8443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100
```
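Step 3 uses `--secure-port=4443`, while this manifest uses 8443. Whatever value you choose, the Service `targetPort` must point at the port metrics-server listens on, and the probes' named port `https` resolves to `containerPort`, so all three have to agree. A minimal sketch of a hypothetical `check_ports` helper that compares them; it only reads the first match of each setting and skips the `targetPort` comparison when it is a named port.

```shell
#!/bin/sh
# Hypothetical consistency check for a metrics-server manifest:
# --secure-port, containerPort and the Service targetPort should match.
# Prints the values it found; returns 0 if consistent, 1 otherwise.
check_ports() {
  file="$1"
  # First match of each setting; empty if absent.
  secure=$(sed -n -E 's/.*--secure-port=([0-9]+).*/\1/p' "$file" | head -n 1)
  container=$(sed -n -E 's/.*containerPort:[[:space:]]*([0-9]+).*/\1/p' "$file" | head -n 1)
  target=$(sed -n -E 's/.*targetPort:[[:space:]]*([^[:space:]]+).*/\1/p' "$file" | head -n 1)
  echo "secure-port=$secure containerPort=$container targetPort=$target"
  [ -n "$secure" ] && [ "$secure" = "$container" ] || return 1
  case "$target" in
    *[!0-9]*) return 0 ;;   # named port such as "https": resolved through containerPort
    "$secure") return 0 ;;
    *) return 1 ;;          # numeric mismatch, or targetPort missing
  esac
}

# Usage:
#   check_ports metrics-server.yaml
```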