Problem Description
- OS: CentOS 7.9
- Docker: 19.03.0
- Kubernetes: v1.19.0
Background
After installing Kubernetes and rebooting the machine, the following error appeared while creating the default NFS storage with kubectl:
The connection to the server 192.168.174.100:6443 was refused - did you specify the right host or port?
This means kubectl cannot reach the Kubernetes API Server. It is usually not that you typed the wrong IP or port; rather, the API Server is not running, the network is unreachable, or the certificates are misconfigured.
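A quick way to see whether the API Server itself is up is to look for its container and for a listener on port 6443 (a minimal sketch, assuming a Docker-based kubeadm control-plane node):
# Is the kube-apiserver container running?
docker ps --filter name=kube-apiserver
# Is anything listening on the API Server port?
ss -lntp | grep 6443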
Troubleshooting
Check whether the network is reachable
First, check whether the host itself can be reached:
[root@master ~]# ping 192.168.174.100 -c 4
PING 192.168.174.100 (192.168.174.100) 56(84) bytes of data.
64 bytes from 192.168.174.100: icmp_seq=1 ttl=64 time=0.022 ms
64 bytes from 192.168.174.100: icmp_seq=2 ttl=64 time=0.028 ms
64 bytes from 192.168.174.100: icmp_seq=3 ttl=64 time=0.028 ms
64 bytes from 192.168.174.100: icmp_seq=4 ttl=64 time=0.031 ms
--- 192.168.174.100 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.022/0.027/0.031/0.005 ms
The host network is fine.
Check whether the port is open
[root@master ~]# telnet 192.168.174.100 6443
Trying 192.168.174.100...
telnet: connect to address 192.168.174.100: Connection refused
The port refuses connections, so check whether the firewall allows port 6443 through, or stop the firewall.
Allow the port through the firewall, or stop the firewall
# For convenience, simply stop the firewall here and test again
systemctl stop firewalld
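In a real environment you may prefer to open port 6443 explicitly rather than stopping the firewall entirely; a sketch, assuming firewalld is the active firewall:
firewall-cmd --permanent --add-port=6443/tcp
firewall-cmd --reload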
# Test whether the port is reachable now
[root@master ~]# telnet 192.168.174.100 6443
Trying 192.168.174.100...
telnet: connect to address 192.168.174.100: Connection refused
Still connection refused. Next, look at the network configuration; this cluster uses Calico, so check the Calico-related files.
Check the CNI plugin
[root@master ~]# ls -al /opt/cni/bin/
total 172532
drwxr-xr-x 2 root root 311 Jul 19 11:24 .
drwxr-xr-x 3 root root 17 Jul 17 17:33 ..
-rwxr-xr-x 1 root root 38985728 Jul 19 11:24 calico
-rwxr-xr-x 1 root root 38985728 Jul 19 11:24 calico-ipam
....
# If calico is missing here (or flannel, if that is what the cluster uses), install the chosen CNI plugin first
The Calico binaries are present, so check Calico's CNI configuration file:
[root@master ~]# cat /etc/cni/net.d/10-calico.conflist
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "calico",
      "log_level": "info",
      "log_file_path": "/var/log/calico/cni/cni.log",
      "datastore_type": "kubernetes",
      "nodename": "master",
      "mtu": 0,
      "ipam": {
        "type": "calico-ipam"
      },
      "policy": {
        "type": "k8s"
      },
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
      }
    },
    {
      "type": "portmap",
      "snat": true,
      "capabilities": {"portMappings": true}
    },
    {
      "type": "bandwidth",
      "capabilities": {"bandwidth": true}
    }
  ]
}
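As an extra sanity check (a sketch, assuming Python is available on the node), you can confirm the conflist is at least valid JSON:
python -m json.tool < /etc/cni/net.d/10-calico.conflist > /dev/null && echo "valid JSON"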
The configuration itself looks fine, so the next step is to check whether the services are healthy.
Check whether the services are running
Check whether the kubelet service is running:
systemctl status kubelet
The output looks like this:
kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since Sat 2025-07-19 12:12:48 CST; 8s ago
Docs: https://kubernetes.io/docs/
Process: 2302 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=255)
Main PID: 2302 (code=exited, status=255)
Jul 19 12:12:48 master systemd[1]: Unit kubelet.service entered failed state.
Jul 19 12:12:48 master systemd[1]: kubelet.service failed.
This shows that the kubelet service is failing to start.
Look at the logs to find the reason for the failure:
journalctl -u kubelet -f >err.log
Wait about a minute, then open the log:
vim err.log
# vim search commands for locating error messages
/error
/exception
/fail
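Alternatively, instead of searching in vim, the journal can be filtered directly; a sketch:
journalctl -u kubelet --no-pager | grep -iE 'error|fail|fatal'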
----
tatus:[[Backing Filesystem xfs] [Supports d_type true] [Native Overlay Diff true]] SystemStatus:[] Plugins:{Volume:[local] Network:[bridge host ipvlan macvlan null overlay] Authorization:[] Log:[awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog]} MemoryLimit:true SwapLimit:true KernelMemory:true KernelMemoryTCP:true CPUCfsPeriod:true CPUCfsQuota:true CPUShares:true CPUSet:true PidsLimit:true IPv4Forwarding:true BridgeNfIptables:true BridgeNfIP6tables:true Debug:false NFd:22 OomKillDisable:true NGoroutines:35 SystemTime:2025-07-19T12:14:51.447804307+08:00 LoggingDriver:json-file CgroupDriver:systemd NEventsListener:0 KernelVersion:3.10.0-1160.119.1.el7.x86_64 OperatingSystem:CentOS Linux 7 (Core) OSType:linux Architecture:x86_64 IndexServerAddress:https://index.docker.io/v1/ RegistryConfig:0xc0002a60e0 NCPU:4 MemTotal:3953737728 GenericResources:[] DockerRootDir:/var/lib/docker HTTPProxy: HTTPSProxy: NoProxy: Name:master Labels:[] ExperimentalBuild:false ServerVersion:19.03.0 ClusterStore: ClusterAdvertise: Runtimes:map[runc:{Path:runc Args:[]}] DefaultRuntime:runc Swarm:{NodeID: NodeAddr: LocalNodeState:inactive ControlAvailable:false Error: RemoteManagers:[] Nodes:0 Managers:0 Cluster:<nil> Warnings:[]} LiveRestoreEnabled:false Isolation: InitBinary:docker-init ContainerdCommit:{ID:d2d58213f83a351ca8f528a95fbd145f5654e957 Expected:d2d58213f83a351ca8f528a95fbd145f5654e957} RuncCommit:{ID:v1.1.12-0-g51d5e94 Expected:v1.1.12-0-g51d5e94} InitCommit:{ID:fec3683 Expected:fec3683} SecurityOptions:[name=seccomp,profile=default] ProductLicense: Warnings:[]}
Jul 19 12:14:51 master kubelet[3836]: F0719 12:14:51.455146 3836 server.go:265] failed to run Kubelet: misconfiguration: kubelet cgroup driver: "cgroup" is different from docker cgroup driver: "systemd"
Jul 19 12:14:51 master kubelet[3836]: goroutine 1 [running]:
Jul 19 12:14:51 master kubelet[3836]: k8s.io/kubernetes/vendor/k8s.io/klog/v2.stacks(0xc00012a001, 0xc00010d600, 0xa8, 0xfa)
Jul 19 12:14:51 master kubelet[3836]: /workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:996 +0xb9
Jul 19 12:14:51 master kubelet[3836]: k8s.io/kubernetes/vendor/k8s.io/klog/v2.(*loggingT).output(0x6cf6140, 0xc000000003, 0x0, 0x0, 0xc0002a6150, 0x6b4854c, 0x9, 0x109, 0x0)
Jul 19 12:14:51 master kubelet[3836]: /workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:945 +0x191
Jul 19 12:14:51 master kubelet[3836]: k8s.io/kubernetes/vendor/k8s.io/klog/v2.(*loggingT).printDepth(0x6cf6140, 0xc000000003, 0x0, 0x0, 0x1, 0xc000d3fc80, 0x1, 0x1)
Jul 19 12:14:51 master kubelet[3836]: /workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:718 +0x165
Jul 19 12:14:51 master kubelet[3836]: k8s.io/kubernetes/vendor/k8s.io/klog/v2.(*loggingT).print(...)
Jul 19 12:14:51 master kubelet[3836]: /workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:703
Jul 19 12:14:51 master kubelet[3836]: k8s.io/kubernetes/vendor/k8s.io/klog/v2.Fatal(...)
Jul 19 12:14:51 master kubelet[3836]: /workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:1436
Jul 19 12:14:51 master kubelet[3836]: k8s.io/kubernetes/cmd/kubelet/app.NewKubeletCommand.func1(0xc0008d82c0, 0xc00012c130, 0x5, 0x5)
Jul 19 12:14:51 master kubelet[3836]: /workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubelet/app/server.go:265 +0x63e
Jul 19 12:14:51 master kubelet[3836]: k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute(0xc0008d82c0, 0xc00012c130, 0x5, 0x5, 0xc0008d82c0, 0xc00012c130)
Jul 19 12:14:51 master kubelet[3836]: /workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:846 +0x2c2
Jul 19 12:14:51 master kubelet[3836]: k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc0008d82c0, 0x18538be400a22e42, 0x6cf5c60, 0x409b05)
Jul 19 12:14:51 master kubelet[3836]: /workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:950 +0x375
Jul 19 12:14:51 master kubelet[3836]: k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute(...)
Jul 19 12:14:51 master kubelet[3836]: /workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:887
Jul 19 12:14:51 master kubelet[3836]: main.main()
Jul 19 12:14:51 master kubelet[3836]: _output/dockerized/go/src/k8s.io/kubernetes/cmd/kubelet/kubelet.go:41 +0xe5
...
Jul 19 12:14:51 master kubelet[3836]: /usr/local/go/src/os/exec/exec.go:311 +0x65
Jul 19 12:14:51 master kubelet[3836]: os/exec.(*Cmd).Start.func1(0xc000bfec60, 0xc00000ec00)
Jul 19 12:14:51 master kubelet[3836]: /usr/local/go/src/os/exec/exec.go:441 +0x27
Jul 19 12:14:51 master kubelet[3836]: created by os/exec.(*Cmd).Start
Jul 19 12:14:51 master kubelet[3836]: /usr/local/go/src/os/exec/exec.go:440 +0x629
Jul 19 12:14:51 master systemd[1]: Unit kubelet.service entered failed state.
Jul 19 12:14:51 master systemd[1]: kubelet.service failed.
Jul 19 12:15:01 master systemd[1]: kubelet.service holdoff time over, scheduling restart.
Jul 19 12:15:01 master systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jul 19 12:15:01 master systemd[1]: Started kubelet: The Kubernetes Node Agent.
The root cause
kubelet cgroup driver: "cgroup" is different from docker cgroup driver: "systemd"
Docker is configured with the systemd cgroup driver, while kubelet is configured with cgroup, so the two are not using the same driver.
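To confirm which driver each side is actually using (a minimal sketch; the kubelet config path assumes a kubeadm-installed node):
# Docker's cgroup driver
docker info 2>/dev/null | grep -i 'cgroup driver'
# kubelet's cgroup driver
grep cgroupDriver /var/lib/kubelet/config.yaml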
Solutions
Option 1: change Docker's configuration file
If native.cgroupdriver here is set to systemd, change it to cgroupfs (Docker's name for the non-systemd driver) so it matches kubelet:
vim /etc/docker/daemon.json
---
{
  "registry-mirrors": ["https://registry.cyou/",
    "https://docker.1panel.live/",
    "https://docker.xuanyuan.me",
    "https://docker.1ms.run",
    "https://82m9ar63.mirror.aliyuncs.com",
    "https://wdlqb43d.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
Option 2: change kubelet's configuration file
If Docker uses systemd, change cgroupDriver here to systemd as well. The file shown below is the KubeletConfiguration, which a kubeadm install keeps at /var/lib/kubelet/config.yaml (the systemd drop-in 10-kubeadm.conf only sets environment variables):
vim /var/lib/kubelet/config.yaml
---
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupDriver: cgroup    # change this to "systemd" to match Docker
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
logging: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
rotateCertificates: true
runtimeRequestTimeout: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s
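If you prefer to make the change from the command line, a one-line edit works as well (a sketch, assuming the kubeadm default config path):
sed -i 's/^cgroupDriver: .*/cgroupDriver: systemd/' /var/lib/kubelet/config.yaml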
Option 2 is what was used here.
Verify the services
Restart the services
sudo systemctl daemon-reload
sudo systemctl restart kubelet
sudo systemctl enable kubelet
Check that everything is healthy
# Check kubelet status
systemctl status kubelet
# Check node status
kubectl get nodes
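You can also repeat the earlier connectivity test to confirm the API Server is listening on 6443 again:
telnet 192.168.174.100 6443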
Finally, re-apply the manifest to test:
[root@master ~]# kubectl apply -f storageClass.yaml
storageclass.storage.k8s.io/nfs-storage created
deployment.apps/nfs-client-provisioner created
serviceaccount/nfs-client-provisioner created
clusterrole.rbac.authorization.k8s.io/nfs-client-provisioner-runner created
clusterrolebinding.rbac.authorization.k8s.io/run-nfs-client-provisioner created
role.rbac.authorization.k8s.io/leader-locking-nfs-client-provisioner created
rolebinding.rbac.authorization.k8s.io/leader-locking-nfs-client-provisioner created
[root@master ~]
If the kubelet and Docker cgroup drivers also differ on other nodes, update their configuration files and restart the services there in the same way.
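A quick way to check the remaining nodes (a sketch; node1 and node2 are placeholder hostnames, and SSH access plus the kubeadm default paths are assumed):
for node in node1 node2; do
  ssh "$node" "docker info 2>/dev/null | grep -i 'cgroup driver'; grep cgroupDriver /var/lib/kubelet/config.yaml"
done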
One final note:
If a pod stays in Pending and its description shows a taint on the node, you can try deleting that tainted node, then regenerate a join token and re-join it to the cluster.
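The relevant commands look roughly like this (a sketch; <node-name> is a placeholder, and the taint key shown is just the common control-plane example):
# Inspect the taints on the node
kubectl describe node <node-name> | grep -i taint
# Remove a specific taint (the trailing "-" deletes it)
kubectl taint nodes <node-name> node-role.kubernetes.io/master:NoSchedule-
# Or delete the node and re-join it with a fresh token
kubectl delete node <node-name>
kubeadm token create --print-join-command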