Replace Failed Kubernetes Etcd Member
I had a pretty knotty problem in my homelab. I am running a Kubernetes cluster with 3 masters and an embedded Etcd cluster. That means the Etcd cluster runs on the same nodes as the K8s API and scheduler pods. Like them, it is running as Pods controlled directly by Kubelet (magic! except it isn't). The data on one of those members (node3) got corrupted, so naturally it would no longer join the cluster.
What you need to do is remove that (etcd) node from the cluster and recreate it. This is pretty simple, but needs a bit of under-the-bonnet knowledge. So how is this Pod configured?
I hinted at a bit of magic earlier. These pods are running in K8s, and visible in the kube-system namespace, but are not actually managed by the Kubernetes scheduler. They are managed by the Kubelet itself. Kubelet on each master watches /etc/kubernetes/manifests and will action any valid manifest files you place in that folder. When I installed the cluster with kubeadm it did the following:
$ ls /etc/kubernetes/manifests/
etcd.yaml kube-apiserver.yaml kube-controller-manager.yaml kube-scheduler.yaml
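If you want to confirm where your Kubelet looks for static Pod manifests, the staticPodPath key in the kubelet config tells you. On a kubeadm-built master that config normally lives at /var/lib/kubelet/config.yaml (adjust the path if your setup differs):
$ grep staticPodPath /var/lib/kubelet/config.yaml
staticPodPath: /etc/kubernetes/manifests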
The part which interests me is in the spec.volumes key of etcd.yaml:
spec:
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
This tells me 2 things:
- The actual cluster data is stored in /var/lib/etcd on my physical node
- The certificates for cluster comms are in /etc/kubernetes/pki/etcd
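For orientation, this is roughly what those two directories look like on a kubeadm-built master (exact file names can vary between kubeadm versions):
$ ls /var/lib/etcd/member
snap  wal
$ ls /etc/kubernetes/pki/etcd
ca.crt  ca.key  healthcheck-client.crt  healthcheck-client.key  peer.crt  peer.key  server.crt  server.key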
So now I need an etcdctl that can reach the kube masters and has access to those certificates. I actually had it on another machine in the lab, so I copied the pki/etcd contents to that machine, but you could put etcdctl on the broken master; it is just a binary.
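As a rough sketch of that setup, assuming another Linux box and substituting whatever etcd version your cluster actually runs:
# copy the certificates off a healthy master
scp -r root@<node1>:/etc/kubernetes/pki/etcd /etc/kubernetes/pki/
# pull a standalone etcdctl out of the upstream release tarball
ETCD_VER=v3.5.9
curl -L https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz \
  | tar xz -C /usr/local/bin --strip-components=1 etcd-${ETCD_VER}-linux-amd64/etcdctl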
You will need the member ID of your failed node:
export ETCDCTL="etcdctl --endpoints=https://<node1>:2379,https://<node2>:2379,https://<node3>:2379 \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key \
  --cacert /etc/kubernetes/pki/etcd/ca.crt"
${ETCDCTL} member list
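Each line of the output is the member ID, status, name, peer URL, client URL, and learner flag; the first column is the ID you need. Illustrative output (IDs made up):
1a2b3c4d5e6f7a8b, started, node1, https://<node1>:2380, https://<node1>:2379, false
2b3c4d5e6f7a8b9c, started, node2, https://<node2>:2380, https://<node2>:2379, false
3c4d5e6f7a8b9c0d, started, node3, https://<node3>:2380, https://<node3>:2379, false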
Remove the failed node from the Etcd cluster:
${ETCDCTL} member remove <id-of-failed-node>
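It is worth re-running member list at this point to confirm the cluster is down to two members before rebuilding the third:
${ETCDCTL} member list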
Then simply move the etcd.yaml to one side:
mv /etc/kubernetes/manifests/etcd.yaml .
The kubelet will then stop the Etcd pod and you can clean up its corrupted data dir:
rm -rf /var/lib/etcd/member
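As a sanity check that Kubelet really has torn the container down, if crictl is installed on the node this should come back empty once the static Pod is gone:
$ crictl ps | grep etcd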
Re-start the pod:
mv etcd.yaml /etc/kubernetes/manifests/
That will restart the pod, but you still need to add it to the cluster:
${ETCDCTL} member add --peer-urls=https://<node3>:2380 <node3>
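If the add succeeds, etcdctl prints the settings a standalone etcd would need to join the existing cluster. With the kubeadm static Pod you do not need to copy these anywhere, but they are a handy sanity check that the peer URL is right. Roughly (cluster and member IDs illustrative):
Member 3c4d5e6f7a8b9c0d added to cluster abcdef0123456789

ETCD_NAME="<node3>"
ETCD_INITIAL_CLUSTER="<node1>=https://<node1>:2380,<node2>=https://<node2>:2380,<node3>=https://<node3>:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://<node3>:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"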
It will probably take a couple of restarts before it is properly healthy, but Kubelet will take care of that.
Before long you can run ${ETCDCTL} endpoint health and all endpoints will report healthy.
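A healthy cluster looks something like this (timings will obviously vary):
$ ${ETCDCTL} endpoint health
https://<node1>:2379 is healthy: successfully committed proposal: took = 9.164973ms
https://<node2>:2379 is healthy: successfully committed proposal: took = 10.783389ms
https://<node3>:2379 is healthy: successfully committed proposal: took = 11.376375ms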
Conclusion
Nothing was actually that complex, but I needed to know a couple of things about how K8s does things:
- Where kubeadm put the certificates
- That Kubelet watches /etc/kubernetes/manifests for static Pods (defined by staticPodPath).