2:30 AM and the data pipeline team was staring at a PVC that had been Pending for 58 minutes. The Pod using it was stuck in ContainerCreating with zero events for the last twenty of those minutes. The on-call had already described the pod, described the PVC, described the node, and found nothing useful. "It just sits there yaar." He was right, it just sat there. PVCs do that. They sit there silently because nothing in the cluster is obligated to tell you why a claim has not been satisfied. The provisioner might be missing, the storageClass might be a typo, the topology might be wrong, the access mode might be unsupported, the capacity might not match. Five different failures, one identical symptom.
The scenario
This is one of my favorite scenarios in the repo because it teaches you to look in the right place instead of guessing.
git clone https://github.com/vellankikoti/troubleshoot-kubernetes-like-a-pro.git
cd troubleshoot-kubernetes-like-a-pro/scenarios/persistent-volume-claim-issues
ls
issue.yaml declares a PVC with storageClassName: non-existent-storage-class and a Pod that mounts it. Nothing in the cluster can satisfy that claim.
Reproduce the issue
kubectl apply -f issue.yaml
kubectl get pvc,pod
NAME                              STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS                 AGE
persistentvolumeclaim/pvc-issue   Pending                                      non-existent-storage-class   40s

NAME                READY   STATUS              RESTARTS   AGE
pod/pvc-issue-pod   0/1     ContainerCreating   0          40s
Two objects. Neither of them is going anywhere.
Debug the hard way
kubectl describe pvc pvc-issue
Status:        Pending
StorageClass:  non-existent-storage-class
Events:
  Type    Reason         Age   From                         Message
  ----    ------         ----  ----                         -------
  Normal  FailedBinding  38s   persistentvolume-controller  storageclass.storage.k8s.io "non-existent-storage-class" not found
There it is. One line. The storage class does not exist. The controller cannot dynamically provision anything because it has no provisioner to call.
kubectl get storageclass
NAME                 PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      AGE
standard (default)   rancher.io/local-path   Delete          WaitForFirstConsumer   6d
Only standard exists. The PVC asked for something that was never registered.
kubectl get pv
# No resources found
No static PV waiting around either. No provisioner, no static PV, no binding. Ever.
Why this happens
A PVC can fail to bind for five distinct reasons, and the events only tell you one at a time. First, the storageClassName references a class that does not exist, which is what we have here. Second, the class exists but its provisioner is not running, so the dynamic request never gets serviced. Third, the topology constraint cannot be satisfied because the only nodes in the right zone are cordoned or full. Fourth, the requested accessModes do not match anything the underlying storage supports, for example a ReadWriteMany request on a backend that only offers ReadWriteOnce. Fifth, the requested capacity is larger than any available PV and the class is not dynamic, so nothing can grow to fit.
The mental model I use is a three-step funnel. Does the storageClass exist? Can its provisioner respond? Do the constraints match something real? If any of those three fails, the PVC sits in Pending forever. There is no timeout, no failure event beyond the first one, no auto-fallback. Silent infinity.
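The funnel can be walked with a few kubectl queries. This is a minimal sketch, assuming kubectl already points at the right cluster and namespace; the helper name pvc_funnel is hypothetical, and the second step only prints the provisioner name because checking that it is actually running depends on how the cluster deploys it.

```shell
# Hypothetical helper: walk the three funnel questions for one claim.
pvc_funnel() {
  pvc="$1"
  # Read the requested class straight from the claim's spec.
  class="$(kubectl get pvc "$pvc" -o jsonpath='{.spec.storageClassName}')"

  # With no class on the claim, dynamic provisioning is not in play.
  if [ -z "$class" ]; then
    echo "claim has no storage class: look for a matching static PV"
    return 0
  fi

  # Question 1: does the storage class exist?
  # kubectl exits non-zero when the named object is missing.
  if ! kubectl get storageclass "$class" >/dev/null 2>&1; then
    echo "storageclass '$class' not found"
    return 1
  fi

  # Question 2: which provisioner has to respond?
  provisioner="$(kubectl get storageclass "$class" -o jsonpath='{.provisioner}')"
  echo "class '$class' is served by provisioner '$provisioner'"

  # Question 3: which constraints does the claim impose?
  kubectl get pvc "$pvc" \
    -o custom-columns=MODES:.spec.accessModes,SIZE:.spec.resources.requests.storage
}
```

Calling pvc_funnel pvc-issue against this scenario should stop at the first question, since the class was never registered.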
The trap is that the first event is often the only event. If you miss it, describe will just show you the Pending status with no explanation, and you will start guessing.
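Events are separate API objects with a limited lifetime (the API server's --event-ttl flag defaults to one hour), so they can also be queried directly rather than only through describe. A minimal sketch, assuming the claim lives in the current namespace; the helper name pvc_events is hypothetical.

```shell
# Hypothetical helper: list the events recorded for one claim,
# sorted by the time they were last seen.
pvc_events() {
  kubectl get events \
    --field-selector "involvedObject.kind=PersistentVolumeClaim,involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}
```

Running pvc_events pvc-issue soon after applying the manifest is the safest way to catch that first event before it ages out.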
The fix
The repo's fix takes a different path. Instead of creating the missing storage class, it creates a static PV and a PVC whose storage class is empty, so the two can bind to each other directly.
kubectl delete -f issue.yaml
kubectl apply -f fix.yaml
Key diff:
kind: PersistentVolume
metadata:
  name: valid-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/data
kubectl get pvc pvc-fixed
# pvc-fixed Bound valid-pv 1Gi RWO 15s
Bound. The binder matched the claim against the static PV because capacity, accessModes, and storageClass (empty on both) all lined up.
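One detail worth spelling out: on a cluster that has a default class (standard here), a claim that omits storageClassName gets the default assigned, so a claim meant for a classless static PV has to set the field to an empty string explicitly. The sketch below prints a claim of that shape. It is a hypothetical manifest built from the names and sizes shown above, not a copy of the repo's fix.yaml, and the helper name print_static_claim is made up for illustration.

```shell
# Hypothetical helper: print a claim that can bind to a classless static PV.
print_static_claim() {
  cat <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-fixed
spec:
  storageClassName: ""   # explicit empty string: opt out of the default class
  accessModes:
    - ReadWriteOnce      # must be offered by the PV
  resources:
    requests:
      storage: 1Gi       # must fit within the PV's capacity
EOF
}
```

To validate it without creating anything, pipe it through a client-side dry run: print_static_claim | kubectl apply --dry-run=client -f -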
The lesson
- A Pending PVC always has a first event that explains it. If you missed it, describe the PVC again and look at the top of the events list.
- Five reasons account for most PVCs that do not bind: missing class, dead provisioner, bad topology, wrong accessMode, or capacity mismatch. Walk the funnel.
- Static PVs still work when dynamic provisioning does not. They are ugly, but they unblock you in minutes.
Day 18 of 35. Tomorrow, the volume binds but the app still cannot write to it, and the reason is three decades old.
