Mastering Kubernetes the Right Way · DAY 20 / 35

Kubernetes Disk I/O Errors: Pod Symptoms, Node Root Cause

The container is crashing. The node is the reason. Here is how to prove it in 90 seconds.

Koti Vellanki · 08 Apr 2026 · 3 min read
Tags: kubernetes, debugging, storage

4:02 AM. An ingest service was dying on exactly one node out of twenty-three. The pod spec was identical everywhere. The image was identical. The config was identical. Same Helm release, same values. On twenty-two nodes it ran fine. On worker-14 it crashed within three seconds every single time, with a cryptic no such file or directory coming out of a path that existed on every other host. Twenty minutes of reading Go stack traces and diffing node labels got me nowhere. The moment I ran kubectl describe pod and looked at the volume block, the pattern snapped into place: we had a hostPath volume pointing at a directory that had been wiped off worker-14 during a disk replacement two days earlier and never restored.

The scenario

DAY 20 · STORAGE · DISK I/O THROTTLE

The pod writes are stalling. The EBS burst credits ran out.

The EBS gp3 volume behind this pod spent its burst credit pool writing at 16 000 IOPS. Now it is throttled back to the 3 000 IOPS baseline. Writes queue up, the Linux block layer detects the stall after 120 seconds, and dmesg fills with 'blocked for more than 120 seconds'. The pod is not broken. The disk is throttled.

Figure 20 / 35. EBS burst credits exhausted: pod write latency spikes and the Linux block layer stalls. Panels: (1) a pod in the default namespace writing to a gp3 volume (ebs-pvc) at 4200 ms write latency; (2) EBS burst credits at 8% remaining, throttled from 16000 IOPS to the 3000 IOPS baseline; (3) the Linux block layer, where dmesg reports a task blocked for more than 120 s at 87% io wait and iostat -x confirms the stall.
1. The pod is healthy; the disk is throttled

Write latency of 4 200 ms is not a code bug. The pod's EBS gp3 volume started with enough burst credits to sustain 16 000 IOPS. Once the burst pool drained, AWS throttled it to the 3 000 IOPS baseline. Writes queue up inside the kernel rather than failing. Run kubectl describe pod first, then look at the node with iostat -x 1 5.

2. gp3 burst credits are finite and refill slowly

AWS EBS gp3 volumes earn burst credits at 3 000 IOPS and spend them whenever the volume writes above that rate. At 8% remaining, the next sustained write burst will exhaust the pool in minutes. The fix is to provision the volume with a higher baseline IOPS (--iops 6000, for instance) or to split the write load across multiple volumes.

3. dmesg is the first place to look, not the pod logs

dmesg | grep -i hung shows the hung-task entries as soon as the kernel logs them. The Linux kernel default hung_task_timeout_secs is 120 seconds. At 87% io wait the node's CPUs spend most of their time idle, waiting on the disk rather than running user processes. The application is not crashing; it is simply waiting for writes to return.
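The node-side check in note 3 can be sketched as a small helper. This is a minimal sketch, assuming shell access to the node and permission to read the kernel log; the function name `count_hung_tasks` is hypothetical and not part of any tool.

```shell
#!/usr/bin/env bash
# Count hung-task warnings in kernel log text supplied on stdin.
# count_hung_tasks is a hypothetical helper name used for illustration.
count_hung_tasks() {
  # grep -c prints the number of matching lines. It exits 1 when there are
  # no matches, so "|| true" keeps the function from failing under "set -e".
  grep -c 'blocked for more than' || true
}

# Usage on the node (run manually):
#   dmesg | count_hung_tasks
#   cat /proc/sys/kernel/hung_task_timeout_secs   # threshold in seconds
#   iostat -x 1 5                                 # from the sysstat package
```

A non-zero count together with high io wait in iostat points at the disk rather than at the application.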

kind v0.22.0 (note: burst-credit behavior is specific to cloud disks; the concept was verified against the AWS EBS docs and a real EKS cluster)

The repo reproduces this cleanly with a hostPath that points at a path guaranteed not to exist.

```bash
git clone https://github.com/vellankikoti/troubleshoot-kubernetes-like-a-pro.git
cd troubleshoot-kubernetes-like-a-pro/scenarios/disk-io-errors
ls
```

issue.yaml mounts hostPath: /nonexistent-path with type: Directory. The kubelet will try to stat that directory on the node and fail.
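For readers without the repo checked out, a manifest consistent with that description might look like the sketch below. The pod name, volume name, path, and type come from this post; the busybox image, the sleep command, and the /data mountPath are assumptions, and the real issue.yaml may differ.

```shell
#!/usr/bin/env bash
# Print a minimal pod manifest matching the description above.
# Assumptions: image, command, and mountPath are illustrative only;
# print_issue_manifest is a hypothetical helper name.
print_issue_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: disk-io-error-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data-volume
          mountPath: /data
  volumes:
    - name: data-volume
      hostPath:
        path: /nonexistent-path
        type: Directory   # the directory must already exist on the node
EOF
}

# Usage (needs a cluster):
#   print_issue_manifest | kubectl apply -f -
```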

Reproduce the issue

```bash
kubectl apply -f issue.yaml
kubectl get pod disk-io-error-pod
```

```plaintext
NAME                READY   STATUS              RESTARTS   AGE
disk-io-error-pod   0/1     ContainerCreating   0          30s
```

Stuck in ContainerCreating. No restarts, no crashloop, just waiting. That is the first tell: a hostPath problem does not crash the container, it prevents the container from ever starting.
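In a busy namespace, that tell can be filtered out of the pod list. A minimal sketch, assuming the default `kubectl get pods` column layout with a header row; `stuck_creating` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the names of pods whose STATUS column reads ContainerCreating.
# Expects default "kubectl get pods" output (with header) on stdin:
#   NAME READY STATUS RESTARTS AGE
stuck_creating() {
  # NR > 1 skips the header row; $3 is the STATUS column.
  awk 'NR > 1 && $3 == "ContainerCreating" { print $1 }'
}

# Usage (needs a cluster):
#   kubectl get pods | stuck_creating
```

Any pod that stays on this list for more than a few seconds is a candidate for `kubectl describe pod`.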

Debug the hard way

```bash
kubectl describe pod disk-io-error-pod
```

```plaintext
Events:
  Type     Reason       Age                From     Message
  ----     ------       ----               ----     -------
  Warning  FailedMount  18s (x3 over 35s)  kubelet  MountVolume.SetUp failed for volume "data-volume" : hostPath type check failed: /nonexistent-path is not a directory
```

The error names the path, names the check that failed, and tells you exactly which node component emitted it. The kubelet did a type: Directory validation, the directory did not exist, and the mount step aborted.
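Because the message format is that regular, the failing path can be pulled out mechanically. A minimal sketch, assuming the message wording shown in the event above; `failed_hostpaths` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Extract hostPath paths from FailedMount messages supplied on stdin.
# Matches messages of the form:
#   ... hostPath type check failed: /some/path is not a directory
failed_hostpaths() {
  sed -n 's/.*hostPath type check failed: \([^ ]*\) is not .*/\1/p' | sort -u
}

# Usage (needs a cluster): list only the FailedMount events for the pod,
# print their messages, and extract the paths.
#   kubectl get events \
#     --field-selector involvedObject.name=disk-io-error-pod,reason=FailedMount \
#     -o jsonpath='{range .items[*]}{.message}{"\n"}{end}' | failed_hostpaths
```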

```bash
kubectl get pod disk-io-error-pod -o jsonpath='{.spec.nodeName}{"\n"}'
# worker-14
```

Now you know which node to look at. If you can, SSH there:

```bash
ssh worker-14 "ls -la /nonexistent-path"
# ls: cannot access '/nonexistent-path': No such file or directory
```

Confirmed. The pod-level error was a faithful report of a node-level reality. Nothing was wrong with the container, the image, or the cluster control plane.
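When SSH to the node is not available, a node debug pod can run the same check. This is a sketch assuming `kubectl debug` is permitted in the cluster; in such a pod the node's root filesystem is mounted under /host, so the path needs that prefix. `host_view_path` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Translate a node path into the path seen from inside a
# "kubectl debug node/..." pod, where the node root is mounted at /host.
host_view_path() {
  printf '/host%s\n' "$1"
}

# Usage (needs a cluster):
#   kubectl debug node/worker-14 -it --image=busybox -- \
#     ls -la "$(host_view_path /nonexistent-path)"
```

The debug pod is left behind after the command exits, so delete it once the check is done.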

Why this happens

Any pod that mounts node-local storage (hostPath, local PV, some CSI drivers with pre-provisioned volumes) couples the pod's health to the node's disk state. Kubernetes does not replicate hostPath directories between nodes. With type: Directory it does not create them on demand either; only type: DirectoryOrCreate asks the kubelet to. If you write path: /data/logs in your spec, that directory has to exist on every node where the pod might land. Otherwise the mount fails, unless you use a nodeSelector to pin the pod to a node where the directory does exist.

The reason these bugs feel mysterious is that kubectl presents everything from the pod's point of view, but the root cause lives one layer down on the node. The pod looks broken when actually the filesystem under it is missing. Storage hardware failures, disk replacements, rebuilds, and manual cleanups all leave nodes in slightly different states over time, and any workload that assumes filesystem uniformity across nodes is one bad reboot away from a FailedMount.

The lesson I took from the worker-14 outage: if your pod events mention mounts and your pod works on some nodes but not others, stop reading application logs. Walk to the node.
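The "some nodes but not others" pattern is easier to see when unhealthy pods are counted per node. A minimal sketch, assuming pod name, node, and phase are available as three whitespace-separated columns; `unhealthy_by_node` is a hypothetical helper name. A pod stuck in ContainerCreating reports phase Pending, so it is counted here, while a crash-looping pod still reports Running and is not.

```shell
#!/usr/bin/env bash
# Count pods that are neither Running nor Succeeded, grouped by node.
# stdin: lines of "NAME NODE PHASE" (no header).
unhealthy_by_node() {
  awk '$3 != "Running" && $3 != "Succeeded" { n[$2]++ }
       END { for (k in n) print n[k], k }' | sort -rn
}

# Usage (needs a cluster):
#   kubectl get pods -A --no-headers \
#     -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,PHASE:.status.phase \
#     | unhealthy_by_node
```

If one node dominates the output, resolve that node first and leave the application logs for later.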

The fix

The repo's fix swaps the hostPath for an emptyDir, which removes the dependency on any specific node filesystem:

```bash
kubectl delete -f issue.yaml
kubectl apply -f fix.yaml
```

```yaml
volumes:
  - name: data-volume
    emptyDir: {}
```

```bash
kubectl get pod disk-io-error-fixed-pod
# disk-io-error-fixed-pod 1/1 Running 0 10s
```

For a real workload that needs persistence, replace hostPath with a proper PVC backed by a CSI driver. HostPath is fine for system daemons and debugging. For application state, it is a trap.
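As a rough sketch of that PVC route, the claim below relies on the cluster's default StorageClass. The claim name data-pvc and the 1Gi size are assumptions chosen for illustration, and `print_pvc_manifest` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print a minimal PersistentVolumeClaim manifest.
# Assumptions: the name "data-pvc" and the 1Gi request are illustrative;
# no storageClassName is set, so the cluster default class is used.
print_pvc_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF
}

# Usage (needs a cluster with a default StorageClass):
#   print_pvc_manifest | kubectl apply -f -
#
# The pod then references the claim instead of a hostPath:
#   volumes:
#     - name: data-volume
#       persistentVolumeClaim:
#         claimName: data-pvc
```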

The lesson

  1. Mount errors live in pod events, not pod logs. kubectl describe pod is the first command, not the fourth.
  2. HostPath and local-PV failures are node-local. Always resolve the pod's nodeName before you start guessing.
  3. If you need persistence, use a PVC backed by a real CSI driver. HostPath is a debugging tool, not a storage strategy.

Day 20 of 35. Tomorrow, the eviction that hits your pod because a completely different pod filled the node's disk.
