2:14 AM. The pager said "payments-api RESTARTS=47." I rolled over, opened my laptop, and watched a pod get born and killed in perfect rhythm. Every 30 seconds: Created, Running, Terminated, Created. kubectl apply from the deploy pipeline had come back green an hour ago. The pod was, technically, created. It just kept getting killed a second later. The RESTARTS counter was at 51 by the time I finished typing kubectl describe. I had seen this shape before. A limit set by somebody who never ran the workload under real traffic, now meeting real traffic at 2 in the morning.
The scenario
This one lives in the troubleshooting repo. Clone it, apply the broken manifest, and you can reproduce the exact loop I was staring at.
git clone https://github.com/vellankikoti/troubleshoot-kubernetes-like-a-pro.git
cd troubleshoot-kubernetes-like-a-pro/scenarios/failed-resource-limits
ls
You will see issue.yaml, fix.yaml, and a short description.md. The issue manifest runs polinux/stress asking for 64M of memory under a 32Mi limit. Twice what the container is allowed. Perfect CrashLoop material.
Reproduce the issue
kubectl apply -f issue.yaml
# pod/failed-resource-limits-pod created
kubectl get pods -w
Wait sixty seconds and the RESTARTS column starts climbing like a stopwatch.
NAME                         READY   STATUS             RESTARTS      AGE
failed-resource-limits-pod   0/1     CrashLoopBackOff   5 (12s ago)   2m
Five restarts in two minutes. The pod is not flaky. It is a dead machine being revived over and over.
Debug the hard way
kubectl describe pod failed-resource-limits-pod
Buried in the output:
Last State:  Terminated
  Reason:    OOMKilled
  Exit Code: 137
Limits:
  memory: 32Mi
Command: stress --vm 1 --vm-bytes 64M
Two fields, one answer. The limit is 32Mi. The workload wants 64M.
kubectl logs failed-resource-limits-pod --previous
stress: info: [1] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
stress: FAIL: [1] (415) <-- worker 7 got signal 9
stress: FAIL: [1] (451) failed run completed in 0s
Signal 9 is the kernel saying "I killed this on purpose." No application bug. No race condition. Just cgroups doing their job.
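Exit code 137 is not special to Kubernetes; it is the standard shell convention of 128 plus the signal number. A quick local sketch, no cluster required:

```shell
# 137 = 128 + 9 (SIGKILL): the shell convention for signal deaths.
# Spawn a child shell that kills itself with signal 9, then read its status.
sh -c 'kill -9 $$'
echo $?   # prints: 137
```

The same arithmetic works in reverse: subtract 128 from any exit code above 128 to recover the signal that killed the process.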
kubectl get pod failed-resource-limits-pod -o jsonpath='{.status.containerStatuses[0].restartCount}{"\n"}'
# 11
Why this happens
Memory limits in Kubernetes are not soft targets. They are hard ceilings enforced by the Linux kernel through cgroups. When your container tries to allocate past its limit, the kernel does not send a warning or a graceful shutdown. It fires the OOM killer, the process dies with exit code 137, and the kubelet dutifully restarts the pod because restartPolicy defaults to Always. That loop runs forever. The backoff caps at five minutes, so you get one dead pod every five minutes until a human notices.
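If you want to see the contract the kernel enforces, the limit lands in the container's cgroup memory file as plain bytes (memory.max on cgroup v2, memory.limit_in_bytes on v1). The 32Mi from the broken manifest translates to:

```shell
# 32Mi as the kernel sees it: mebibytes converted to bytes.
echo $((32 * 1024 * 1024))   # prints: 33554432
```

Any allocation that pushes the cgroup past that number triggers the OOM killer, exactly as the events above show.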
The mental model I wish somebody had drawn for me in year one: a pod with a limit below its actual memory need is not a bug. It is a permanent kill switch. CrashLoopBackOff is not a transient state here, it is the steady state. No amount of patience or retries will fix it because nothing about the workload is going to change between attempts.
The lesson from the field is that limits are a contract with the kernel, not a guideline for the scheduler. Write the contract wrong and the kernel enforces it exactly.
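That split between scheduler and kernel is visible in the manifest itself. A minimal sketch of a resources stanza, with illustrative values rather than a recommendation for any real workload: requests are what the scheduler uses to place the pod, limits are what the kernel enforces once it is running.

```yaml
resources:
  requests:
    memory: "128Mi"   # scheduler input: reserved for placement decisions
  limits:
    memory: "256Mi"   # kernel contract: hard ceiling enforced by cgroups
```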
The fix
kubectl delete -f issue.yaml
kubectl apply -f fix.yaml
The diff is two lines:
command: ["stress", "--vm", "1", "--vm-bytes", "50M"]
resources:
  limits:
    memory: "256Mi"
256Mi for a 50M workload. That looks wasteful until you remember that a limit reserves nothing; unused headroom is free, outages are not.
kubectl get pod failed-resource-limits-fixed-pod
# failed-resource-limits-fixed-pod   1/1   Running   0   1m
Zero restarts. Steady state.
The lesson
- CrashLoopBackOff plus OOMKilled equals a memory limit below real usage. It will not self-heal. Stop waiting.
- The RESTARTS counter is the most honest metric in Kubernetes. A climbing number means something is fundamentally wrong, not transiently wrong.
- Set memory limits to peak observed usage times 1.5, minimum. Headroom is the cheapest insurance you can buy.
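The times-1.5 rule from the last bullet is simple enough to do in your head, but as a sketch (the 170Mi peak is a made-up number for illustration, not a value from this scenario):

```shell
# Hypothetical peak observed usage; not measured from the pod above.
peak_mi=170
limit_mi=$(( peak_mi * 3 / 2 ))   # 1.5x headroom in integer math
echo "${limit_mi}Mi"              # prints: 255Mi -- round up to 256Mi
```

Rounding up to a clean power-of-two-ish value like 256Mi keeps manifests readable and leaves a little extra slack.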
Day 15 of 35. Tomorrow we go one layer deeper, into the cgroup itself, where the kernel makes the decisions Kubernetes only reports.
