koti.dev
Mastering Kubernetes the Right Way · DAY 15 / 35

CrashLoopBackOff from Tight Memory Limits: The 2-Minute Fix

Pod created. Pod killed. Pod created. Pod killed. Welcome to the forever loop.

Koti Vellanki · 03 Apr 2026 · 3 min read
kubernetes · debugging · resources

2:14 AM. The pager said "payments-api RESTARTS=47." I rolled over, opened my laptop, and watched a pod get born and killed in perfect rhythm. Every 30 seconds: Created, Running, Terminated, Created. kubectl apply from the deploy pipeline had come back green an hour ago. The pod was, technically, created. It just kept getting killed a second later. The RESTARTS counter was at 51 by the time I finished typing kubectl describe. I had seen this shape before. A limit set by somebody who never ran the workload under real traffic, now meeting real traffic at 2 in the morning.

The scenario

DAY 15 · RESOURCES · QUOTA ADMISSION

The pod was never created. The API server rejected it.

A namespace has a ResourceQuota capping limits.cpu at 4. A new pod sets limits.cpu: 8, twice the namespace ceiling. The API server's quota admission plugin catches this before the object reaches etcd, so the pod never exists. The error appears in kubectl output, not in pod events.
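If you want to watch the rejection yourself, a minimal sketch of the two manifests can be generated from the shell. The names compute-quota and over-quota-pod, the busybox image, and the write_quota_repro helper are my own placeholders, not anything from the repo; the only parts taken from the scenario are the quota ceiling of 4 and the pod limit of 8.

```bash
#!/usr/bin/env bash
# Writes a ResourceQuota capping limits.cpu at 4 and a pod that asks for 8.
# Hypothetical names: compute-quota, over-quota-pod, write_quota_repro.
write_quota_repro() {  # usage: write_quota_repro DIR
  local dir=${1:-.}
  mkdir -p "$dir"

  cat > "$dir/quota.yaml" <<'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: default
spec:
  hard:
    limits.cpu: "4"
EOF

  cat > "$dir/over-quota-pod.yaml" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: over-quota-pod
  namespace: default
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
      resources:
        limits:
          cpu: "8"
EOF
}
```

On a throwaway cluster (kind works), apply quota.yaml first and then over-quota-pod.yaml; the second apply should fail with an "exceeded quota" error and leave no pod behind.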

Figure 15 / 35: ResourceQuota admission rejection. kubectl apply POSTs a pod with limits.cpu: 8 to /api/v1/pods; the kube-apiserver's quota admission plugin finds the quota at limits.cpu = 4 with 0 used, answers 403 Forbidden (exceeded quota), and the pod never reaches etcd. kubectl exits non-zero and no pod event is generated.
1. The pod never reaches the scheduler: look at kubectl output, not pod events

When ResourceQuota blocks a create, the object is never written to etcd. There is no pod to describe and no events to read. The error lives in the exit code and stderr of kubectl apply. Check the pipeline logs or run the apply again; the message is immediate and explicit.
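In a pipeline, that means keeping stderr instead of letting it scroll past. Here is a sketch of a wrapper, run_and_report (a hypothetical name, not a kubectl feature), that preserves the exit code, captures stderr, and labels a quota rejection by matching the "exceeded quota" text the API server returns.

```bash
#!/usr/bin/env bash
# Hypothetical CI helper: run any command, keep its exit code and stderr,
# and flag a ResourceQuota rejection by its "exceeded quota" message.
run_and_report() {
  local errfile status=0
  errfile=$(mktemp)
  "$@" 2>"$errfile" || status=$?   # stdout passes through untouched

  if [ "$status" -eq 0 ]; then
    echo "ok"
  elif grep -q "exceeded quota" "$errfile"; then
    echo "quota-rejected (exit $status): $(cat "$errfile")"
  else
    echo "failed (exit $status): $(cat "$errfile")"
  fi

  rm -f "$errfile"
  return "$status"                  # the pipeline step still fails on error
}

# Example (needs a cluster):
# run_and_report kubectl apply -f over-quota-pod.yaml
```

Returning the original status matters: the step still goes red, but the log line now says why.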

2. ResourceQuota is enforced at the API server, before scheduling

The quota admission plugin runs as part of the API server admission chain, after authentication and authorization but before the object persists. Check the current quota state with kubectl describe resourcequota -n default to see how much of each resource is used and what the ceiling is. For limits.cpu, usage is the sum of the CPU limits of every container in every non-terminal pod in the namespace.
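The arithmetic behind that check is simple enough to model. This is a simplified sketch for limits.cpu only: the real plugin evaluates every resource the quota tracks using Kubernetes quantity parsing, while to_millicores and quota_check (both hypothetical helpers) accept only whole cores ("8") or millicores ("500m").

```bash
#!/usr/bin/env bash
# Simplified model of the ResourceQuota check for limits.cpu, in millicores.
# Accepts whole cores ("8") or millicores ("500m") only; no decimals.
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  echo $(( $1 * 1000 )) ;;
  esac
}

quota_check() {  # usage: quota_check USED REQUESTED HARD
  local used req hard
  used=$(to_millicores "$1")
  req=$(to_millicores "$2")
  hard=$(to_millicores "$3")
  if [ $(( used + req )) -gt "$hard" ]; then
    echo "exceeded quota: requested ${req}m, used ${used}m, limited ${hard}m, remaining $(( hard - used ))m"
    return 1
  fi
  echo "admitted: $(( hard - used - req ))m left"
}
```

With the scenario's numbers, quota_check 0 8 4 reports requested 8000m against a 4000m ceiling, and the "remaining" figure is the largest limits.cpu the pod could ask for without touching the quota.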

3. Either raise the quota or lower the pod's limit

Two paths: patch the ResourceQuota object to raise the ceiling (kubectl edit resourcequota -n default), or reduce the pod's limits.cpu to fit within what remains. The right answer depends on whether the namespace quota is protecting other tenants from resource exhaustion, so do not raise it without checking who else shares the namespace.

Admission denied, API path: kubectl apply submits a pod requesting limits.cpu: 8. The API server's ResourceQuota admission plugin checks it against the namespace quota, which caps limits.cpu at 4, and rejects it with 403 Forbidden and the message 'exceeded quota: limits.cpu, requested: 8, used: 0, limited: 4' before the object is written to etcd.

Sources: ResourceQuota v1 (kubectl explain resourcequota.spec); quota admission plugin (kube-apiserver admission controllers list, --enable-admission-plugins=ResourceQuota); verified on kind v0.22.0 with Kubernetes 1.30.0, where kubectl apply with an over-quota pod returns "exceeded quota" immediately.

This one lives in the troubleshooting repo. Clone it, apply the broken manifest, and you can reproduce the exact loop I was staring at.

```bash
git clone https://github.com/vellankikoti/troubleshoot-kubernetes-like-a-pro.git
cd troubleshoot-kubernetes-like-a-pro/scenarios/failed-resource-limits
ls
```

You will see issue.yaml, fix.yaml, and a short description.md. The issue manifest runs polinux/stress asking for 64M of memory under a 32Mi limit: roughly twice what it is allowed. Perfect CrashLoop material.
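The mismatch is easier to see in bytes. Here is a minimal converter for the Kubernetes memory suffixes used in this post, assuming integer values only: binary suffixes (Ki, Mi, Gi) are powers of 1024 and decimal ones (k, M, G) are powers of 1000. to_bytes is my own helper name.

```bash
#!/usr/bin/env bash
# Convert a Kubernetes memory quantity to bytes.
# Binary suffixes are powers of 1024, decimal suffixes powers of 1000.
# Sketch only: integers, and only the suffixes listed below.
to_bytes() {
  case "$1" in
    *Ki) echo $(( ${1%Ki} * 1024 )) ;;
    *Mi) echo $(( ${1%Mi} * 1024 * 1024 )) ;;
    *Gi) echo $(( ${1%Gi} * 1024 * 1024 * 1024 )) ;;
    *k)  echo $(( ${1%k} * 1000 )) ;;
    *M)  echo $(( ${1%M} * 1000 * 1000 )) ;;
    *G)  echo $(( ${1%G} * 1000 * 1000 * 1000 )) ;;
    *)   echo $(( $1 )) ;;
  esac
}

to_bytes 32Mi   # the limit in issue.yaml: 33554432
to_bytes 64M    # the stress ask, read as a Kubernetes quantity: 64000000
echo "$(( $(to_bytes 64M) * 100 / $(to_bytes 32Mi) ))% of the limit"
# 190% of the limit
```

stress parses --vm-bytes itself, so the exact ratio depends on its own suffix convention; either way the worker wants roughly double what the cgroup will give it.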

Reproduce the issue

```bash
kubectl apply -f issue.yaml
# pod/failed-resource-limits-pod created
kubectl get pods -w
```

Wait sixty seconds and the RESTARTS column starts climbing like a stopwatch.

```plaintext
NAME                         READY   STATUS             RESTARTS      AGE
failed-resource-limits-pod   0/1     CrashLoopBackOff   5 (12s ago)   2m
```

Five restarts in two minutes. The pod is not flaky. It is a dead machine being revived over and over.

Debug the hard way

```bash
kubectl describe pod failed-resource-limits-pod
```

Buried in the container section of the output:

```plaintext
Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137
Limits:
  memory:  32Mi
Command:
  stress --vm 1 --vm-bytes 64M
```

Two fields, one answer. The limit is 32Mi. The workload wants 64M.

```bash
kubectl logs failed-resource-limits-pod --previous
```

```plaintext
stress: info: [1] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
stress: FAIL: [1] (415) <-- worker 7 got signal 9
stress: FAIL: [1] (451) failed run completed in 0s
```

Signal 9 is SIGKILL, the kernel saying "I killed this on purpose." No application bug. No race condition. Just cgroups doing their job.
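You can see where 137 comes from without a cluster. Shells and container runtimes report a process killed by signal N as exit status 128 + N, so SIGKILL (9) shows up as 137. This sketch uses sleep as a stand-in for the stress worker and kill -9 as a stand-in for the OOM killer; it demonstrates the exit-code convention, not the cgroup mechanics.

```bash
#!/usr/bin/env bash
# Demonstrate the 128 + signal convention behind "Exit Code: 137".
sigkill_exit_status() {
  sleep 30 >/dev/null 2>&1 &   # stand-in for the stress worker
  local pid=$!
  kill -9 "$pid"               # SIGKILL, the signal the OOM killer sends
  local status=0
  wait "$pid" || status=$?     # death by signal N is reported as 128 + N
  echo "$status"
}

status=$(sigkill_exit_status)
echo "exit status $status = 128 + signal $(( status - 128 ))"
# exit status 137 = 128 + signal 9
```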

```bash
kubectl get pod failed-resource-limits-pod -o jsonpath='{.status.containerStatuses[0].restartCount}{"\n"}'
# 11
```
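Once you have seen the pattern a few times, the triage is mechanical enough to script. classify_termination is a hypothetical helper and its rules are my rules of thumb, not anything the kubelet does. It takes the reason and exitCode fields from .status.containerStatuses[0].lastState.terminated, which the commented kubectl line at the bottom pulls in one call.

```bash
#!/usr/bin/env bash
# Hypothetical triage helper. Rules of thumb, not kubelet behavior.
# usage: classify_termination REASON EXIT_CODE
classify_termination() {
  local reason=$1 code=$2
  if [ "$reason" = "OOMKilled" ]; then
    echo "memory limit below real usage: raise the limit or shrink the workload"
  elif [ "$code" -gt 128 ]; then
    echo "killed by signal $(( code - 128 )) without OOMKilled: check events for failed liveness probes"
  elif [ "$code" -ne 0 ]; then
    echo "application exited with $code: read kubectl logs --previous"
  else
    echo "clean exit: check the container command and restartPolicy"
  fi
}

# Against a live cluster:
# classify_termination $(kubectl get pod failed-resource-limits-pod \
#   -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason} {.status.containerStatuses[0].lastState.terminated.exitCode}')
```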

Why this happens

Memory limits in Kubernetes are not soft targets. They are hard ceilings enforced by the Linux kernel through cgroups. When your container tries to allocate past its limit and the kernel cannot reclaim enough memory inside the cgroup, there is no warning and no graceful shutdown. The OOM killer sends SIGKILL, the process dies with exit code 137 (128 + signal 9), and the kubelet dutifully restarts the container because restartPolicy defaults to Always. That loop runs forever. The delay between restarts starts at 10 seconds, doubles after each crash, and caps at five minutes, so once it saturates you get one doomed restart every five minutes until a human notices.
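The restart cadence follows the kubelet's documented backoff: 10 seconds, doubling after each crash, capped at 300 seconds, and reset only after a container runs cleanly for 10 minutes. This sketch prints the first N delays; backoff_schedule is a hypothetical name, and real timings will drift a little, so treat the numbers as approximate.

```bash
#!/usr/bin/env bash
# Approximate CrashLoopBackOff delays: 10s, doubling, capped at 300s.
# The kubelet's reset after 10 minutes of clean running is not modeled.
backoff_schedule() {  # usage: backoff_schedule N
  local n=$1 delay=10 i=0 out=""
  while [ "$i" -lt "$n" ]; do
    out="$out${out:+ }$delay"
    delay=$(( delay * 2 ))
    if [ "$delay" -gt 300 ]; then
      delay=300
    fi
    i=$(( i + 1 ))
  done
  echo "$out"
}

backoff_schedule 7
# 10 20 40 80 160 300 300
```

From the sixth delay onward every attempt waits the full five minutes: that is the steady state the RESTARTS column settles into.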

The mental model I wish somebody had drawn for me in year one: a pod with a limit below its actual memory need is not a bug. It is a permanent kill switch. CrashLoopBackOff is not a transient state here; it is the steady state. No amount of patience or retries will fix it, because nothing about the workload changes between attempts.

The lesson from the field is that limits are a contract with the kernel, not a guideline for the scheduler. Write the contract wrong and the kernel enforces it exactly.

The fix

```bash
kubectl delete -f issue.yaml
kubectl apply -f fix.yaml
```

The diff is two lines:

```yaml
command: ["stress", "--vm", "1", "--vm-bytes", "50M"]
resources:
  limits:
    memory: "256Mi"
```

256Mi for a 50M workload. That looks wasteful until you remember that spare memory is cheap and outages are not.

```bash
kubectl get pod failed-resource-limits-fixed-pod
# failed-resource-limits-fixed-pod   1/1   Running   0   1m
```

Zero restarts. Steady state.

The lesson

  1. CrashLoopBackOff plus OOMKilled equals a memory limit below real usage. It will not self-heal. Stop waiting.
  2. The RESTARTS counter is the most honest metric in Kubernetes. A climbing number means something is fundamentally wrong, not transiently wrong.
  3. Set memory limits to peak observed usage times 1.5, minimum. Headroom is the cheapest insurance you can buy.
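Rule 3 is just arithmetic. A sketch, assuming you already have a peak figure in Mi (from kubectl top pod, which needs metrics-server, or from your monitoring); recommend_limit_mi is a hypothetical helper that multiplies by 1.5 and rounds up to a whole Mi using integer math.

```bash
#!/usr/bin/env bash
# Rule 3: memory limit = peak observed usage x 1.5, rounded up to a whole Mi.
# ceil(peak * 3 / 2) in integer arithmetic is (peak * 3 + 1) / 2.
recommend_limit_mi() {  # usage: recommend_limit_mi PEAK_MI
  local peak=$1
  echo "$(( (peak * 3 + 1) / 2 ))Mi"
}

recommend_limit_mi 50    # 75Mi
recommend_limit_mi 33    # 50Mi
```

Treat the result as a floor, not a target. The fixed pod's 256Mi sits well above it for a 50M worker, which is exactly what "minimum" in rule 3 allows.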

Day 15 of 35. Tomorrow we go one layer deeper, into the cgroup itself, where the kernel makes the decisions Kubernetes only reports.
