2:14 AM. The pager said "payments-api RESTARTS=47." I rolled over, opened my laptop, and watched a pod get born and killed in perfect rhythm. Every 30 seconds: Created, Running, Terminated, Created. kubectl apply from the deploy pipeline had come back green an hour ago. The pod was, technically, created. It just kept getting killed a second later. The RESTARTS counter was at 51 by the time I finished typing kubectl describe. I had seen this shape before. A limit set by somebody who never ran the workload under real traffic, now meeting real traffic at 2 in the morning.
The scenario
This one lives in the troubleshooting repo. Clone it, apply the broken manifest, and you can reproduce the exact loop I was staring at.
git clone https://github.com/vellankikoti/troubleshoot-kubernetes-like-a-pro.git
cd troubleshoot-kubernetes-like-a-pro/scenarios/failed-resource-limits
ls
You will see issue.yaml, fix.yaml, and a short description.md. The issue manifest runs polinux/stress asking for 64M of memory under a 32Mi limit. Twice what the container is allowed. Perfect CrashLoop material.
Reproduce the issue
kubectl apply -f issue.yaml
# pod/failed-resource-limits-pod created
kubectl get pods -w
Wait sixty seconds and the RESTARTS column starts climbing like a stopwatch.
NAME                         READY   STATUS             RESTARTS      AGE
failed-resource-limits-pod   0/1     CrashLoopBackOff   5 (12s ago)   2m
Five restarts in two minutes. The pod is not flaky. It is a dead machine being revived over and over.
Debug the hard way
kubectl describe pod failed-resource-limits-pod
Buried in the output:
Last State:  Terminated
  Reason:    OOMKilled
  Exit Code: 137
Limits:
  memory: 32Mi
Command: stress --vm 1 --vm-bytes 64M
Two fields, one answer. The limit is 32Mi. The workload wants 64M.
kubectl logs failed-resource-limits-pod --previous
stress: info: [1] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
stress: FAIL: [1] (415) <-- worker 7 got signal 9
stress: FAIL: [1] (451) failed run completed in 0s
Signal 9 is the kernel saying "I killed this on purpose." No application bug. No race condition. Just cgroups doing their job.
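Exit code 137 is not special to Kubernetes; it is the standard shell convention of 128 plus the signal number. A quick local sketch, no cluster required:

```shell
# 137 = 128 + 9 (SIGKILL): the shell convention for signal deaths.
# Spawn a child shell that kills itself with signal 9, then read its status.
sh -c 'kill -9 $$'
echo $?   # prints: 137
```

The same arithmetic works in reverse: subtract 128 from any exit code above 128 to recover the signal that killed the process.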
kubectl get pod failed-resource-limits-pod -o jsonpath='{.status.containerStatuses[0].restartCount}{"\n"}'
# 11
Why this happens
Memory limits in Kubernetes are not soft targets. They are hard ceilings enforced by the Linux kernel through cgroups. When your container tries to allocate past its limit, the kernel does not send a warning or a graceful shutdown. It fires the OOM killer, the process dies with exit code 137, and the kubelet dutifully restarts the pod because restartPolicy defaults to Always. That loop runs forever. The backoff caps at five minutes, so you get one dead pod every five minutes until a human notices.
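If you want to see the contract the kernel enforces, the limit lands in the container's cgroup memory file as plain bytes (memory.max on cgroup v2, memory.limit_in_bytes on v1). The 32Mi from the broken manifest translates to:

```shell
# 32Mi as the kernel sees it: mebibytes converted to bytes.
echo $((32 * 1024 * 1024))   # prints: 33554432
```

Any allocation that pushes the cgroup past that number triggers the OOM killer, exactly as the events above show.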
The mental model I wish somebody had drawn for me in year one: a pod with a limit below its actual memory need is not a bug. It is a permanent kill switch. CrashLoopBackOff is not a transient state here, it is the steady state. No amount of patience or retries will fix it because nothing about the workload is going to change between attempts.
The lesson from the field is that limits are a contract with the kernel, not a guideline for the scheduler. Write the contract wrong and the kernel enforces it exactly.
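That split between scheduler and kernel is visible in the manifest itself. A minimal sketch of a resources stanza, with illustrative values rather than a recommendation for any real workload: requests are what the scheduler uses to place the pod, limits are what the kernel enforces once it is running.

```yaml
resources:
  requests:
    memory: "128Mi"   # scheduler input: reserved for placement decisions
  limits:
    memory: "256Mi"   # kernel contract: hard ceiling enforced by cgroups
```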
The fix
kubectl delete -f issue.yaml
kubectl apply -f fix.yaml
The diff is two lines:
command: ["stress", "--vm", "1", "--vm-bytes", "50M"]
resources:
  limits:
    memory: "256Mi"
256Mi for a 50M workload. That looks wasteful until you remember that a limit reserves nothing; unused headroom is free, outages are not.
kubectl get pod failed-resource-limits-fixed-pod
# failed-resource-limits-fixed-pod   1/1   Running   0   1m
Zero restarts. Steady state.
The lesson
- CrashLoopBackOff plus OOMKilled equals a memory limit below real usage. It will not self-heal. Stop waiting.
- The RESTARTS counter is the most honest metric in Kubernetes. A climbing number means something is fundamentally wrong, not transiently wrong.
- Set memory limits to peak observed usage times 1.5, minimum. Headroom is the cheapest insurance you can buy.
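The times-1.5 rule from the last bullet is simple enough to do in your head, but as a sketch (the 170Mi peak is a made-up number for illustration, not a value from this scenario):

```shell
# Hypothetical peak observed usage; not measured from the pod above.
peak_mi=170
limit_mi=$(( peak_mi * 3 / 2 ))   # 1.5x headroom in integer math
echo "${limit_mi}Mi"              # prints: 255Mi -- round up to 256Mi
```

Rounding up to a clean power-of-two-ish value like 256Mi keeps manifests readable and leaves a little extra slack.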
Day 15 of 35. Tomorrow we go one layer deeper, into the cgroup itself, where the kernel makes the decisions Kubernetes only reports.
