3:40 AM, a Tuesday I will not forget. A batch processor was dying every four minutes in production. kubectl top pod reported 38Mi average. The alert from our APM said peak RSS was 42Mi. The limit on the pod was 50Mi. By every number I could see, we had headroom. But the pod kept dying with exit code 137. I spent twenty minutes reading Go profiles before I remembered that kubectl top samples every thirty seconds and the kernel checks every single page fault. Somewhere in the gap between those two clocks, the container was touching 60Mi for half a second and the kernel was doing what kernels do. I had been debugging the wrong layer the whole time.
The scenario
Same repo, different folder. This one exists to make the gap between pod-level metrics and cgroup-level enforcement painfully visible.
git clone https://github.com/vellankikoti/troubleshoot-kubernetes-like-a-pro.git
cd troubleshoot-kubernetes-like-a-pro/scenarios/cgroup-issues
ls
# issue.yaml  fix.yaml
issue.yaml runs stress with 100M of allocations against a 50Mi cgroup ceiling. You cannot win that fight.
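Before applying anything, it helps to see the mismatch in the spec itself. This is only a sketch of the shape of such a manifest; the repo's issue.yaml is the source of truth, and the image name here is an assumption (any image that ships the stress binary will do):

apiVersion: v1
kind: Pod
metadata:
  name: cgroup-issue-pod
spec:
  containers:
  - name: stress
    image: polinux/stress                                   # assumption: an image with stress installed
    command: ["stress", "--vm", "1", "--vm-bytes", "100M"]  # tries to allocate 100M
    resources:
      limits:
        memory: "50Mi"                                      # the ceiling the kubelet writes into the cgroup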
Reproduce the issue
kubectl apply -f issue.yaml
kubectl get pods
NAME               READY   STATUS             RESTARTS      AGE
cgroup-issue-pod   0/1     CrashLoopBackOff   4 (18s ago)   90s
The restart count climbs while the metrics dashboards stay flat. That is the cgroup loop. Fast, clean, invisible to anything that samples on a poll.
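If you want to watch the mismatch live, a quick two-terminal sketch (kubectl top only returns data if metrics-server is installed, so treat that as an assumption):

# Terminal 1: the kubelet's view, updated on every state change
kubectl get pod cgroup-issue-pod -w

# Terminal 2: the sampled view most dashboards are built on
watch -n 5 kubectl top pod cgroup-issue-pod

The restart counter keeps ticking in the first terminal while the second rarely, if ever, lands on the spike.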
Debug the hard way
kubectl describe pod cgroup-issue-pod | grep -A5 "Last State"
Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137
  Started:      Mon, 04 Apr 2026 03:41:02 +0000
  Finished:     Mon, 04 Apr 2026 03:41:03 +0000
One second of life. Born, killed, done.
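That 137 is not arbitrary: it is 128 + 9, the shell convention for a process terminated by signal 9, SIGKILL, which is the signal the OOM killer uses. You can pull the same number without the describe noise:

kubectl get pod cgroup-issue-pod \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}{"\n"}'
# 137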
kubectl get pod cgroup-issue-pod -o jsonpath='{.spec.containers[0].resources}{"\n"}'
{"limits":{"memory":"50Mi"}}
If you can SSH to the node, the evidence is sharper:
dmesg -T | grep -i "killed process"
# [Mon Apr 4 03:41:03 2026] Memory cgroup out of memory: Killed process 18422 (stress) total-vm:106048kB, anon-rss:51200kB
Anon-rss 51200kB against a 50Mi limit. The kernel saw the overshoot at the exact microsecond it happened.
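If dmesg is restricted on the node, the same kernel lines are usually reachable through the journal, assuming a systemd-based node image:

journalctl -k --since "10 minutes ago" | grep -i "out of memory"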
Why this happens
Kubernetes does not enforce memory limits. The Linux kernel does, through cgroups v1 or v2, depending on your distro. When you write limits.memory: 50Mi in a pod spec, the kubelet translates that into a cgroup file on the host, something like /sys/fs/cgroup/memory/.../memory.limit_in_bytes. From that moment on, every page fault inside the container goes through a kernel check. If the total charged memory exceeds the limit by even one page, the OOM killer fires. There is no poll, no sample, no averaging. It is instantaneous.
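You can read the kubelet's translation straight off the node. A minimal sketch; the exact slice and scope names depend on your cgroup driver and container runtime, so the paths and the <uid>/<id> pieces below are placeholders, not gospel:

# cgroup v2
cat /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod<uid>.slice/cri-containerd-<id>.scope/memory.max
# 52428800        # 50Mi, exactly what limits.memory said, now in bytes

# cgroup v1
cat /sys/fs/cgroup/memory/kubepods/burstable/pod<uid>/<container-id>/memory.limit_in_bytes
# 52428800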
kubectl top pod and your Prometheus dashboards work very differently. They scrape cAdvisor on a schedule, usually every 15 to 30 seconds, and they report whatever they saw at those moments. Short spikes between samples are invisible to them. A pod that touches 120Mi for 200 milliseconds looks identical to a pod that stays at 38Mi forever, because the sample never landed during the spike.
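The two views are easy to put side by side. Again a sketch with placeholder cgroup paths; memory.peak only exists on cgroup v2 with reasonably recent kernels (roughly 5.19 and newer):

# The sampled view: a point-in-time number, refreshed on every poll
kubectl top pod cgroup-issue-pod

# The kernel's view: cumulative counters that never miss a spike (cgroup v2, on the node)
cat /sys/fs/cgroup/kubepods.slice/.../memory.events
# oom_kill <n>    # climbs by one on every kill, no matter how short the spike was
cat /sys/fs/cgroup/kubepods.slice/.../memory.peak
# high-water mark in bytes; on cgroup v1 the equivalent is memory.max_usage_in_bytes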
Once you see the two layers clearly, cgroup OOMs stop being a mystery. Your graphs are not lying, they are just looking away at the wrong moment. The kernel never looks away.
The fix
kubectl delete -f issue.yaml
kubectl apply -f fix.yaml
The relevant lines in fix.yaml:
command: ["stress", "--vm", "1", "--vm-bytes", "50M"]
resources:
  limits:
    memory: "256Mi"
Same workload numbers as Day 15, different point. Here we are not just giving the app more headroom, we are giving it enough headroom that short allocation spikes during GC or buffer resize cannot punch through the cgroup ceiling.
kubectl get pod cgroup-issue-fixed-pod
# cgroup-issue-fixed-pod   1/1   Running   0   2m
The lesson
- kubectl top samples, the kernel does not. If the two disagree, trust the kernel.
- Memory limits are enforced at the page-fault layer. Spikes shorter than your scrape interval can still kill you.
- When an OOM happens that your graphs cannot explain, walk down one layer to dmesg or cgroup stats. The evidence is always there, you just need to open the right file.
Day 16 of 35. Tomorrow we jump from compute to storage, starting with a volumeMount that points at a volume that does not exist.
