Mastering Kubernetes the Right Way · DAY 04 / 35

ImagePullBackOff in Kubernetes: The Timer on Top of the Error

Same cause as ErrImagePull, plus a backoff clock that makes the cluster look broken when it is just being polite.

Koti Vellanki · 23 Mar 2026 · 5 min read
kubernetes · debugging · image

2:30 AM and I am watching a CI cluster slowly drown. Every new pod comes up in ImagePullBackOff. The images exist. I can pull them from my laptop in two seconds. Our registry is up. Monitoring is green everywhere except the pod list, which looks like someone unplugged the internet. I check the Failed events and see toomanyrequests: You have reached your pull rate limit. Docker Hub, anonymous pulls, 100 per 6 hours, and our CI pipeline was burning them faster than we could count. The worst part is the backoff. By the time I figured out the cause, new pods were waiting five full minutes between retries and I was powerless to rush them.

The scenario

From the troubleshoot-kubernetes-like-a-pro repo. You are going to reproduce ImagePullBackOff, watch the timer grow, and learn to separate the cause from the cooldown.

bash
git clone https://github.com/vellankikoti/troubleshoot-kubernetes-like-a-pro.git
cd troubleshoot-kubernetes-like-a-pro/scenarios/image-pull-backoff
ls

You will find description.md, issue.yaml, and fix.yaml. This assumes you have a cluster running from Day 0.

Reproduce the issue

bash
kubectl apply -f issue.yaml
kubectl get pods

Wait a couple of minutes and run it again. You will see the state flip between ErrImagePull and ImagePullBackOff depending on whether the kubelet is mid-attempt or mid-cooldown.

plaintext
NAME                     READY   STATUS             RESTARTS   AGE
image-pull-backoff-pod   0/1     ImagePullBackOff   0          2m15s
Terminal: kubectl apply -f issue.yaml creating image-pull-backoff-pod, kubectl get po showing the pod with status ErrImagePull, READY 0/1, RESTARTS 0, AGE 6s
Apply issue.yaml and kubectl get po catches the pod in ErrImagePull before the backoff timer kicks in. Wait another minute and the same command will show ImagePullBackOff — same cause, different status.
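Rather than re-running `kubectl get pods` by hand, you can watch the flip happen live. This is plain kubectl, not part of the repo's steps:

```shell
# Stream status updates for the pod; the STATUS column will alternate
# between ErrImagePull (mid-attempt) and ImagePullBackOff (mid-cooldown).
# Ctrl-C to stop.
kubectl get pod image-pull-backoff-pod -w
```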

Debug the hard way

Describe it and look at the events block.

bash
kubectl describe pod image-pull-backoff-pod
plaintext
Events:
  Type     Reason   Age                  From     Message
  ----     ------   ----                 ----     -------
  Normal   Pulling  45s (x4 over 2m15s)  kubelet  Pulling image "non-existent-image"
  Warning  Failed   44s (x4 over 2m14s)  kubelet  Failed to pull image "non-existent-image": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/library/non-existent-image:latest": failed to resolve reference "docker.io/library/non-existent-image:latest": docker.io/library/non-existent-image:latest: not found
  Warning  Failed   44s (x4 over 2m14s)  kubelet  Error: ErrImagePull
  Normal   BackOff  30s (x5 over 2m)     kubelet  Back-off pulling image "non-existent-image"
  Warning  Failed   30s (x5 over 2m)     kubelet  Error: ImagePullBackOff

Read the sequence. Pulling fires. Failed fires with the real reason. ErrImagePull shows up as the status. Then BackOff starts, and ImagePullBackOff becomes the new status. Same cause, different status because the kubelet is now in a cooldown window.

Terminal: kubectl describe pod image-pull-backoff-pod showing pod metadata, Image: non-existent-image, Container State Waiting with reason ImagePullBackOff, Restart Count 0, Requests and Limits, and the Conditions block
kubectl describe — the container state is Waiting with reason ImagePullBackOff, the image reference is visible in the pod spec, and Restart Count is still 0 because the container has never existed.
Terminal: bottom half of kubectl describe showing the Events table with Normal Scheduled, Normal Pulling, Warning Failed with pull access denied and repository does not exist message, Warning Failed ErrImagePull, and Warning Failed ImagePullBackOff entries
Scroll to the Events table. Warning Failed carries the real message — 'pull access denied, repository does not exist or may require authorization' — and BackOff is layered on top. The status column only told you the cooldown, not the cause.
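If you want the cause without scrolling through describe output at all, the container's waiting state carries both fields in the standard pod status API (pod name matches this scenario):

```shell
# Print the waiting reason (the cooldown) and the waiting message (the cause)
# straight from the pod's containerStatuses.
kubectl get pod image-pull-backoff-pod \
  -o jsonpath='{.status.containerStatuses[0].state.waiting.reason}{"\n"}{.status.containerStatuses[0].state.waiting.message}{"\n"}'
```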

The filter that saves you time:

bash
kubectl get events --field-selector involvedObject.name=image-pull-backoff-pod,reason=Failed

Why this happens

The kubelet has an exponential backoff for image pulls, just like it does for container restarts. First failure, retry in 10 seconds. Second, 20. Then 40, 80, 160, capped at 5 minutes. Each attempt goes through the same pull chain as Day 3: resolve host, TCP, handshake, auth, manifest, layers. If any step fails, you get an ErrImagePull event, and if it keeps failing, the cooldown grows and the pod's status reported to kubectl get pods becomes ImagePullBackOff.
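The schedule above is easy to sketch: doubling from a 10-second base with a 300-second ceiling. A minimal sketch of the default behaviour (exact values can vary with kubelet configuration):

```shell
# Image-pull backoff schedule: double the delay after each failure,
# capped at 5 minutes (300 seconds).
delay=10
for attempt in 1 2 3 4 5 6 7; do
  echo "attempt ${attempt}: wait ${delay}s"
  delay=$((delay * 2))
  if [ "$delay" -gt 300 ]; then delay=300; fi
done
```

By attempt six you are already at the cap, which is why a cluster mid-incident looks frozen: nothing is broken beyond the original cause, the kubelet is just waiting out the clock.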

So ImagePullBackOff is not a distinct error. It is ErrImagePull plus a timer. The real cause is always in the Failed event message, and the four common shapes are worth memorising:

  • Typo or missing tag: fix the image: field and pin versions.
  • Private registry auth: add a dockerconfigjson secret and reference it via imagePullSecrets:.
  • Rate limits, especially Docker Hub anonymous pulls: authenticate, or mirror to ECR, GCR, ACR, or your own registry.
  • Blocked egress: check NAT gateway, security groups, and node firewall rules.
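For the private-registry case, the wiring is two steps: create a docker-registry secret, then reference it from the pod spec. A sketch with placeholder registry and credentials (registry.example.com, regcred, and the rest are stand-ins, substitute your own):

```shell
# 1. Create a dockerconfigjson secret holding the registry credentials.
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=ci-bot \
  --docker-password='<token>' \
  --docker-email=ci@example.com

# 2. Reference it from the pod spec via imagePullSecrets:
#    spec:
#      imagePullSecrets:
#        - name: regcred
#      containers:
#        - name: app
#          image: registry.example.com/team/app:1.2.3
```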

The fix

bash
kubectl apply -f fix.yaml
kubectl get pods

The broken pod referenced an image that does not exist:

yaml
image: non-existent-image

The fix points at a real image and gives the container a long-running command so you can watch it stay alive:

yaml
image: busybox
command: ["sh", "-c", "sleep 3600"]
plaintext
NAME                           READY   STATUS    RESTARTS   AGE
image-pull-backoff-fixed-pod   1/1     Running   0          3s
Terminal: cat issue.yaml shows image non-existent-image with comment 'Invalid image causing pull failure', cat fix.yaml shows image busybox with command sh -c sleep 3600, kubectl apply -f fix.yaml creates image-pull-backoff-fixed-pod, kubectl get po shows the fixed pod 1/1 Running and the original still ImagePullBackOff with 2m43s age
Diff the manifests and the real problem is the image field. The fix replaces non-existent-image with busybox plus a sleep command so the container stays alive. One apply, the new pod is Running, the original is still timing out on its backoff.

The easiest way — with Kubilitics

The Pods view pulls the same answer out of the cluster one click at a time. After applying both manifests, both pods are visible in the table with their own status badges — no describe, no scroll, no event filter.

Kubilitics Pods view showing 2 pods total, 1 running, 0 failed. image-pull-backoff-fixed-pod with a green Running badge and READY 1/1, image-pull-backoff-pod with an ImagePullBackOff badge and READY 0/1
Kubilitics Pods view after the fix lands. The fixed pod is green, the original is still red, and the summary row shows the running count tick up. You never had to read an event table to know which pod is which.

The lesson

  1. ImagePullBackOff is ErrImagePull with a cooldown. Read the underlying Failed event, not the status column.
  2. Pin tags, mirror registries, and authenticate even to public registries. If your CI pulls hundreds of images a day, you will hit rate limits at exactly the worst possible moment.
  3. reason=Failed is the one field selector that cuts through backoff noise. Put it in your aliases.
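Lesson 3 as a snippet. A plain alias only takes a fixed string, so a shell function handles the pod-name argument (the name kfail is my own, call it whatever you like):

```shell
# Show only the Failed events for a given pod, cutting through BackOff noise.
# Usage: kfail <pod-name>
kfail() {
  kubectl get events \
    --field-selector "involvedObject.name=${1},reason=Failed"
}
```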

Day 4 of 35 — tomorrow the image pulls, the container runs, and the Service still refuses to send it a single request.
