ImagePullBackOff in Kubernetes: The Timer on Top of the Error

2:30 AM and I am watching a CI cluster slowly drown. Every new pod comes up in ImagePullBackOff. The images exist. I can pull them from my laptop in two seconds. Our registry is up. Monitoring is green everywhere except the pod list, which looks like someone unplugged the internet. I check the Failed events and see toomanyrequests: You have reached your pull rate limit. Docker Hub, anonymous pulls, 100 per 6 hours, and our CI pipeline was burning them faster than we could count. The worst part is the backoff. By the time I figured out the cause, new pods were waiting five full minutes between retries and I was powerless to rush them.

The scenario

◆ DAY 4 · APP · IMAGE PULL

The credentials are fine. The tag does not exist.

kubelet asks containerd to pull the image. The registry returns HTTP 404 because the tag was never pushed. The failed pull sets the pod status to ErrImagePull. kubelet then retries with exponential backoff, and while it waits between attempts the status reads ImagePullBackOff.
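
kubelet only delegates the pull to the container runtime, so you can ask the runtime yourself and see the registry's answer right away instead of waiting out the backoff. This is a minimal sketch. It assumes a kind cluster whose node container has kind's default name, kind-control-plane, and ships crictl. pull_on_node is a hypothetical helper name.

```bash
# Sketch: pull an image through the node's container runtime directly,
# bypassing kubelet's backoff timer.
# Assumptions: kind cluster, default node container name kind-control-plane,
# crictl present inside the node.
pull_on_node() {
  local image=$1 node=${2:-kind-control-plane}
  docker exec "$node" crictl pull "$image"
}
```

For this scenario, pull_on_node non-existent-image should fail with the same registry error that kubelet reports in its Failed events.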

Figure 04 / 35

The pod references a tag that was never pushed

The image name is correct and the registry is reachable. Only the tag does-not-exist is wrong — it was never pushed. Authentication succeeds; the manifest lookup fails.

The registry returns 404 on every attempt

Per the OCI distribution spec, the registry answers with HTTP 404 and the error code MANIFEST_UNKNOWN ("manifest unknown") when the requested tag or digest does not exist. This is not a transient error: retrying will not help until the tag is pushed.
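
You can ask the registry the same question containerd asks. This is a minimal sketch. It assumes the registry accepts anonymous requests, although Docker Hub and most private registries require a bearer token first. The helper names and the example registry.example.com/team/app are hypothetical.

```bash
# Sketch: check whether a tag exists via the OCI distribution endpoint
# HEAD /v2/<name>/manifests/<reference>.
# Assumption: anonymous access; token-protected registries will answer 401.

# Build the manifest URL defined by the spec.
manifest_url() {
  printf 'https://%s/v2/%s/manifests/%s\n' "$1" "$2" "$3"
}

# Print only the HTTP status code of a HEAD request for that manifest.
manifest_status() {
  curl -s -o /dev/null -I -w '%{http_code}\n' \
    -H 'Accept: application/vnd.oci.image.index.v1+json, application/vnd.oci.image.manifest.v1+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.docker.distribution.manifest.v2+json' \
    "$(manifest_url "$1" "$2" "$3")"
}

# Translate a status code into the action it implies.
explain_status() {
  case $1 in
    200) echo "tag exists" ;;
    404) echo "manifest unknown: push the tag or fix the reference" ;;
    401|403) echo "auth problem: check credentials or imagePullSecrets" ;;
    429) echo "rate limited: authenticate or use a mirror" ;;
    *) echo "unexpected status: $1" ;;
  esac
}
```

For example, explain_status "$(manifest_status registry.example.com team/app does-not-exist)" prints the action for whatever status comes back. A 404 means no amount of retrying will help.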

ImagePullBackOff is the retry governor, not a new error

After each ErrImagePull, kubelet doubles the wait: 10s → 20s → 40s → 80s… up to a 5-minute ceiling. The pod status flips to ImagePullBackOff while it is waiting between retries.
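
The doubling is easy to tabulate. This minimal sketch assumes the 10-second start and 5-minute ceiling stated above. The function names are illustrative and are not kubelet code.

```bash
# Sketch of the retry schedule: start at 10s, double after each failed pull,
# never exceed 300s (5 minutes). Defaults mirror this post; override via $2/$3.
backoff_schedule() {
  local attempts=$1 delay=${2:-10} cap=${3:-300} out="" i=0
  while [ "$i" -lt "$attempts" ]; do
    out="$out${out:+ }$delay"                # append, space-separated
    delay=$((delay * 2))                     # double for the next wait
    [ "$delay" -gt "$cap" ] && delay=$cap    # clamp at the ceiling
    i=$((i + 1))
  done
  echo "$out"
}

# Total seconds spent waiting across the first N retries.
total_wait() {
  local sum=0 d
  for d in $(backoff_schedule "$@"); do
    sum=$((sum + d))
  done
  echo "$sum"
}
```

With these numbers, backoff_schedule 7 prints 10 20 40 80 160 300 300. The first five waits add up to 310 seconds. From then on, every retry is five minutes apart.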

A pod references a tag that was never pushed. The registry returns 404 on every attempt and kubelet backs off exponentially.

A pod inside a Kubernetes cluster references an image tag that does not exist on the registry. containerd sends a manifest request and receives HTTP 404. kubelet retries with exponential backoff at 10s, 20s, 40s, 80s, and so on, until the delay reaches the 5-minute ceiling. The pod status transitions from ErrImagePull to ImagePullBackOff.

443/tcp (registry HTTPS) · 198.51.100.4 (RFC 5737 TEST-NET-2, documentation only) · Pod status reason "ImagePullBackOff" — set by kubelet image manager after repeated ErrImagePull · kind v0.22.0, Kubernetes 1.30.0 — kubectl get events confirms BackOff transitions

From the troubleshoot-kubernetes-like-a-pro repo. You are going to reproduce ImagePullBackOff, watch the timer grow, and learn to separate the cause from the cooldown.

bash

git clone https://github.com/vellankikoti/troubleshoot-kubernetes-like-a-pro.git
cd troubleshoot-kubernetes-like-a-pro/scenarios/image-pull-backoff
ls

You should see description.md, issue.yaml, and fix.yaml. This assumes you have a cluster running from Day 0.

Reproduce the issue

bash

kubectl apply -f issue.yaml
kubectl get pods

Wait a couple of minutes and run it again. You will see the state flip between ErrImagePull and ImagePullBackOff depending on whether the kubelet is mid-attempt or mid-cooldown.
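
You can watch the flip without rerunning the command by hand. Poll the container's waiting reason and print it only when it changes. This minimal sketch assumes the pod has a single container. watch_pull_reason is a hypothetical name. The built-in alternative is kubectl get pod image-pull-backoff-pod -w.

```bash
# Sketch: print the container's waiting reason, with a timestamp, only when
# it changes. Assumes one container in the pod; stop with Ctrl-C.
watch_pull_reason() {
  local pod=$1 interval=${2:-5} last="__start__" now
  while :; do
    now=$(kubectl get pod "$pod" \
      -o jsonpath='{.status.containerStatuses[0].state.waiting.reason}')
    if [ "$now" != "$last" ]; then
      printf '%s %s\n' "$(date +%T)" "${now:-not-waiting}"
      last=$now
    fi
    sleep "$interval"
  done
}
```

Run watch_pull_reason image-pull-backoff-pod. The time between ErrImagePull lines should grow as the backoff doubles. Short flips can fall between polls.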

plaintext

NAME                     READY   STATUS             RESTARTS   AGE
image-pull-backoff-pod   0/1     ImagePullBackOff   0          2m15s

Terminal: kubectl apply -f issue.yaml creating image-pull-backoff-pod, kubectl get po showing the pod with status ErrImagePull, READY 0/1, RESTARTS 0, AGE 6s — Apply issue.yaml and kubectl get po catches the pod in ErrImagePull before the backoff timer kicks in. Wait another minute and the same command will show ImagePullBackOff — same cause, different status.

Debug the hard way

Describe it and look at the events block.

bash

kubectl describe pod image-pull-backoff-pod

plaintext

Events:
  Type     Reason   Age                  From     Message
  ----     ------   ----                 ----     -------
  Normal   Pulling  45s (x4 over 2m15s)  kubelet  Pulling image "non-existent-image"
  Warning  Failed   44s (x4 over 2m14s)  kubelet  Failed to pull image "non-existent-image":
                                                  rpc error: code = NotFound desc =
                                                  failed to pull and unpack image
                                                  "docker.io/library/non-existent-image:latest":
                                                  failed to resolve reference
                                                  "docker.io/library/non-existent-image:latest":
                                                  docker.io/library/non-existent-image:latest:
                                                  not found
  Warning  Failed   44s (x4 over 2m14s)  kubelet  Error: ErrImagePull
  Normal   BackOff  30s (x5 over 2m)     kubelet  Back-off pulling image "non-existent-image"
  Warning  Failed   30s (x5 over 2m)     kubelet  Error: ImagePullBackOff

Read the sequence. Pulling fires. Failed fires with the real reason. ErrImagePull shows up as the status. Then BackOff starts, and ImagePullBackOff becomes the new status. Same cause, different status because the kubelet is now in a cooldown window.

Terminal: kubectl describe pod image-pull-backoff-pod showing pod metadata, Image: non-existent-image, Container State Waiting with reason ImagePullBackOff, Restart Count 0, Requests and Limits, and the Conditions block — kubectl describe — the container state is Waiting with reason ImagePullBackOff, the image reference is visible in the pod spec, and Restart Count is still 0 because the container has never existed.

Terminal: bottom half of kubectl describe showing the Events table with Normal Scheduled, Normal Pulling, Warning Failed with pull access denied and repository does not exist message, Warning Failed ErrImagePull, and Warning Failed ImagePullBackOff entries — Scroll to the Events table. Warning Failed carries the real message — 'pull access denied, repository does not exist or may require authorization' — and BackOff is layered on top. The status column only told you the cooldown, not the cause.

The filter that saves you time:

bash

kubectl get events --field-selector involvedObject.name=image-pull-backoff-pod,reason=Failed
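
To make that filter a habit, wrap it in a shell function. This version prints, for each Failed event, when it was last seen, how often it repeated, and its message. This is a minimal sketch. failed_selector and pull_failures are hypothetical names. --field-selector, --sort-by, and -o jsonpath are standard kubectl flags.

```bash
# Sketch: reusable wrapper around the reason=Failed field selector.

# Build the selector string for a given pod name.
failed_selector() {
  printf 'involvedObject.name=%s,reason=Failed\n' "$1"
}

# One line per Failed event: lastTimestamp, count, message (oldest first).
pull_failures() {
  local pod=$1 ns=${2:-default}
  kubectl get events -n "$ns" \
    --field-selector "$(failed_selector "$pod")" \
    --sort-by=.lastTimestamp \
    -o jsonpath='{range .items[*]}{.lastTimestamp}{"\t"}{.count}{"\t"}{.message}{"\n"}{end}'
}
```

pull_failures image-pull-backoff-pod prints the Failed events for the pod in the default namespace. Pass a namespace as the second argument.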


Why this happens

The kubelet applies exponential backoff to image pulls, just as it does to container restarts. After the first failure it waits 10 seconds before retrying, then 20, then 40, 80, and 160, capped at 5 minutes. Each attempt walks the same pull chain as Day 3: DNS resolution, TCP connect, TLS handshake, auth, manifest, layers. If any step fails, the kubelet records a Failed event and the container's waiting reason becomes ErrImagePull. While the kubelet waits out the cooldown, kubectl get pods shows ImagePullBackOff, and every further failure lengthens the wait until it hits the cap.
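
When the Failed message points at the network rather than the image, you can walk the early stages of that chain by hand. This is a rough sketch. It assumes you run it with the node's network view, either on the node itself or from a debug pod on it. It also assumes timeout, getent, bash, openssl, and curl are installed. first_failing_stage is a hypothetical name. The auth, manifest, and layer stages are not covered.

```bash
# Sketch: report the first pull-chain stage that fails for a registry host.
# Stages: dns -> tcp -> tls -> registry /v2/ API. A missing getent also
# reports "dns".
first_failing_stage() {
  local host=$1 port=${2:-443} code
  # DNS: can the name be resolved at all?
  timeout 5 getent hosts "$host" >/dev/null 2>&1 || { echo dns; return 0; }
  # TCP: open a socket via bash's /dev/tcp, 5-second limit.
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null \
    || { echo tcp; return 0; }
  # TLS: complete a handshake; certificate trust is checked by curl below.
  timeout 10 openssl s_client -connect "$host:$port" -servername "$host" \
    </dev/null >/dev/null 2>&1 || { echo tls; return 0; }
  # Registry API: /v2/ answers 200, or 401 when auth is the next step.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "https://$host:$port/v2/")
  case $code in
    200|401) echo ok ;;
    *) echo "api ($code)" ;;   # 000 usually means curl rejected the certificate
  esac
}
```

On a node with working egress to Docker Hub, first_failing_stage registry-1.docker.io should print ok, because that /v2/ endpoint answers 401 until you authenticate.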

So ImagePullBackOff is not a distinct error. It is ErrImagePull plus a timer. The real cause is always in the Failed event message, and the four common shapes are worth memorising:

Typo or missing tag: correct the image field and pin versions.
Private registry auth: add a dockerconfigjson secret and reference it via imagePullSecrets.
Rate limits, especially Docker Hub anonymous pulls: authenticate, or mirror images to ECR, GCR, ACR, or your own registry.
Blocked egress: check the NAT gateway, security groups, and node firewall rules.
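
Those four shapes can be roughed out as a classifier over the Failed message. This is a heuristic sketch. The substrings come from messages like the ones quoted in this post, and wording differs between runtimes and registries. classify_pull_failure is a hypothetical name. Docker Hub's "pull access denied, repository does not exist or may require authorization" covers both a missing repository and missing credentials, so it gets its own bucket.

```bash
# Heuristic sketch: bucket a kubelet Failed event message into one of the
# four common shapes. Unmatched messages return "unknown": read those yourself.
classify_pull_failure() {
  local msg
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')   # case-insensitive match
  case $msg in
    *toomanyrequests*|*"too many requests"*|*"rate limit"*)
      echo rate-limit ;;
    *"pull access denied"*)
      echo missing-image-or-auth ;;   # Docker Hub uses this for both cases
    *unauthorized*|*forbidden*|*"authentication required"*|*"authorization failed"*)
      echo auth ;;
    *"no such host"*|*"i/o timeout"*|*"connection refused"*|*"network is unreachable"*)
      echo egress ;;
    *"not found"*|*"manifest unknown"*)
      echo missing-image ;;
    *)
      echo unknown ;;
  esac
}
```

Given the message from the describe output above, it returns missing-image. Given the toomanyrequests message from the 2:30 AM incident, it returns rate-limit.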

The fix

bash

kubectl apply -f fix.yaml
kubectl get pods

The broken pod referenced an image that does not exist:

yaml

image: non-existent-image

The fix points at a real image and gives the container a long-running command so you can watch it stay alive:

yaml

image: busybox
command: ["sh", "-c", "sleep 3600"]

plaintext

NAME                           READY   STATUS    RESTARTS   AGE
image-pull-backoff-fixed-pod   1/1     Running   0          3s

Terminal: cat issue.yaml shows image non-existent-image with comment 'Invalid image causing pull failure', cat fix.yaml shows image busybox with command sh -c sleep 3600, kubectl apply -f fix.yaml creates image-pull-backoff-fixed-pod, kubectl get po shows the fixed pod 1/1 Running and the original still ImagePullBackOff with 2m43s age — Diff the manifests and the real problem is the image field. The fix replaces non-existent-image with busybox plus a sleep command so the container stays alive. One apply, the new pod is Running, the original is still timing out on its backoff.
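
Once the fixed pod is up, you can confirm it and remove the broken pod so its backoff loop stops adding events. This small sketch uses the pod names from this scenario. verify_fix is a hypothetical name.

```bash
# Sketch: wait until the fixed pod is Ready, then delete the broken pod.
verify_fix() {
  kubectl wait --for=condition=Ready pod/image-pull-backoff-fixed-pod --timeout=60s &&
    kubectl delete pod image-pull-backoff-pod --ignore-not-found
}
```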

The easiest way — with Kubilitics

The Kubilitics Pods view gives you the same answer without leaving the table. After you apply both manifests, both pods appear with their own status badges: no describe, no scrolling, no event filter.

Kubilitics Pods view showing 2 pods total, 1 running, 0 failed. image-pull-backoff-fixed-pod with a green Running badge and READY 1/1, image-pull-backoff-pod with an ImagePullBackOff badge and READY 0/1 — Kubilitics Pods view after the fix lands. The fixed pod is green, the original is still red, and the summary row shows the running count tick up. You never had to read an event table to know which pod is which.

The lesson

ImagePullBackOff is ErrImagePull with a cooldown. Read the underlying Failed event, not the status column.
Pin tags, mirror registries, and authenticate even to public registries. If your CI pulls hundreds of images a day, you will hit rate limits at exactly the worst possible moment.
reason=Failed is the one field selector that cuts through backoff noise. Put it in your aliases.
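
You can check the "pin tags" advice before anything reaches the cluster. This is a rough, line-based sketch, not a YAML parser. It assumes one image: entry per line. unpinned_images is a hypothetical name. The function flags references with no tag or with :latest and skips digest-pinned ones. Run against this scenario's fix.yaml, it would flag image: busybox, which has no tag and therefore resolves to :latest.

```bash
# Sketch: list image references in a manifest that are untagged or use :latest.
# Line-based grep, not a YAML parser; assumes one "image:" entry per line.
unpinned_images() {
  local line ref last
  grep -E '^[[:space:]-]*image:' "$1" | while IFS= read -r line; do
    ref=${line#*image:}                        # text after "image:"
    ref=${ref%%#*}                             # drop inline YAML comments
    ref=$(printf '%s' "$ref" | tr -d " \"'")   # strip spaces and quotes
    case $ref in
      *@sha256:*) continue ;;                  # digest-pinned: fine
    esac
    last=${ref##*/}                            # the tag lives after the last slash
    case $last in
      *:latest) echo "latest: $ref" ;;
      *:*) ;;                                  # explicit non-latest tag
      *) echo "untagged: $ref" ;;
    esac
  done
}
```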

Day 4 of 35 — tomorrow the image pulls, the container runs, and the Service still refuses to send it a single request.
