Mastering Kubernetes the Right Way · DAY 03 / 35

ErrImagePull in Kubernetes: Typo, Auth, or Network?

One status, three completely different fixes. The describe event tells you which one you are actually looking at.

Koti Vellanki · 22 Mar 2026 · 5 min read
kubernetes · debugging · image

Friday evening, 9:40 PM. The release pipeline finished, the deployment updated, every new pod is in ErrImagePull. I assume it is a typo because it is always a typo. I diff the deployment against the previous revision. No typo. I check the registry in a browser. The tag is there. I waste ten minutes on pull secrets before I finally read the actual Failed event and see no such host. It is not auth. It is not a typo. The registry's DNS record rotated an hour ago and our node pool still has the old one cached. Three different causes, one status column, and I just burned ten minutes on the wrong hypothesis because I trusted the error name instead of the error message.

The scenario

From my troubleshoot-kubernetes-like-a-pro repo. Reproduce the failure on your own cluster and practice reading the event line that actually matters.

bash
git clone https://github.com/vellankikoti/troubleshoot-kubernetes-like-a-pro.git
cd troubleshoot-kubernetes-like-a-pro/scenarios/image-pull-error
ls

description.md, issue.yaml, fix.yaml. Assumes you have a cluster running from Day 0.

Reproduce the issue

bash
kubectl apply -f issue.yaml
kubectl get pods
plaintext
NAME                   READY   STATUS         RESTARTS   AGE
image-pull-error-pod   0/1     ErrImagePull   0          18s

ErrImagePull, zero restarts, because this is not a restart loop. The container has never existed.

Terminal: cd into scenarios/image-pull-error, ls the folder, kubectl apply -f issue.yaml creates image-pull-error-pod, kubectl get po shows the pod with status ErrImagePull, READY 0/1, RESTARTS 0, AGE 3s
Apply issue.yaml and kubectl get po catches the pod in ErrImagePull on the first try. Zero restarts — this is not a restart loop, the container has never existed.

Debug the hard way

Logs are always the first instinct.

bash
kubectl logs image-pull-error-pod
plaintext
Error from server (BadRequest): container "nginx" in pod "image-pull-error-pod" is waiting to start: trying and failing to pull image

"Trying and failing." Thanks.

Terminal: kubectl logs image-pull-error-pod returns an error saying the container is waiting to start because the kubelet is trying and failing to pull image. Then cat issue.yaml shows the broken image reference private-registry.example.com/nginx:v999, followed by cat fix.yaml showing the working image reference nginx:alpine
Logs return an unhelpful 'waiting to start' message — the container never ran, so there is nothing to print. Diff the two manifests and the real problem is in the image field: the broken spec points at a private registry and a tag that does not exist.

Describe.

bash
kubectl describe pod image-pull-error-pod
plaintext
Events:
  Type     Reason   Age                From     Message
  ----     ------   ----               ----     -------
  Normal   Pulling  15s (x2 over 30s)  kubelet  Pulling image "private-registry.example.com/nginx:v999"
  Warning  Failed   14s (x2 over 29s)  kubelet  Failed to pull image "private-registry.example.com/nginx:v999": rpc error: code = Unknown desc = failed to resolve reference: failed to do request: Head "https://private-registry.example.com/v2/nginx/manifests/v999": dial tcp: lookup private-registry.example.com: no such host
  Warning  Failed   14s (x2 over 29s)  kubelet  Error: ErrImagePull

The magic line is dial tcp: lookup ... no such host. That is the actual problem. Not a pull secret. Not a typo. The kubelet could not even resolve the registry hostname, which means it never got to authentication. A pull secret would not help you here no matter how correct it was.
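If you want to confirm the DNS theory from inside the cluster before touching anything else, a throwaway pod can repeat the kubelet's first step. A minimal sketch — the helper name and the busybox image are my choices, not part of the repo:

```shell
# Hypothetical helper: resolve a registry hostname from inside the cluster,
# the same lookup the kubelet performs before it ever gets to auth.
check_registry_dns() {
  # --rm --restart=Never: one-shot pod, cleaned up when nslookup exits
  kubectl run dns-check --rm -it --restart=Never \
    --image=busybox:1.36 -- nslookup "$1"
}
# check_registry_dns private-registry.example.com
```

If nslookup can't resolve the name here either, you have independently confirmed no such host — and confirmed that no pull secret will fix it.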

Filter events directly and skip the wall of describe output:

bash
kubectl get events --field-selector involvedObject.name=image-pull-error-pod --sort-by='.lastTimestamp'

Why this happens

The kubelet pulls images on the node. It walks through a chain: resolve the registry hostname, open a TCP connection, do the HTTPS handshake, send an authenticated manifest request, download the layers. Each step has a distinct failure mode, and the Failed event message tells you which step blew up.

  • no such host or dial tcp: i/o timeout means the lookup or the TCP connection failed. Network or DNS, nothing to do with auth.
  • manifest unknown or not found means you reached the registry but the tag does not exist. Typo or tag drift.
  • 401 Unauthorized or pull access denied means authentication failed. Missing or expired imagePullSecrets.

The status column says ErrImagePull for all three. The fix is wildly different for each. Read the event, not the status.
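The three branches above fit in a tiny triage function — a sketch that does nothing more than match the substrings listed in the bullets:

```shell
# Sketch: classify a Failed-event message into one of the three causes.
# Match strings are the ones from the bullet list above; anything else
# falls through to "unknown".
classify_pull_error() {
  case "$1" in
    *"no such host"*|*"i/o timeout"*)        echo "network-or-dns" ;;
    *"manifest unknown"*|*"not found"*)      echo "typo-or-tag"    ;;
    *"401 Unauthorized"*|*"pull access denied"*) echo "auth"       ;;
    *)                                       echo "unknown"        ;;
  esac
}
# classify_pull_error "dial tcp: lookup private-registry.example.com: no such host"
# prints: network-or-dns
```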

The fix

bash
kubectl apply -f fix.yaml
kubectl get pods

The key change is the image reference. The broken spec points at a fake registry and a tag that does not exist:

yaml
image: private-registry.example.com/nginx:v999

The fix points at a real public image with a real tag:

yaml
image: nginx:alpine
plaintext
NAME                         READY   STATUS    RESTARTS   AGE
image-pull-error-fixed-pod   1/1     Running   0          4s
Terminal: kubectl apply -f fix.yaml creating image-pull-error-fixed-pod, kubectl get pods showing image-pull-error-fixed-pod with status Running 1/1 and the original image-pull-error-pod still showing ImagePullBackOff
Apply fix.yaml. The new pod is Running in seconds because nginx:alpine is a real image the kubelet can reach. The original is still ImagePullBackOff — the kubelet has stopped trying and is waiting on its exponential backoff.

For a real private registry, the fix depends on which event line you saw: fix the tag, add an imagePullSecrets reference to a Secret of type kubernetes.io/dockerconfigjson, or unblock egress from your nodes to the registry host.
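For the auth case specifically, the wiring looks like this — a sketch with hypothetical names (regcred, private-image-pod), assuming the Secret was created with kubectl create secret docker-registry:

yaml
# Pod spec referencing a pull secret of type kubernetes.io/dockerconfigjson.
# Names are illustrative; create the secret first, e.g.:
#   kubectl create secret docker-registry regcred \
#     --docker-server=private-registry.example.com \
#     --docker-username=USER --docker-password=PASS
apiVersion: v1
kind: Pod
metadata:
  name: private-image-pod
spec:
  imagePullSecrets:
    - name: regcred
  containers:
    - name: app
      image: private-registry.example.com/nginx:stable

Remember: this only helps when the event said 401 Unauthorized or pull access denied. Against no such host it changes nothing.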

The easiest way — with Kubilitics

The same debug surfaces without reading event tables. Open the Pods view after applying both manifests and you see the broken and fixed pods side by side, each badged with its own status.

Kubilitics Pods view showing 2 pods and summary cards — 1 running, 0 failed, 0 pending. image-pull-error-fixed-pod shows a green Running badge and READY 1/1. image-pull-error-pod shows an ImagePullBackOff badge and READY 0/1
Kubilitics Pods view with both pods visible after the fix lands. Running and ImagePullBackOff sit next to each other, the summary row shows the running count jump, the visual diff between broken and fixed is one scan of the table.

The lesson

  1. ErrImagePull is one status for three different failures. Always read the Failed event message before you touch the fix.
  2. kubectl get events --field-selector involvedObject.name=POD is the fastest path to the real error string. Alias it.
  3. DNS lookup fails before auth ever runs. If you see no such host, stop thinking about pull secrets entirely.
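The alias from point 2, sketched as a shell function (a function handles the pod-name argument more cleanly than a literal alias):

```shell
# Print only this pod's events, oldest first — the fastest path to the
# real error string behind an ErrImagePull.
pod_events() {
  kubectl get events \
    --field-selector "involvedObject.name=$1" \
    --sort-by='.lastTimestamp'
}
# pod_events image-pull-error-pod
```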

Day 3 of 35 — tomorrow the error is almost identical, but the kubelet has stopped trying.
