Mastering Kubernetes the Right Way · DAY 03 / 35

ErrImagePull in Kubernetes: Typo, Auth, or Network?

One status, three completely different fixes. The describe event tells you which one you are actually looking at.

Koti Vellanki · 22 Mar 2026 · 5 min read
kubernetes · debugging · image

Friday evening, 9:40 PM. The release pipeline finished, the deployment updated, every new pod is in ErrImagePull. I assume it is a typo because it is always a typo. I diff the deployment against the previous revision. No typo. I check the registry in a browser. The tag is there. I waste ten minutes on pull secrets before I finally read the actual Failed event and see no such host. It is not auth. It is not a typo. The registry's DNS record rotated an hour ago and our node pool still has the old one cached. Three different causes, one status column, and I just burned ten minutes on the wrong hypothesis because I trusted the error name instead of the error message.

The scenario

From my troubleshoot-kubernetes-like-a-pro repo. Reproduce the failure on your own cluster and practice reading the event line that actually matters.

bash
git clone https://github.com/vellankikoti/troubleshoot-kubernetes-like-a-pro.git
cd troubleshoot-kubernetes-like-a-pro/scenarios/image-pull-error
ls

description.md, issue.yaml, fix.yaml. Assumes you have a cluster running from Day 0.

Reproduce the issue

bash
kubectl apply -f issue.yaml
kubectl get pods
plaintext
NAME                   READY   STATUS         RESTARTS   AGE
image-pull-error-pod   0/1     ErrImagePull   0          18s

ErrImagePull, zero restarts, because this is not a restart loop. The container has never existed.

Terminal: cd into scenarios/image-pull-error, ls the folder, kubectl apply -f issue.yaml creates image-pull-error-pod, kubectl get po shows the pod with status ErrImagePull, READY 0/1, RESTARTS 0, AGE 3s
Apply issue.yaml and kubectl get po catches the pod in ErrImagePull on the first try. Zero restarts — this is not a restart loop, the container has never existed.

Debug the hard way

Logs are always the first instinct.

bash
kubectl logs image-pull-error-pod
plaintext
Error from server (BadRequest): container "nginx" in pod "image-pull-error-pod" is waiting to start: trying and failing to pull image

"Trying and failing." Thanks.

Terminal: kubectl logs image-pull-error-pod returns an error saying the container is waiting to start because the kubelet is trying and failing to pull image. Then cat issue.yaml shows the broken image reference private-registry.example.com/nginx:v999, followed by cat fix.yaml showing the working image reference nginx:alpine
Logs return an unhelpful 'waiting to start' message — the container never ran, so there is nothing to print. Diff the two manifests and the real problem is in the image field: the broken spec points at a private registry and a tag that does not exist.

Describe.

bash
kubectl describe pod image-pull-error-pod
plaintext
Events:
  Type     Reason   Age                From     Message
  ----     ------   ----               ----     -------
  Normal   Pulling  15s (x2 over 30s)  kubelet  Pulling image "private-registry.example.com/nginx:v999"
  Warning  Failed   14s (x2 over 29s)  kubelet  Failed to pull image "private-registry.example.com/nginx:v999": rpc error: code = Unknown desc = failed to resolve reference: failed to do request: Head "https://private-registry.example.com/v2/nginx/manifests/v999": dial tcp: lookup private-registry.example.com: no such host
  Warning  Failed   14s (x2 over 29s)  kubelet  Error: ErrImagePull

The magic line is dial tcp: lookup ... no such host. That is the actual problem. Not a pull secret. Not a typo. The kubelet could not even resolve the registry hostname, which means it never got to authentication. A pull secret would not help you here no matter how correct it was.
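If you want to confirm the DNS theory from inside the cluster before touching anything else, a throwaway pod can repeat the kubelet's first step. A minimal sketch — the helper name and the busybox image are my choices, not part of the repo:

```shell
# Hypothetical helper: resolve a registry hostname from inside the cluster,
# the same lookup the kubelet performs before it ever gets to auth.
check_registry_dns() {
  # --rm --restart=Never: one-shot pod, cleaned up when nslookup exits
  kubectl run dns-check --rm -it --restart=Never \
    --image=busybox:1.36 -- nslookup "$1"
}
# check_registry_dns private-registry.example.com
```

If nslookup can't resolve the name here either, you have independently confirmed no such host — and confirmed that no pull secret will fix it.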

Filter events directly and skip the wall of describe output:

bash
kubectl get events --field-selector involvedObject.name=image-pull-error-pod --sort-by='.lastTimestamp'

Why this happens

The kubelet pulls images on the node. It walks through a chain: resolve the registry hostname, open a TCP connection, do the HTTPS handshake, send an authenticated manifest request, download the layers. Each step has a distinct failure mode, and the Failed event message tells you which step blew up.

  • no such host or dial tcp: i/o timeout means the lookup or the TCP connection failed. Network or DNS, nothing to do with auth.
  • manifest unknown or not found means you reached the registry but the tag does not exist. Typo or tag drift.
  • 401 Unauthorized or pull access denied means authentication failed. Missing or expired imagePullSecrets.

The status column says ErrImagePull for all three. The fix is wildly different for each. Read the event, not the status.
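The three branches above fit in a tiny triage function — a sketch that does nothing more than match the substrings listed in the bullets:

```shell
# Sketch: classify a Failed-event message into one of the three causes.
# Match strings are the ones from the bullet list above; anything else
# falls through to "unknown".
classify_pull_error() {
  case "$1" in
    *"no such host"*|*"i/o timeout"*)        echo "network-or-dns" ;;
    *"manifest unknown"*|*"not found"*)      echo "typo-or-tag"    ;;
    *"401 Unauthorized"*|*"pull access denied"*) echo "auth"       ;;
    *)                                       echo "unknown"        ;;
  esac
}
# classify_pull_error "dial tcp: lookup private-registry.example.com: no such host"
# prints: network-or-dns
```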

The fix

bash
kubectl apply -f fix.yaml
kubectl get pods

The key change is the image reference. The broken spec points at a fake registry and a tag that does not exist:

yaml
image: private-registry.example.com/nginx:v999

The fix points at a real public image with a real tag:

yaml
image: nginx:alpine
plaintext
NAME                         READY   STATUS    RESTARTS   AGE
image-pull-error-fixed-pod   1/1     Running   0          4s
Terminal: kubectl apply -f fix.yaml creating image-pull-error-fixed-pod, kubectl get pods showing image-pull-error-fixed-pod with status Running 1/1 and the original image-pull-error-pod still showing ImagePullBackOff
Apply fix.yaml. The new pod is Running in seconds because nginx:alpine is a real image the kubelet can reach. The original is still ImagePullBackOff — the kubelet has stopped trying and is waiting on its exponential backoff.

For a real private registry, the fix depends on which event line you saw: fix the tag, add an imagePullSecrets reference to a Secret of type kubernetes.io/dockerconfigjson, or unblock egress from your nodes to the registry host.
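For the auth case specifically, the wiring looks like this — a sketch with hypothetical names (regcred, private-image-pod), assuming the Secret was created with kubectl create secret docker-registry:

yaml
# Pod spec referencing a pull secret of type kubernetes.io/dockerconfigjson.
# Names are illustrative; create the secret first, e.g.:
#   kubectl create secret docker-registry regcred \
#     --docker-server=private-registry.example.com \
#     --docker-username=USER --docker-password=PASS
apiVersion: v1
kind: Pod
metadata:
  name: private-image-pod
spec:
  imagePullSecrets:
    - name: regcred
  containers:
    - name: app
      image: private-registry.example.com/nginx:stable

Remember: this only helps when the event said 401 Unauthorized or pull access denied. Against no such host it changes nothing.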

The easiest way — with Kubilitics

The same debug surfaces without reading event tables. Open the Pods view after applying both manifests and you see the broken and fixed pods side by side, each badged with its own status.

Kubilitics Pods view showing 2 pods and summary cards — 1 running, 0 failed, 0 pending. image-pull-error-fixed-pod shows a green Running badge and READY 1/1. image-pull-error-pod shows an ImagePullBackOff badge and READY 0/1
Kubilitics Pods view with both pods visible after the fix lands. Running and ImagePullBackOff sit next to each other, the summary row shows the running count jump, the visual diff between broken and fixed is one scan of the table.

The lesson

  1. ErrImagePull is one status for three different failures. Always read the Failed event message before you touch the fix.
  2. kubectl get events --field-selector involvedObject.name=POD is the fastest path to the real error string. Alias it.
  3. DNS lookup fails before auth ever runs. If you see no such host, stop thinking about pull secrets entirely.
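The alias from point 2, sketched as a shell function (a function handles the pod-name argument more cleanly than a literal alias):

```shell
# Print only this pod's events, oldest first — the fastest path to the
# real error string behind an ErrImagePull.
pod_events() {
  kubectl get events \
    --field-selector "involvedObject.name=$1" \
    --sort-by='.lastTimestamp'
}
# pod_events image-pull-error-pod
```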

Day 3 of 35 — tomorrow the error is almost identical, but the kubelet has stopped trying.
