Mastering Kubernetes the Right Way · DAY 09 / 35

Pod Affinity Violation in Kubernetes: The Silent Pending Trap

The scheduler will wait forever for an affinity rule that can never be satisfied, and it will never tell you out loud.

Koti Vellanki · 28 Mar 2026 · 3 min read
kubernetes · debugging · scheduling

2:40 AM. Different night, different cluster, same kind of Pending. This time the pod is a new nginx workload a teammate shipped in a hurry before going to bed. I pull up the Deployment, replicas desired 1, replicas available 0, and the pod has been Pending for over an hour. The twist is that the cluster has plenty of room. CPU is at 30%, memory is at 40%, every node is green on the dashboard. And yet one pod, with a 50 megabyte image, cannot find a home.

I already know it is not capacity. I already know it is not the image. So it is either taints, node selectors, or the one that always bites me at night: affinity.

The scenario

Same repo, different folder. You should have a running cluster from Day 0 ready to go.

bash
git clone https://github.com/vellankikoti/troubleshoot-kubernetes-like-a-pro.git
cd troubleshoot-kubernetes-like-a-pro/scenarios/affinity-rules-violation
ls

description.md, issue.yaml, fix.yaml. The issue manifest pins the pod to nodes that have a disktype=ssd label. If no node in your cluster has that label, the pod is homeless by design.
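The affinity block inside issue.yaml looks roughly like this. This is a sketch reconstructed from the scenario description, not a verbatim copy of the repo's manifest; the container image and metadata are assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-violation-pod
spec:
  affinity:
    nodeAffinity:
      # "required" is a hard rule: no matching node, no scheduling.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values: ["ssd"]
  containers:
    - name: nginx
      image: nginx:alpine  # assumed; any small image behaves the same
```

The matchExpressions term is the whole trap: it is evaluated against node labels at scheduling time, and nothing validates it against the cluster when you apply it.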

Reproduce the issue

bash
kubectl apply -f issue.yaml
kubectl get pod affinity-violation-pod
plaintext
NAME                     READY   STATUS    RESTARTS   AGE
affinity-violation-pod   0/1     Pending   0          2m

Two minutes, five minutes, ten. The pod does not move. And unlike the insufficient-resources case, the cluster looks perfectly healthy. That is the trap. Everything is fine except the one pod that has asked for a label no node has.

Debug the hard way

describe first, always.

bash
kubectl describe pod affinity-violation-pod
plaintext
Events:
  Type     Reason            From               Message
  ----     ------            ----               -------
  Warning  FailedScheduling  default-scheduler  0/1 nodes are available: 1 node(s) didn't match Pod's node affinity/selector.

The magic words: didn't match Pod's node affinity/selector. That rules out CPU, memory, taints, and every other predicate. The scheduler is saying the nodes exist, they have room, but your pod's label requirement does not match any of them.

Now confirm what the pod actually wants:

bash
kubectl get pod affinity-violation-pod -o jsonpath='{.spec.affinity}'
plaintext
{
  "nodeAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": {
      "nodeSelectorTerms": [
        {
          "matchExpressions": [
            { "key": "disktype", "operator": "In", "values": ["ssd"] }
          ]
        }
      ]
    }
  }
}

The pod demands disktype=ssd. Now the other side of the equation:

bash
kubectl get nodes --show-labels
plaintext
NAME                 STATUS   ROLES           LABELS
kind-control-plane   Ready    control-plane   kubernetes.io/hostname=kind-control-plane,kubernetes.io/os=linux,...

No disktype label anywhere. The pod is asking for something that does not exist on any node in the cluster. The scheduler will never satisfy this, no matter how long it waits.

Why this happens

requiredDuringSchedulingIgnoredDuringExecution is a mouthful, but the two halves tell you everything. required means the rule is hard: no match, no schedule. IgnoredDuringExecution means that if a node's labels change after the pod has landed, Kubernetes will not evict it. Together, they produce a rule that is strict at placement time and lazy after.
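For contrast, the soft variant would have sidestepped this whole trap: the scheduler favors matching nodes but still places the pod somewhere when none match. A sketch, not the repo's fix:

```yaml
affinity:
  nodeAffinity:
    # "preferred" is a soft rule: matching nodes score higher,
    # but the pod still schedules if no node matches.
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100  # 1-100; higher means the preference counts for more
        preference:
          matchExpressions:
            - key: disktype
              operator: In
              values: ["ssd"]
```

The trade-off is that preferred is only a hint. If the workload genuinely cannot run off SSD, required is correct and the real fix is labelling the nodes.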

The usual cause is a copy-paste from a production manifest into a dev cluster where the nodes were never labelled. Production has disktype=ssd on every worker. Dev does not. The YAML is identical, but the environment is not. The scheduler does not care about your intent, it cares about labels.

There is no warning when you kubectl apply a pod whose affinity is impossible to satisfy. The API server accepts it cleanly. The only feedback loop is the scheduler event log, and you only see it if you go look.

The fix

Two valid paths. Label the nodes so they match, or relax the pod. For a dev cluster, relaxing is faster:

bash
kubectl apply -f fix.yaml
kubectl get pod affinity-violation-fixed-pod
plaintext
NAME                           READY   STATUS    RESTARTS   AGE
affinity-violation-fixed-pod   1/1     Running   0          4s

The diff: the entire affinity block is gone. That is the fix. If you wanted to keep the rule for production fidelity, label your dev node instead:

bash
kubectl label node kind-control-plane disktype=ssd

Either path works. The point is that one side of the equation has to move.
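Worth noting: the simpler nodeSelector field expresses the same hard requirement in less YAML, and it fails in exactly the same silent way, so everything above applies to it too. A sketch:

```yaml
spec:
  # Equivalent hard constraint: only nodes labelled disktype=ssd qualify.
  # No matching node means the same eternal Pending, with the same
  # "didn't match Pod's node affinity/selector" event.
  nodeSelector:
    disktype: ssd
```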

The lesson

  1. A Pending pod on a cluster with free capacity is almost always an affinity, taint, or selector mismatch. Skip capacity and go straight to describe.
  2. required rules are strict and silent. The API server will accept an impossible rule and the pod will wait forever.
  3. Affinity is a two-sided contract. Always check the pod's requirement and the node's labels in the same breath.

Day 9 of 35 — tomorrow, nodeAffinity pointing at a hostname that does not exist, and the one-character typo that cost me an hour.
