Mastering Kubernetes the Right Way · DAY 27 / 35

NetworkPolicy Default-Deny Broke My Whole Namespace. Here Is the Fix

One default-deny egress policy, one black-hole namespace, one very long pager night.

Koti Vellanki · 15 Apr 2026 · 4 min read

kubernetes · debugging · networking

Somebody wrote a "default deny" NetworkPolicy during the security review last week. Looked reasonable on paper, applied cleanly, everybody signed off. At 2 AM tonight, a fresh deployment rolls out into that namespace and every pod turns into a black hole. Running, Ready, but unable to reach the database, the metrics endpoint, even kube-dns. Liveness probes start failing because the kubelet itself tries to HTTP-GET the pod from the node, and the kubelet's source IP is not in any allowlist. The pods start restarting. The restarts don't help because nothing changed on the pod side. The blast radius is the entire namespace and I'm the one holding the pager.

The scenario

Reproduce it in your own cluster. You need a CNI that actually enforces NetworkPolicy for this to mean anything: Calico, Cilium, or Antrea. Plain flannel will accept the policy and silently ignore it.

bash
git clone https://github.com/vellankikoti/troubleshoot-kubernetes-like-a-pro.git
cd troubleshoot-kubernetes-like-a-pro/scenarios/network-connectivity-issues
ls

You should see issue.yaml, fix.yaml, description.md, network_issue.sh. The issue file creates a pod plus a NetworkPolicy that denies all egress for pods with the label app: network-test.
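The exact manifest lives in the repo's issue.yaml; a minimal sketch of what such a deny-egress policy looks like (the name and selector here are reconstructed from the describe output later in this post, so treat the details as an approximation):

```yaml
# Approximate shape of the scenario's NetworkPolicy. An Egress
# policy with no egress rules denies ALL outbound traffic for
# every pod matching the selector.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-egress-network-test
spec:
  podSelector:
    matchLabels:
      app: network-test
  policyTypes:
    - Egress
  # no egress: list at all -> nothing is allowed out
```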

Reproduce the issue

bash
kubectl apply -f issue.yaml
plaintext
pod/network-connectivity-issue-pod created
networkpolicy.networking.k8s.io/deny-egress-network-test created

The pod tries to wget http://google.com and fails:

bash
kubectl logs network-connectivity-issue-pod
plaintext
blocked

The pod is Running, the container is happy, the wget timed out, the log says blocked, and nothing in the pod events tells you a NetworkPolicy is the cause. NetworkPolicy drops are silent: the packet is dropped at the CNI layer, and the pod just sees a connect timeout, as if the problem were upstream.

Debug the hard way

First the usual checks, because you will run them anyway:

bash
kubectl get pod network-connectivity-issue-pod -o wide
plaintext
NAME                             READY   STATUS    RESTARTS   AGE   IP            NODE
network-connectivity-issue-pod   1/1     Running   0          90s   10.244.1.17   worker-1

Pod is fine. Then DNS and direct reachability from inside the pod:

bash
kubectl exec network-connectivity-issue-pod -- wget -qO- --timeout=3 http://kubernetes.default || echo fail
plaintext
fail

Even the API server Service is unreachable. That is the fingerprint of a default-deny egress policy. Now the command that actually matters: list every NetworkPolicy that selects this pod.

bash
kubectl get networkpolicy -A
plaintext
NAMESPACE   NAME                       POD-SELECTOR       AGE
default     deny-egress-network-test   app=network-test   2m
bash
kubectl describe networkpolicy deny-egress-network-test
plaintext
Name:          deny-egress-network-test
Namespace:     default
PodSelector:   app=network-test
Policy Types:  Egress
Egress:        <none>

Egress: <none> with Policy Types: Egress means "all egress denied for pods matching app=network-test". That is your answer, and no events, no logs, no pod conditions would have told you that. You had to go look for the policy yourself.

Why this happens

NetworkPolicy semantics are additive: if no policy selects a pod, all traffic is allowed. The moment any policy selects a pod, only the traffic explicitly allowed by the union of all selecting policies is permitted, for each direction listed in policyTypes. So an egress policy with an empty rule list that selects a pod denies all egress until you add allow rules. A lot of teams write this policy thinking "default deny" means "start from deny and then we layer allows on top", which is correct in intent but wrong in consequence, because they forget to ship the allow-list layer.
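The namespace-wide version of this trap is the canonical default-deny policy from the Kubernetes docs:

```yaml
# Canonical default-deny-egress for an entire namespace.
# The empty podSelector matches every pod in the namespace,
# and the absent egress list allows nothing out.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
spec:
  podSelector: {}
  policyTypes:
    - Egress
```

Apply this alone and every pod in the namespace loses all egress, including DNS, which is exactly the scenario above at namespace scale.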

The second trap is the kubelet health probe. The kubelet sends HTTP probes to the pod from the node's IP, which is not the pod network. An ingress policy that only allows traffic from podSelector in the same namespace will silently block the kubelet's probe, marking the container unhealthy and restarting it in a loop. The fix is an ingress rule allowing traffic from the node CIDR, or using exec probes instead of HTTP probes.
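A sketch of that ingress exception, assuming a node CIDR of 10.0.0.0/24 (a placeholder; substitute your cluster's real node range, and note that some CNIs already exempt host traffic):

```yaml
# Hypothetical ingress rule letting kubelet HTTP probes through.
# 10.0.0.0/24 is an assumed node CIDR; use your cluster's actual range.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-kubelet-probes
spec:
  podSelector:
    matchLabels:
      app: network-test
  policyTypes:
    - Ingress
  ingress:
    - from:
        - ipBlock:
            cidr: 10.0.0.0/24
```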

The third trap is DNS. A default-deny egress policy blocks traffic to kube-dns on port 53, which means every application call that uses a hostname fails before it even starts. Your allow-list needs an explicit rule allowing UDP and TCP to port 53 to the kube-system namespace, or nothing resolves.

The fix

bash
kubectl delete -f issue.yaml
kubectl apply -f fix.yaml

The scenario fix removes the NetworkPolicy entirely. In a real cluster you do not want to delete the security policy; you want to fix it. The correct pattern is a default-deny policy paired with explicit allows for DNS, for kubelet probes, and for the actual traffic the app needs:

yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-and-app
spec:
  podSelector:
    matchLabels:
      app: network-test
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    - to:
        - podSelector:
            matchLabels:
              app: api
      ports:
        - protocol: TCP
          port: 8080

Verify with a wget from the pod and with kubectl describe on the policy to make sure the rules render as you expect.

The lesson

  1. Default-deny egress without an explicit kube-dns allow is self-sabotage. Every default-deny policy needs DNS exceptions on day one.
  2. NetworkPolicy drops are silent. No events, no logs on the pod. The only diagnosis is listing the policies that select the pod.
  3. You must have a CNI that enforces NetworkPolicy. Calico, Cilium, Antrea. Plain flannel accepts the policy and ignores it, which is worse than not having one at all because it gives you false confidence.

Day 27 of 35. Tomorrow, the cluster talks to itself perfectly but cannot reach the payment processor, and nothing in the cluster looks broken.
