Somebody wrote a "default deny" NetworkPolicy during the security review last week. Looked reasonable on paper, applied cleanly, everybody signed off. 2AM tonight, a fresh deployment rolls out into that namespace and every pod turns into a black hole. Running, Ready, but unable to reach the database, the metrics endpoint, even kube-dns. Liveness probes start failing because the kubelet itself tries to HTTP-GET the pod from the node and the kubelet's source IP is not in any allowlist. The pods start restarting. The restarts don't help because nothing changed on the pod side. The blast radius is the entire namespace and I'm the one holding the pager.
The scenario
pod-a issues the request. The backend NetworkPolicy drops it.
A default-deny ingress NetworkPolicy in the backend namespace isolates pod-b. Traffic from frontend is not on the allow list, so the CNI drops the SYN silently. pod-a sees a timeout; pod-b sees nothing.
pod-a issues the request
Running curl http://pod-b:8080 from inside pod-a. DNS resolves cleanly. The SYN leaves the pod. Up to here, everything is normal.
The backend NetworkPolicy is default-deny
Any NetworkPolicy that selects pod-b and lists Ingress in policyTypes makes pod-b isolated. Only traffic from kube-system is explicitly allowed. Traffic from frontend is not on the list — the CNI drops the packet silently.
pod-a sees a timeout, pod-b sees nothing
No RST, no ICMP unreachable. The kernel on pod-b never receives the SYN. Run kubectl get networkpolicy -n backend to surface the policy. Then check both ingress.from rules and the source namespace labels.
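A backend policy that behaves the way this walkthrough describes could look roughly like the sketch below. The policy name is my own placeholder, and I am assuming the cluster sets the automatic kubernetes.io/metadata.name label on namespaces, as recent Kubernetes versions do.

```yaml
# Sketch only: the name is hypothetical, not taken from the scenario.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-default-deny        # assumed name
  namespace: backend
spec:
  podSelector: {}                   # empty selector: every pod in backend, including pod-b
  policyTypes:
    - Ingress                       # selected pods become isolated for ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system   # the only allowed source
```

Anything arriving from the frontend namespace matches no rule here, so it is dropped.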
Reproduce it in your own cluster. You need a CNI that actually enforces NetworkPolicy for this to mean anything: Calico, Cilium, or Antrea, for example. With plain flannel, the API server accepts the policy and nothing enforces it.
git clone https://github.com/vellankikoti/troubleshoot-kubernetes-like-a-pro.git
cd troubleshoot-kubernetes-like-a-pro/scenarios/network-connectivity-issues
ls
You should see issue.yaml, fix.yaml, description.md, network_issue.sh. The issue file creates a pod plus a NetworkPolicy that denies all egress for pods with the label app: network-test.
Reproduce the issue
kubectl apply -f issue.yaml
pod/network-connectivity-issue-pod created
networkpolicy.networking.k8s.io/deny-egress-network-test created
The pod tries to wget http://google.com and fails:
kubectl logs network-connectivity-issue-pod
blocked
The pod is Running, the container is happy, the wget timed out, the log says blocked, and nothing in the pod events tells you a NetworkPolicy is the cause. NetworkPolicy drops are silent. The CNI drops the packet in the node's dataplane and the pod sees a connect timeout, as if it were an upstream problem.
Debug the hard way
First the usual checks, because you will run them anyway:
kubectl get pod network-connectivity-issue-pod -o wide
NAME                             READY   STATUS    RESTARTS   AGE   IP            NODE
network-connectivity-issue-pod   1/1     Running   0          90s   10.244.1.17   worker-1
Pod is fine. Then DNS and direct reachability from inside the pod:
kubectl exec network-connectivity-issue-pod -- wget -qO- --timeout=3 http://kubernetes.default || echo fail
fail
Even the API server Service is unreachable. That is the fingerprint of a default-deny egress policy: even the DNS lookup for that name is blocked. Now the command that actually matters: list every NetworkPolicy in the cluster and look for one whose pod selector matches this pod:
kubectl get networkpolicy -A
NAMESPACE   NAME                       POD-SELECTOR       AGE
default     deny-egress-network-test   app=network-test   2m
kubectl describe networkpolicy deny-egress-network-test
Name:         deny-egress-network-test
Namespace:    default
PodSelector:  app=network-test
Policy Types: Egress
Egress:
  <none>
Egress: <none> with Policy Types: Egress means "all egress denied for pods matching app=network-test". That is your answer, and no events, no logs, no pod conditions would have told you that. You had to go look for the policy yourself.
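For reference, here is a manifest that would produce a policy matching the describe output above. This is my reconstruction from that output, not a copy of the repo's issue.yaml, so treat the exact layout as an assumption.

```yaml
# Reconstructed from the describe output; the real issue.yaml may differ in detail.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-egress-network-test
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: network-test      # selects the test pod
  policyTypes:
    - Egress                 # isolate selected pods for egress
  # No egress rules at all: nothing is on the allow list, so all egress is denied.
```

The absence of an egress section is the whole bug: the policy type is declared, and the allow list is empty.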
Why this happens
NetworkPolicy is additive in an interesting way: if no policy selects a pod, all traffic is allowed. If any policy selects a pod, then for each direction listed in policyTypes only the traffic explicitly allowed by at least one of the selecting policies is permitted; the allow rules of all those policies are unioned. So the moment you apply an egress policy with no rules that selects a pod, all egress is denied until you also add egress allow rules. A lot of teams write this policy thinking "default deny" means "start from deny and then we layer allows on top", which is correct in intent but wrong in consequences, because they forget the allow-list layer.
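The usual namespace-wide baseline looks something like the sketch below. The name and namespace are placeholders of mine. On its own this policy blocks everything in both directions for every pod in the namespace, which is exactly why it has to ship together with the allow policies.

```yaml
# Sketch of a namespace-wide default deny; name and namespace are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all     # assumed name
  namespace: default         # assumed namespace
spec:
  podSelector: {}            # empty selector: every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
  # No ingress or egress rules: both directions are denied until
  # other policies selecting the same pods add allows.
```

Because allows from separate policies are unioned, each allow can live in its own small policy next to this one instead of being edited into it.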
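The second trap is the kubelet health probe. The kubelet sends HTTP probes to the pod from the node, so the source IP is a node address rather than a pod-network address. The Kubernetes documentation states that connections from the pod's own node are allowed even when the pod is isolated for ingress, so a conformant CNI should let probes through. Where the CNI or its configuration does not, an ingress policy that only allows traffic from a podSelector in the same namespace will silently block the kubelet's probe, marking the pod Unhealthy and, for a liveness probe, restarting it in a loop. The fix is an ingress rule allowing traffic from the node CIDR, or using exec probes instead of HTTP probes.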
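An ingress rule for the node CIDR could be sketched as below. The CIDR 10.0.0.0/24, the probe port 8080, and the policy name are all placeholders I made up; substitute the address range your nodes actually use and the port your probe targets.

```yaml
# Sketch only: CIDR, port, and name are hypothetical placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-kubelet-probes   # assumed name
spec:
  podSelector:
    matchLabels:
      app: network-test
  policyTypes:
    - Ingress
  ingress:
    - from:
        - ipBlock:
            cidr: 10.0.0.0/24  # placeholder: your node CIDR
      ports:
        - protocol: TCP
          port: 8080           # placeholder: the port the HTTP probe hits
```

Whether the probe's source address really falls inside the node CIDR depends on the CNI, so check the actual source IP before relying on this.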
The third trap is DNS. A default-deny egress policy blocks traffic to kube-dns on port 53, which means every application call that uses a hostname fails before it even starts. Your allow-list needs an explicit rule allowing UDP and TCP to port 53 to the kube-system namespace, or nothing resolves.
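A tighter DNS allow can pin the destination to the DNS pods themselves instead of the whole kube-system namespace. The sketch below assumes the cluster DNS pods carry the label k8s-app: kube-dns, which is common but not guaranteed, and the policy name is hypothetical.

```yaml
# Sketch only: the k8s-app label and the policy name are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress       # assumed name
spec:
  podSelector: {}              # every pod in this namespace may resolve names
  policyTypes:
    - Egress
  egress:
    - to:
        # namespaceSelector and podSelector in the SAME list item are ANDed:
        # pods labeled k8s-app=kube-dns inside kube-system, nothing else.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

Putting the two selectors in separate list items would OR them instead, which allows far more than intended.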
The fix
One allow rule. Traffic from frontend now reaches backend.
Adding a namespaceSelector rule that matches the frontend namespace unblocks pod-a → pod-b. NetworkPolicy is allowlist-based: one explicit rule is all it takes. The deny-all baseline stays in place for every other source.
pod-a issues the same request
Nothing changed on the source side. pod-a still runs curl http://pod-b:8080. The fix was entirely in the destination namespace policy.
One new allow rule — namespaceSelector
Adding from.namespaceSelector.matchLabels.name: frontend to the ingress rules in the backend NetworkPolicy is all that is required. First label the namespace: kubectl label namespace frontend name=frontend. The deny-all baseline stays in place for every other source.
Connection established — 200 OK
The CNI now matches the ingress rule and forwards the SYN to pod-b. Verify with kubectl exec -n frontend pod-a -- curl -s -o /dev/null -w "%{http_code}" http://pod-b.backend:8080.
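The allow rule described above could be written as follows. I have sketched it as a separate policy with a made-up name, which works because allows from multiple policies are additive; adding the same from entry to the existing backend policy has the same effect. The port restriction to 8080 is my assumption, taken from the curl target.

```yaml
# Sketch only: policy name and the port restriction are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend   # assumed name
  namespace: backend
spec:
  podSelector: {}                   # every pod in backend, including pod-b
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: frontend        # requires: kubectl label namespace frontend name=frontend
      ports:
        - protocol: TCP
          port: 8080                # the port pod-a curls
```

Every other source still hits the deny-all baseline.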
kubectl delete -f issue.yaml
kubectl apply -f fix.yaml
The scenario fix removes the NetworkPolicy entirely. In a real cluster you do not want to delete the security policy; you want to fix it. The correct pattern is a default-deny policy paired with explicit allows for DNS, for kubelet probes, and for the actual traffic the app needs. The egress side of that, DNS plus the app's own traffic, looks like this:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-and-app
spec:
  podSelector:
    matchLabels:
      app: network-test
  policyTypes:
    - Egress
  egress:
    # Allow DNS lookups to anything in kube-system.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # Allow the app's real traffic: pods labeled app=api in the same namespace.
    - to:
        - podSelector:
            matchLabels:
              app: api
      ports:
        - protocol: TCP
          port: 8080
Verify with a wget from the pod and with kubectl describe on the policy to make sure the rules render as you expect.
Day 27 of 35. Tomorrow, the cluster talks to itself perfectly but cannot reach the payment processor, and nothing in the cluster looks broken.
