Launch is in four hours. I apply the Service, run kubectl get svc, and the two most annoying characters in Kubernetes stare back at me: <pending>. The cloud-controller-manager logs look clean. The cluster is healthy. Other LoadBalancer Services in the same cluster are happily holding real external IPs. This one has been sitting on <pending> for twelve minutes and climbing. I open an AWS console tab, dig through ELB, and there is actually a brand-new ELB created for this Service, reachable, TCP listener up. But it has zero healthy targets, because the Service selector matches no pods, so the controller registered no backends. The provisioning worked fine. The selector was wrong.
The scenario
Before we debug anything, reproduce it locally so your screen matches mine.
git clone https://github.com/vellankikoti/troubleshoot-kubernetes-like-a-pro.git
cd troubleshoot-kubernetes-like-a-pro/scenarios/loadbalancer-service-misconfig
ls
You will see issue.yaml, fix.yaml, description.md, and the helper scripts. The two YAML files are what we need. The Service in issue.yaml selects app: non-existent-app, which deliberately matches no pods.
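I have not pasted the full file here, but based on the selector and the port mapping we will see in a moment, the Service in issue.yaml likely resembles this sketch (names and ports are illustrative; check the file in the repo):

```yaml
# Hypothetical sketch of issue.yaml -- a LoadBalancer Service whose
# selector matches no pods, so it never gets any endpoints.
apiVersion: v1
kind: Service
metadata:
  name: loadbalancer-service-issue
spec:
  type: LoadBalancer
  selector:
    app: non-existent-app   # matches zero pods -- the actual bug
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
```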
Reproduce the issue
kubectl apply -f issue.yaml
service/loadbalancer-service-issue created
kubectl get svc loadbalancer-service-issue
On a managed cluster with a cloud-controller-manager, the output looks like this:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
loadbalancer-service-issue LoadBalancer 10.96.201.44 <pending> 80:31220/TCP 45s
On a local kind or minikube cluster without an LB implementation, EXTERNAL-IP stays <pending> forever for a different reason. Both look identical on the surface, and that is half the problem. Same symptom, completely different causes.
Debug the hard way
First, the endpoints check. Always the endpoints check.
kubectl get endpoints loadbalancer-service-issue
NAME                         ENDPOINTS   AGE
loadbalancer-service-issue   <none>      1m
<none> means the Service selector matches zero pods. That is the real problem, and it's independent of the cloud provider entirely. Now look at events:
kubectl describe svc loadbalancer-service-issue | tail -15
Type: LoadBalancer
Selector: app=non-existent-app
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal EnsuringLoadBalancer 1m service-controller Ensuring load balancer
Normal EnsuredLoadBalancer 55s service-controller Ensured load balancer
On a cloud cluster, you see EnsuredLoadBalancer. The LB exists. Kubernetes did its job. Yet EXTERNAL-IP is <pending> because on some providers the controller only publishes the IP after at least one backend target goes healthy. And with no endpoints, nothing ever goes healthy.
Check the controller manager if you have access:
kubectl -n kube-system logs -l component=cloud-controller-manager --tail=50 | grep -i loadbalancer
You'll see the ensure call succeed. The mystery is not in the logs. It is in that one <none> line from kubectl get endpoints.
Why this happens
A LoadBalancer Service has two jobs. The cloud-controller-manager provisions an external load balancer in the provider (ELB, NLB, GCP Forwarding Rule, Azure LB), and kube-proxy programs the node-level rules that forward packets from the node port to one of the backend pods. The selector decides which pods become backends. No matching pods, no endpoints. No endpoints, no healthy targets. No healthy targets, and many cloud providers will not publish the external IP into .status.loadBalancer.ingress, which is what kubectl get svc shows.
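The matching happens against the labels on the pod template, not against the Deployment name. A hypothetical pair that lines up correctly:

```yaml
# Sketch: the Service selector must equal the labels under
# spec.template.metadata.labels -- that is what pods actually carry.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app        # a Service with selector app: my-app matches these pods
    spec:
      containers:
        - name: web
          image: nginx:1.27   # hypothetical image
          ports:
            - containerPort: 80
```

Rename the label on one side and you get exactly the <none> endpoints symptom above, with no error anywhere.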
The second common cause is missing cloud-provider annotations. On AWS, if you want an NLB instead of a classic ELB, you need service.beta.kubernetes.io/aws-load-balancer-type: nlb. If you want internal, you need service.beta.kubernetes.io/aws-load-balancer-scheme: internal. Missing subnet tags will also keep the LB from provisioning at all. Each cloud has its own matrix of annotations.
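Concretely, the two AWS annotations named above sit under metadata.annotations on the Service itself. A sketch (the Service name and ports are hypothetical; annotation support depends on which load balancer controller your cluster runs, so verify against its documentation):

```yaml
# Sketch: an internal NLB on AWS, using the annotations discussed above.
apiVersion: v1
kind: Service
metadata:
  name: internal-api   # hypothetical name
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-scheme: internal
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```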
The third cause is cluster networking without a cloud provider. On kind, minikube, or a bare-metal cluster without MetalLB installed, LoadBalancer Services have no controller to reconcile them, and they sit on <pending> forever. Not a bug, just nothing listening for that resource.
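If you go the MetalLB route on a local cluster, the minimal setup after installing it is an address pool plus an advertisement. A sketch using the MetalLB v0.13+ CRDs (the address range is hypothetical; use free IPs on your own network):

```yaml
# Sketch: minimal MetalLB layer-2 configuration.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: local-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.240-192.168.1.250
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: local-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - local-pool
```

With this in place, LoadBalancer Services on kind or bare metal get an IP from the pool instead of sitting on <pending>.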
The fix
kubectl delete -f issue.yaml
kubectl apply -f fix.yaml
The diff is one line. The selector changes from app: non-existent-app to app: my-app, matching the actual pods:
selector:
  app: my-app
Verify:
kubectl get svc loadbalancer-service-fixed
kubectl get endpoints loadbalancer-service-fixed
The endpoints list populates within a second or two. On a cloud cluster, the EXTERNAL-IP fills in within a minute of the first target going healthy.
The lesson
- <pending> is a symptom, not a cause.
- kubectl get endpoints tells you in one line whether you have a Kubernetes problem or a cloud problem.
- Know your cloud's required annotations and subnet tags before you apply. Copy them from a working Service in the same cluster rather than from a blog post.
- On local clusters, install MetalLB if you want LoadBalancer Services to actually get an IP. Otherwise use NodePort or port-forward.
Day 24 of 35. Tomorrow, an Ingress returns 404 for a host that definitely exists, and the nginx logs show your request never even arrived.
