The problem
As you know, I run a self-hosted Kubernetes homelab. Recently, I hit a problem where I couldn’t reach some services running in the cluster after restarting the Kubernetes servers. It didn’t happen on every restart, but when it did happen, it was always right after one. The affected service was ingress-nginx. I have it configured to request a specific IP address, but sometimes after a restart it fails to get that address from MetalLB, the tool I use to manage and allocate IP addresses for the services running in the cluster. When I first set up the cluster, I configured MetalLB with a pool of IPs it could assign to services. There were enough addresses for all the services with plenty to spare, so this wasn’t an IP exhaustion problem. Instead, it was a race: MetalLB allocates addresses from a shared pool, and on reboot services can come online in a different order, so another service could claim ingress-nginx’s IP before it asked for it.
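For context, my original setup looked roughly like this: one shared pool covering a range of addresses that every LoadBalancer service drew from. (The pool name and address range below are illustrative, not my actual values.)

```yaml
# A single shared pool: every LoadBalancer service in the cluster
# gets an address from this range, first come, first served.
# (Pool name and range are placeholders for illustration.)
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
    - 10.0.0.1-10.0.0.50
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-advertisement
  namespace: metallb-system
spec:
  ipAddressPools:
    - default-pool
```

With this layout, nothing ties a particular address to a particular service, which is what makes the reboot ordering matter.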
Solution
To debug the problem, the first thing I did was check that the cluster was healthy. All server nodes were up, deployments were running okay, and application and pod logs looked fine. Then I ran `kubectl get svc` to check on the network services and realised that the ingress-nginx controller’s external IP was stuck in the `<pending>` state. It had failed to get its reserved IP address because another service had already taken it. As a quick fix, I deleted the service that had taken ingress-nginx’s IP, and the address was immediately assigned to ingress-nginx. That worked until the next restart, so I needed a permanent solution.
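The diagnostic steps boil down to a couple of kubectl commands (the service name in the delete step is a placeholder for whichever service grabbed the contested address):

```shell
# List services across all namespaces; a LoadBalancer whose
# EXTERNAL-IP shows <pending> has not been assigned an address.
kubectl get svc -A

# Inspect events on the controller service for allocation errors
# (namespace/name assumed from a standard ingress-nginx install).
kubectl describe svc ingress-nginx-controller -n ingress-nginx

# Quick fix: delete the service currently holding the contested IP
# so MetalLB can reassign it. "offending-service" is a placeholder.
kubectl delete svc offending-service
```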
To resolve this IP allocation race condition, I created a dedicated IP address pool for each load balancer in the cluster instead of having them all draw from one shared pool. This way, the load balancers have reliable IP addresses that survive server restarts. Here’s how I created a static IP for ingress-nginx in MetalLB:
```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: ingress-nginx
  namespace: metallb-system
spec:
  addresses:
    - 10.0.0.8/32
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2-advertisement
  namespace: metallb-system
spec:
  ipAddressPools:
    - ingress-nginx
```
When adding individual IPs to a pool rather than a range, as I did above, MetalLB requires each address to be written in CIDR notation; `10.0.0.8/32` represents the single address 10.0.0.8.
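One detail worth noting: pools participate in automatic allocation by default, so MetalLB could in principle still hand out an address from a dedicated pool to some other service. If I read the MetalLB docs correctly, setting `autoAssign: false` on the pool reserves it for services that explicitly request it. A sketch of the same pool with that field added:

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: ingress-nginx
  namespace: metallb-system
spec:
  addresses:
    - 10.0.0.8/32
  # Keep this pool out of automatic allocation; only services that
  # request it explicitly can receive an address from it.
  autoAssign: false
```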
Next, I updated the ingress-nginx deployment to request that specific IP address:
```yaml
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.12.1
  name: ingress-nginx-controller
  namespace: ingress-nginx
  annotations:
    metallb.universe.tf/address-pool: ingress-nginx
```
Adding the `metallb.universe.tf/address-pool: ingress-nginx` annotation to the service makes it request its address from the dedicated `ingress-nginx` pool, whose only address is `10.0.0.8`.
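As an aside, MetalLB can also pin an exact address rather than a pool, via the `metallb.universe.tf/loadBalancerIPs` annotation. For a single-address pool like mine, either approach should land on the same IP. Shown here on a trimmed-down version of the same service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
  annotations:
    # Request this exact address instead of naming a pool.
    metallb.universe.tf/loadBalancerIPs: 10.0.0.8
```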
Takeaways
- Dynamic IPs can cause race conditions. Relying on dynamically assigned IP addresses can lead to service conflicts after reboots due to the non-deterministic order in which services come online.
- Static IPs increase reliability. Assigning static IPs to critical services like ingress controllers ensures predictable behavior and avoids conflicts on server or network reboot.
- MetalLB is a joy to work with. MetalLB supports both shared IP pools and dedicated IP configurations, making it easy to set up and adjust to meet my needs.
- Self-hosting is hard. Unlike managed cloud services, self-hosting means handling networking, IP management, hardware, power issues, and fault tolerance yourself. Thinking through and solving these issues deepens my understanding of system administration in general and Kubernetes internals in particular.