HAProxy Ingress on OPNsense for k3s

My homelab k3s cluster does not currently have a load balancer; one of the boxes serves as the entry point to the cluster. This is fine while that box is running, but if it's down or under maintenance, all ingress is dead. Not ideal. I could attach another computer to the cluster as a dedicated gateway, but at this point that's a bit overkill.

My router is a 4-core Celeron N5105 OPNsense box, powerful enough for home routing needs. Ideally, OPNsense would serve only edge routing purposes, but since I don't have a spare box to put a load balancer on, I decided to use the HAProxy plugin and make the router a load balancer for both the Kubernetes API and services. All inter-node traffic in the cluster goes through a 2.5Gbit switch, and traffic to the NAS in a different room goes over a 10Gbit port through another switch.
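The OPNsense plugin is configured through the UI, but under the hood it generates a regular haproxy.cfg. A sketch of the equivalent raw config might look like the following; the bind address, node IPs, and backend ports are placeholders for my setup, not values from the plugin itself:

```haproxy
# Kubernetes API: plain TCP passthrough on 6443, health-checked round robin
frontend k8s-api
    bind 192.168.2.1:6443
    mode tcp
    default_backend k8s-api-nodes

backend k8s-api-nodes
    mode tcp
    balance roundrobin
    option tcp-check
    server node1 10.200.0.11:6443 check
    server node2 10.200.0.12:6443 check
    server node3 10.200.0.13:6443 check

# Service ingress: TCP passthrough on 443 to each node's ingress controller,
# leaving TLS termination to the cluster
frontend k8s-ingress-https
    bind 192.168.2.1:443
    mode tcp
    default_backend k8s-ingress-nodes

backend k8s-ingress-nodes
    mode tcp
    balance roundrobin
    server node1 10.200.0.11:443 check
    server node2 10.200.0.12:443 check
    server node3 10.200.0.13:443 check
```

TCP mode keeps HAProxy out of the TLS path entirely, so certificates stay managed inside the cluster; HTTP mode would be needed only if the router itself had to route on hostnames.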

It shouldn't put a huge burden on the gateway. The box is already around four years old and perhaps due for a refresh (I've wanted to migrate to a Unifi gateway at some point); later it can be demoted to a dedicated load balancer for the cluster.

However, there is some complexity in how traffic is handled in my home network. My Unifi switch has layer 3 switching capabilities, meaning it offloads inter-VLAN routing to hardware, which is neat when pushing single-digit gigabits of traffic around: I can back up to my NAS while all three Kubernetes nodes still have their full network capacity. None of this is necessary, by the way. I'm sure I would've been fine with a 1Gbit network, in which case my router could have handled the routing while providing firewall capabilities between my untrusted and trusted networks. But I did this because I could, and that's a good enough reason for me.

Layer 3 switching means most of my traffic goes through the switch, and its ACL policies disallow untrusted networks from accessing gateway LAN addresses (all switches and APs are on it). Here's what it looks like.


+-------------------+       +-------------------+
|  Router           |       |  2.5Gb Switch     |______.
|  IP: 10.255.253.1 |-------|  IP: 192.168.2.13 |      |
|  IP: 192.168.2.1  |       +-------------------+      |
+-------------------+      .        |                  |
                         .          |                  |
                       .            | 10.200.0.0/24    |
                     .              |                  |
+-------------------+       +-------------------+      |
|  Kubernetes Box 1 |       |  Kubernetes Box 2 |      |
|  IP: 10.200.0.x   |       |  IP: 10.200.0.x   |      |
+-------------------+       +-------------------+      |
                                    |                  |
                                    |                  |
+-------------------+       +-------------------+      |
|  Kubernetes Box 3 |       |  Unifi Enterprise |      |
|  IP: 10.200.0.x   |-------|  Switch           |------+
+-------------------+       |  IP: 10.255.253.2 |
                            |  IP: 192.168.2.10 |
                            +-------------------+
                                    |
                                    | 10Gbit
                                    |
                            +-------------------+
                            |  NAS              |
                            |  IP: 10.x.x.x     |
                            +-------------------+

VLANs:

  • VLAN 4040
  • VLAN Trusted
  • VLAN Kubernetes
  • VLAN Untrusted (No traffic to Trusted/K8s)

Since my gateway is now a load balancer, it should be accessible from untrusted networks on ports 80 and 443 (my TV is on an untrusted network).

No problem, I thought, and set up an ACL rule to allow ports 80 and 443 on the gateway IP. However, the Unifi console complained that I needed to check the rules as they were incomplete, without any explanation of what was missing. Digging through network requests in the developer tools, I found that I had hit the ACL rule limit - I think it was 12. That seems rather restrictive. The user experience is suboptimal: it had me inspecting API calls to see what was wrong, and I'm not sure why the devs don't surface the error message to the user - it's baffling.

I cleaned up some rules, then added a single host endpoint on my LAN allowed on ports 443 and 80. I saved the settings and the internet dropped out. Connectivity resumed in a minute and the rules were saved - however, my untrusted networks could now access trusted VLANs. This took a few hours of troubleshooting, including restarting the switch, to no avail. As soon as I added a single host with Allow on those two ports, connectivity dropped and the ACLs stopped working. At that point I gave up: I blocked VLAN access from untrusted networks entirely, and instead attached the load balancer to the inter-VLAN IP address that Unifi configures on VLAN 4040.

I am not a networking expert, but so far the ACL implementation and the experience of using it are lacking, to say the least. Here's how traffic flows internally to a service in the cluster:


+-------------------+
| Untrusted Device  |
| VLAN: Untrusted   |
| Connects to:      |
| svc-a.internal    |
+-------------------+
          |
          v
+-------------------+
| Router            |
| IP: 10.255.253.1  |
| HAProxy 443       |
| (Proxy Mode)      |
+-------------------+
          |
          v
+-------------------+  +-------------------+  +-------------------+
| Kubernetes Box 1  |  | Kubernetes Box 2  |  | Kubernetes Box 3  |
| IP: 10.200.0.x    |  | IP: 10.200.0.x    |  | IP: 10.200.0.x    |
| Service-A         |  |                   |  |                   |
+-------------------+  +-------------------+  +-------------------+

Setting up OPNsense and HAProxy has been pretty smooth, and my API and service traffic is now load balanced on my router. Setting up ACLs on Unifi equipment, however, has been a terrible experience. It's not the first time my connectivity has gone dead with no clarity as to why, with only a reset or revert fixing it. In this case, though, there was clearly a bug in the ACL implementation, and removing the offending rules seems to have resolved the issue.

Now that I have a load balancer in front of the cluster, I need to configure external access to it. My current implementation is a remnant of the single-server setup: rules are defined in the Cloudflare tunnel service in NixOS.


  services.cloudflared = {
    enable = true;
    tunnels = {
      "<tunnel id>" = {
        credentialsFile = "/run/agenix/cloudflared.json";
        # catch-all for hostnames with no matching ingress rule
        default = "http_status:404";
        ingress = {
          "svc-a.example.com" = "http://svc-a.internal";
          # ...
        };
      };
    };
  };

This worked well for a single-node server, but with the load balancing setup I need to ensure HAProxy is the only entrypoint to the cluster. I can either run dyndns to get a stable DNS record for my home IP, or deploy cloudflared inside the cluster with network policies in place to limit cloudflared's exposure to only those services it provides external access to.
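Had I gone the network-policy route, a minimal sketch could look like the following. All of the namespace and label names here (`cloudflared`, `apps`, `app: svc-a`) are hypothetical placeholders, and the idea is to pin down cloudflared's egress to DNS, the Cloudflare edge, and only the exposed services:

```yaml
# Hypothetical NetworkPolicy: restrict what the cloudflared pods can reach.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: cloudflared-egress
  namespace: cloudflared
spec:
  podSelector:
    matchLabels:
      app: cloudflared
  policyTypes:
    - Egress
  egress:
    # DNS resolution
    - ports:
        - protocol: UDP
          port: 53
    # Tunnel control connections to the Cloudflare edge (port 7844)
    - ports:
        - protocol: TCP
          port: 7844
        - protocol: UDP
          port: 7844
    # Only the in-cluster services the tunnel actually exposes
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: apps
          podSelector:
            matchLabels:
              app: svc-a
      ports:
        - protocol: TCP
          port: 80
```

This only works if the cluster's CNI enforces NetworkPolicy, and each newly exposed service would need its own egress entry, which is part of why I deprioritized it.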

At this point I run so many of my services through Cloudflare that network-limiting its access to certain services is low on my priority list. I will trust that Cloudflare will only tunnel the services it's configured to and do nothing else.

Deploying the Cloudflare tunnel was pretty trivial: I used their official guide and it worked like a charm. The only weak point is that I don't manage my domain through Cloudflare, so switching a service's DNS record is a manual task for me. Not a huge problem, seeing as I do it rather infrequently.

This is a good enough state for my homelab ingress needs. I'm not a huge fan of tunnel backdoors that poke holes into the cluster to expose certain services, but it's hard to beat the security/convenience trade-off.