Hybrid Talos cluster with KubeSpan. Phase 2 - Generating config files

Hybrid Talos cluster with KubeSpan

In this blog series I explore how I deploy a hybrid Talos cluster with KubeSpan.

Previous articles:

Introduction post on hybrid cluster deployment

Phase 0 - provisioning control plane nodes with Terraform

Phase 1 - preparing environment

Phase 2 - generating configuration

In this phase, I generate configuration files for Talos. Generating configuration files can be tricky because part of the configuration consists of Talos secrets. When I re-generate configuration files (for example, when adding a patch or a new component to install as part of the bootstrapping process), I want to ensure the secrets file remains stable and is not re-generated. This way, I can apply the new configuration to the existing cluster.

The steps for this phase are scripted in 3_generate_config.sh and can be executed via the talos-generate-configs make target.
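The make target is a thin wrapper around the script; a minimal sketch (the target name comes from the text, the recipe itself is an assumption):

```makefile
# Hypothetical Makefile fragment - the real repo may define this differently
talos-generate-configs:
	./3_generate_config.sh
```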

Handling secrets

First, I check whether the secrets file already exists, or whether an explicit flag to re-generate secrets has been supplied.

# Check if we should regenerate secrets
if [ "${REGENERATE_SECRETS}" == "yes" ]; then
  echo "Regenerating Talos secrets..."
  talosctl gen secrets --output-file "$MANIFESTS_DIR/secrets.yaml" --force
elif [ ! -f "$MANIFESTS_DIR/secrets.yaml" ]; then
  echo "No existing secrets found. Generating new secrets..."
  talosctl gen secrets --output-file "$MANIFESTS_DIR/secrets.yaml" --force
else
  echo "Using existing secrets from $MANIFESTS_DIR/secrets.yaml"
fi
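The generated secrets.yaml holds the cluster CA material and join tokens. Its top-level structure looks roughly like this (abridged, all values redacted; treat this as an illustration, not the exact schema):

```yaml
cluster:
  id: ...            # cluster identity
  secret: ...
secrets:
  bootstraptoken: ...
  secretboxencryptionsecret: ...
trustdinfo:
  token: ...
certs:               # CAs and keys reused across regenerations
  etcd: { crt: ..., key: ... }
  k8s: { crt: ..., key: ... }
  os: { crt: ..., key: ... }
```

This is exactly why the file must stay stable: regenerating it would rotate the CAs and tokens, and new configs would no longer match the running cluster.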

Next, I pass the secrets file to the config generator. talosctl gen config includes a --with-secrets flag: I pass the ./manifests/secrets.yaml path to the command, and the existing secrets are used in the generated configuration.

Including all patches and bootstrap manifests, this is what generate config command looks like:

talosctl gen config \
  "$CLUSTER_NAME" \
  "https://$ENDPOINT:6443" \
  --force \
  --with-docs=false \
  --with-kubespan=true \
  --with-examples=false \
  --additional-sans="$ADDITIONAL_SANS" \
  --config-patch-control-plane @patches/cp-patch-kube-prism.yml \
  --config-patch-control-plane @patches/cp-patch-network.yml \
  --config-patch-control-plane @patches/cp-patch-user-ns.yml \
  --config-patch @patches/cf-patch-cni.yml \
  --config-patch @patches/cf-patch-cilium.yml \
  --config-patch @patches/cf-patch-argocd.yml \
  --config-patch @patches/machine-patch-kubespan-filters.yml \
  --with-secrets="$MANIFESTS_DIR/secrets.yaml" \
  --output-dir "$GENERATED_DIR"

$ADDITIONAL_SANS includes all nodes' IP addresses as well as the cluster endpoint DNS name. These are read from the cluster.env file. I include three control plane patches.
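A sketch of how cluster.env might feed these variables (the variable names besides ENDPOINT and ADDITIONAL_SANS are illustrative assumptions):

```shell
# cluster.env - hypothetical contents
CLUSTER_NAME="homelab"
ENDPOINT="cluster.homelab.example"                  # cluster endpoint DNS name
CONTROL_PLANE_IPS="10.0.1.10,10.0.1.11,10.0.1.12"   # node addresses
ADDITIONAL_SANS="${CONTROL_PLANE_IPS},${ENDPOINT}"
```

The script sources this file, so the generated certificates stay valid when talking to any node directly or via the endpoint.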

KubePrism enables a highly available in-cluster API endpoint. It’s enabled by default as of Talos 1.6, so the patch is redundant, but I keep it explicit. The network patch configures the pod and service subnets:

cluster:
  network:
    dnsDomain: cluster.local
    podSubnets:
      - 10.244.0.0/16
    serviceSubnets:
      - 10.96.0.0/12
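When picking these subnets it is worth sanity-checking that they do not overlap with the LAN (10.0.1.0/24 in my case); overlapping ranges lead to confusing routing failures. A quick check with Python's standard ipaddress module:

```python
import ipaddress

lan = ipaddress.ip_network("10.0.1.0/24")

for cidr in ("10.244.0.0/16", "10.96.0.0/12"):
    subnet = ipaddress.ip_network(cidr)
    # overlaps() is True if the two networks share any address
    print(cidr, "overlaps LAN:", lan.overlaps(subnet))
# → 10.244.0.0/16 overlaps LAN: False
# → 10.96.0.0/12 overlaps LAN: False
```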

The user namespaces patch adds user namespace support. In short, it enables mapping users inside containers to different users on the host, so a container does not need to run as root to execute commands that require privileged access. I use this to build images via GitHub runners without running privileged containers.

cluster:
  apiServer:
    extraArgs:
      feature-gates: UserNamespacesSupport=true,UserNamespacesPodSecurityStandards=true
machine:
  sysctls:
    user.max_user_namespaces: "11255"
  kubelet:
    extraConfig:
      featureGates:
        UserNamespacesSupport: true
        UserNamespacesPodSecurityStandards: true
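With the feature gates enabled, a workload opts into a user namespace via spec.hostUsers: false. A minimal example (the pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo
spec:
  hostUsers: false        # run the pod in its own user namespace
  containers:
    - name: demo
      image: alpine
      # id reports uid 0 inside the container, which is mapped
      # to an unprivileged UID on the host
      command: ["id"]
```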

CNI

To deploy the CNI, I need to make some preparations.

First, I disable the default CNI in the cluster config, as per Config.cluster.network.cni. I will install the Cilium CNI in a separate patch via Helm. This way I can replace kube-proxy with Cilium's internal implementation and configure additional parameters for my homelab LoadBalancer setup.

cluster:
  network:
    cni:
      name: none
  proxy:
    disabled: true

Now I add the Cilium CNI patch. It uses inlineManifests, which allows running Jobs during the bootstrapping process; check cf-patch-cilium.yml on GitHub. The Kubernetes Job runs with admin permissions and installs the Cilium components. Additionally, I add a CiliumLoadBalancerIPPool and a Gateway to enable a network load balancer in my homelab.

The command to install Cilium (after the CRDs have been applied) is the following:

cilium install --set ipam.mode=kubernetes --set kubeProxyReplacement=true \
    --set securityContext.capabilities.ciliumAgent={CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID} \
    --set securityContext.capabilities.cleanCiliumState={NET_ADMIN,SYS_ADMIN,SYS_RESOURCE} \
    --set cgroup.autoMount.enabled=false --set cgroup.hostRoot=/sys/fs/cgroup \
    --set k8sServiceHost=localhost \
    --set k8sServicePort=7445 \
    --set gatewayAPI.enabled=true \
    --set gatewayAPI.hostNetwork.enabled=false \
    --set nodeIPAM.enabled=false \
    --set l2announcements.enabled=true \
    --set externalIPs.enabled=true \
    --set devices="eth+ ens+ enp+"
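Once the Job has completed, the installation can be verified with the cilium CLI (these commands assume a working kubeconfig; the output depends on the cluster state):

```shell
# Wait until all Cilium components report ready
cilium status --wait

# Optionally run the built-in connectivity test suite
cilium connectivity test
```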

I enable l2announcements, which in combination with CiliumLoadBalancerIPPool enables the LoadBalancer service type. This allows internal homelab traffic to reach Kubernetes services via a fixed LoadBalancer IP pool: the nodes' Ethernet interfaces announce the IP addresses of LoadBalancer services via ARP. It only works on the local network segment, but that is good enough for homelab routing. I use OPNsense, so BGP routing would be possible, but I opted for the simpler approach. For external traffic, I use Cloudflare tunnels instead.
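For reference, the two Cilium resources look roughly like this (the names and the CIDR are illustrative; the actual manifests live in cf-patch-cilium.yml). Note that l2announcements also needs a CiliumL2AnnouncementPolicy selecting which interfaces answer ARP:

```yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: homelab-pool
spec:
  blocks:
    - cidr: 10.0.1.240/28   # illustrative range carved out of the LAN
---
apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: homelab-l2
spec:
  loadBalancerIPs: true
  interfaces:
    - ^eth[0-9]+
    - ^enp[0-9]+
```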

ArgoCD

I prefer to install ArgoCD during the bootstrapping phase, so the patch is included via cf-patch-argocd.yml.

KubeSpan filters

It’s important to ensure KubeSpan advertises the correct IP addresses. I include a patch to disable advertising addresses from my local network (10.0.1.0/24). Otherwise, KubeSpan may latch on to a local IP and prevent worker nodes from communicating with the control plane nodes.

machine:
  network:
    kubespan:
      enabled: true
      advertiseKubernetesNetworks: false # pod-to-pod traffic goes via CNI
      allowDownPeerBypass: false # ensure traffic only goes via KubeSpan
      mtu: 1420
      filters:
        endpoints:
          - 0.0.0.0/0
          - "!10.0.1.0/24"
          - "!::/0"

Generated config

Lastly, the previously generated secrets file is included as well. The generated configuration can be inspected in the ./generated folder. It will be applied in the next phase.
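talosctl gen config writes the machine configs plus the client configuration, so after a run the folder should look like this:

```shell
$ ls generated/
controlplane.yaml  talosconfig  worker.yaml
```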

Code

The code used to deploy the cluster is available on GitHub - sashkachan/talos-kubespan-bootstrap. I will use this code throughout the walkthrough of all phases and the configuration required to make it succeed.