Hybrid Talos cluster with KubeSpan. Phase 2 - Generating config files
Hybrid Talos cluster with KubeSpan
In this blog series I explore how I deploy a hybrid Talos cluster with KubeSpan.
Previous articles:
Introduction post on hybrid cluster deployment
Phase 0 - provisioning control plane nodes with Terraform
Phase 1 - preparing environment
Phase 2 - generating configuration
In this phase, I generate the configuration files for Talos. Generating configuration files can be tricky because part of the configuration is the Talos secrets. When I re-generate configuration files (for example, when adding a patch or a new component to install as part of the bootstrapping process), I want to ensure the secrets file remains stable and is not re-generated. This way, I can apply the new configuration to the existing cluster.
Steps for this phase are scripted in 3_generate_config.sh and can be executed via the talos-generate-configs make job.
Handling secrets
First, I check whether the secrets file already exists, or whether an explicit flag to re-generate secrets has been supplied.
# Check if we should regenerate secrets
if [ "${REGENERATE_SECRETS}" == "yes" ]; then
echo "Regenerating Talos secrets..."
talosctl gen secrets --output-file "$MANIFESTS_DIR/secrets.yaml" --force
elif [ ! -f "$MANIFESTS_DIR/secrets.yaml" ]; then
echo "No existing secrets found. Generating new secrets..."
talosctl gen secrets --output-file "$MANIFESTS_DIR/secrets.yaml" --force
else
echo "Using existing secrets from $MANIFESTS_DIR/secrets.yaml"
fi
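In practice, that means the make job can be run repeatedly without touching existing secrets, and regeneration is an explicit opt-in. Assuming REGENERATE_SECRETS is picked up from the environment (which is how the script reads it above):
# Re-use manifests/secrets.yaml if it already exists (the default behaviour)
make talos-generate-configs
# Force new secrets - note this invalidates the credentials of an already-bootstrapped cluster
REGENERATE_SECRETS=yes make talos-generate-configs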
Next, I pass the secrets file to the config generator. talosctl gen config includes a --with-secrets flag; I pass the ./manifests/secrets.yaml path to it so the existing secrets are used in the generated configuration.
Including all patches and bootstrap manifests, this is what the generate config command looks like:
talosctl gen config \
"$CLUSTER_NAME" \
"https://$ENDPOINT:6443" \
--force \
--with-docs=false \
--with-kubespan=true \
--with-examples=false \
--additional-sans="$ADDITIONAL_SANS" \
--config-patch-control-plane @patches/cp-patch-kube-prism.yml \
--config-patch-control-plane @patches/cp-patch-network.yml \
--config-patch-control-plane @patches/cp-patch-user-ns.yml \
--config-patch @patches/cf-patch-cni.yml \
--config-patch @patches/cf-patch-cilium.yml \
--config-patch @patches/cf-patch-argocd.yml \
--config-patch @patches/machine-patch-kubespan-filters.yml \
--with-secrets="$MANIFESTS_DIR/secrets.yaml" \
--output-dir "$GENERATED_DIR"
I include 3 control plane patches, described below. $ADDITIONAL_SANS includes all node IP addresses as well as the cluster endpoint DNS name; these are read from the cluster.env file.
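As a rough sketch of how $ADDITIONAL_SANS could be assembled (the variable names here are illustrative and may not match the actual cluster.env):
# Illustrative only: the real cluster.env may use different variable names
source cluster.env   # e.g. defines CP1_IP, CP2_IP, CP3_IP and ENDPOINT
ADDITIONAL_SANS="${CP1_IP},${CP2_IP},${CP3_IP},${ENDPOINT}"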
The first patch enables KubePrism, a highly available in-cluster API endpoint. It's enabled by default as of Talos 1.6, so the patch is redundant, but it's still there. The network patch configures the pod and service subnets:
cluster:
  network:
    dnsDomain: cluster.local
    podSubnets:
      - 10.244.0.0/16
    serviceSubnets:
      - 10.96.0.0/12
The user namespaces patch adds user-ns support. In short, it enables the feature where users in containers are mapped to different users on the host: root inside a container corresponds to an unprivileged user on the node, so commands that need root inside the container don't require privileged access to the host. I use this to build images via GitHub runners without running privileged containers.
cluster:
  apiServer:
    extraArgs:
      feature-gates: UserNamespacesSupport=true,UserNamespacesPodSecurityStandards=true
machine:
  sysctls:
    user.max_user_namespaces: "11255"
  kubelet:
    extraConfig:
      featureGates:
        UserNamespacesSupport: true
        UserNamespacesPodSecurityStandards: true
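On the workload side, a Pod opts into a user namespace with hostUsers: false. A minimal illustrative example (not one of the bootstrap patches):
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo
spec:
  hostUsers: false        # run the pod in its own user namespace
  containers:
    - name: demo
      image: alpine:3
      command: ["id"]     # reports uid 0 inside the container, mapped to an unprivileged uid on the host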
CNI
To deploy the CNI I need to make some preparations.
First, I disable the default CNI in the cluster config, as per Config.cluster.network.cni. I will install the Cilium CNI in a separate patch via Helm. This way I can replace kube-proxy with Cilium's internal kube-proxy replacement and configure additional parameters for the homelab LoadBalancer setup.
cluster:
  network:
    cni:
      name: none
  proxy:
    disabled: true
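Once the cluster is up, a quick way to confirm that kube-proxy was never deployed (an illustrative check, not part of the scripts):
# Expect "Error from server (NotFound)", since the proxy is disabled in the machine config
kubectl -n kube-system get daemonset kube-proxy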
Now, I am adding the Cilium CNI patch.
It uses inlineManifests, which allows running Jobs during the bootstrapping process.
Check cf-patch-cilium.yml on GitHub: it runs a Kubernetes Job with admin permissions to install the Cilium components. Additionally, I add a CiliumLoadBalancerIPPool and a Gateway to enable a network load balancer in my homelab.
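I won't reproduce cf-patch-cilium.yml here, but as a rough sketch of the inlineManifests mechanism (the image, names, and omitted RBAC objects are illustrative; see the repo for the real patch):
cluster:
  inlineManifests:
    - name: cilium-install
      contents: |
        apiVersion: batch/v1
        kind: Job
        metadata:
          name: cilium-install
          namespace: kube-system
        spec:
          backoffLimit: 10
          template:
            spec:
              restartPolicy: OnFailure
              hostNetwork: true                    # the CNI is not up yet at this point
              serviceAccountName: cilium-install   # ServiceAccount + ClusterRoleBinding defined alongside
              tolerations:
                - operator: Exists                 # tolerate the not-ready taints that exist before a CNI runs
              containers:
                - name: cilium-install
                  image: quay.io/cilium/cilium-cli:latest   # illustrative image reference
                  command: ["cilium", "install"]            # plus the flags shown below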
The command to install Cilium (after the CRDs have been applied) is the following:
cilium install --set ipam.mode=kubernetes --set kubeProxyReplacement=true \
--set securityContext.capabilities.ciliumAgent={CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID} \
--set securityContext.capabilities.cleanCiliumState={NET_ADMIN,SYS_ADMIN,SYS_RESOURCE} \
--set cgroup.autoMount.enabled=false --set cgroup.hostRoot=/sys/fs/cgroup \
--set k8sServiceHost=localhost \
--set k8sServicePort=7445 \
--set gatewayAPI.enabled=true \
--set gatewayAPI.hostNetwork.enabled=false \
--set nodeIPAM.enabled=false \
--set l2announcements.enabled=true \
--set externalIPs.enabled=true \
--set devices="eth+ ens+ enp+"
I am enabling l2announcements, which in combination with a CiliumLoadBalancerIPPool enables the LoadBalancer service type. This allows internal homelab traffic to reach Kubernetes services via a fixed LoadBalancer IP pool. The nodes' ethernet interfaces announce the IP addresses of LoadBalancer services via ARP. This only works within the local network segment, but that is good enough for homelab routing. I am using OPNsense, so BGP routing would be possible, but I opted for the simpler approach. For external traffic, I use Cloudflare tunnels instead.
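The actual resources live in the patch; as an illustrative sketch of the pair (the pool CIDR and names are assumptions, and older Cilium releases use spec.cidrs instead of spec.blocks):
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: homelab-pool
spec:
  blocks:
    - cidr: 10.0.1.240/28    # illustrative range inside the homelab network
---
apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: homelab-l2
spec:
  loadBalancerIPs: true      # announce LoadBalancer service IPs via ARP
  interfaces:
    - ^eth[0-9]+
    - ^enp[0-9]+s[0-9]+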
ArgoCD
I prefer to install ArgoCD during the bootstrapping phase, so its patch is included via cf-patch-argocd.yml.
KubeSpan filters
It's important to ensure KubeSpan advertises the correct IP addresses. I include a patch to stop it from advertising addresses in the local network (for my network that's 10.0.1.0/24). Otherwise, KubeSpan may latch onto the local IP and prevent worker nodes from communicating with the control plane nodes.
machine:
  network:
    kubespan:
      enabled: true
      advertiseKubernetesNetworks: false # pod-to-pod traffic goes via CNI
      allowDownPeerBypass: false # ensure traffic only goes via KubeSpan
      mtu: 1420
      filters:
        endpoints:
          - 0.0.0.0/0
          - "!10.0.1.0/24"
          - "!::/0"
Generated config
Lastly, the previously generated secrets file is included as well, via the --with-secrets flag shown above.
The generated configuration can be inspected in the ./generated folder. It will be applied in the next phase.
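For a standard talosctl gen config run, the output directory should contain the control plane config, the worker config, and the client talosconfig:
ls generated/
# controlplane.yaml  talosconfig  worker.yaml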
Code
The code used to deploy the cluster is available on GitHub: sashkachan/talos-kubespan-bootstrap. I will use this code for the walkthrough of all phases and the configuration required to make it succeed.