Hybrid Talos KubeSpan cluster - control plane nodes with Terraform

Phase 0 - prerequisites. Deploying Hetzner control plane instances

In this phase I will deploy the Hetzner control plane nodes via Terraform.

I created a Terraform module that allows customizing instance types and other aspects of the control plane nodes. It is a simple module that relies on the Hetzner Terraform provider.

Here are the resources I will need to provision the control plane stack:

  1. hcloud_network
  2. hcloud_network_subnet
  3. hcloud_firewall
  4. hcloud_ssh_key
  5. hcloud_server

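As a sketch, wiring these resources together through the module could look like the following. The module source path and block label here are assumptions; the variable names mirror the tfvars file shown below.

```hcl
# Hypothetical module invocation - the source path is an assumption.
module "talos_control_plane" {
  source = "./modules/hcloud-control-plane"

  hcloud_token              = var.hcloud_token
  hcloud_region             = var.hcloud_region
  hcloud_control_plane_type = var.hcloud_control_plane_type
  cluster_name              = var.cluster_name
  control_plane_count       = var.control_plane_count
  network_cidr              = var.network_cidr
  servers_subnet_cidr       = var.servers_subnet_cidr
}
```
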
Terraform tfvars file

For easier configuration of the module, I created a tfvars file with the following parameters:

# Root domain for DNS records
root_domain = ""
# Kubernetes API DNS prefix
kubernetes_api_domain = ""

# Hetzner Cloud configuration for Talos
hcloud_token              = ""
hcloud_region             = "fsn1"
hcloud_control_plane_type = "cx32"

# Talos configuration
cluster_name        = "homelab-talos"
talos_version       = "v1.10.1"
control_plane_count = 1
worker_count        = 0

# Network configuration
network_cidr        = "10.0.0.0/8"
servers_subnet_cidr = "10.0.1.0/24"

My API DNS entry is <kubernetes_api_domain>.<root_domain>. hcloud_token is required for the provider to communicate with the hcloud API.
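
For completeness, the provider wiring is minimal. A sketch, assuming the standard hetznercloud/hcloud provider (the version constraint is an assumption):

```hcl
terraform {
  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = "~> 1.45" # assumed constraint, pin to what you have tested
    }
  }
}

# The token is supplied via terraform.tfvars
provider "hcloud" {
  token = var.hcloud_token
}
```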

Hcloud network and subnet

I set up a /8 CIDR network with 10.0.1.0/24 as my control plane subnet. This is configured via the terraform.tfvars file:

network_cidr        = "10.0.0.0/8"
servers_subnet_cidr = "10.0.1.0/24"
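
The network and subnet resources themselves are straightforward. A sketch using the resource names the server block references later (`hcloud_network.network` and `hcloud_network_subnet.cloud_subnet`); the network name and zone here are assumptions:

```hcl
resource "hcloud_network" "network" {
  # Name is an assumption, following the cluster_name prefix convention
  name     = "${var.cluster_name}-network"
  ip_range = var.network_cidr
}

resource "hcloud_network_subnet" "cloud_subnet" {
  network_id   = hcloud_network.network.id
  type         = "cloud"
  network_zone = "eu-central" # fsn1 belongs to the eu-central zone
  ip_range     = var.servers_subnet_cidr
}
```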

Hcloud firewall

I need to ensure the firewall rules allow KubeSpan to set up node communication. Here’s what it looks like:

resource "hcloud_firewall" "cluster_firewall" {
  name = "${var.cluster_name}-firewall"

  # Allow internal traffic
  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "any"
    source_ips = [var.network_cidr]
  }

  rule {
    direction  = "in"
    protocol   = "udp"
    port       = "any"
    source_ips = [var.network_cidr]
  }

  # Allow SSH from anywhere
  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "22"
    source_ips = ["0.0.0.0/0", "::/0"]
  }

  # Allow Kubernetes API
  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "6443"
    source_ips = ["0.0.0.0/0", "::/0"]
  }

  # Allow Kubespan
  rule {
    direction  = "in"
    protocol   = "udp"
    port       = "51820"
    source_ips = ["0.0.0.0/0", "::/0"]
  }

  # Allow Talos API
  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "50000"
    source_ips = ["0.0.0.0/0", "::/0"]
  }

  # Allow Talos API alternate port
  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "50001"
    source_ips = ["0.0.0.0/0", "::/0"]
  }

  # ICMP
  rule {
    direction  = "in"
    protocol   = "icmp"
    source_ips = ["0.0.0.0/0", "::/0"]
  }
}

Note that this allows port 22 from anywhere, since I only need SSH for the initial bootstrapping. Talos machines do not run an SSH daemon, so having 22 open poses no risk once the control plane nodes are provisioned. I also open ICMP, UDP port 51820 for KubeSpan, TCP 50000/50001 for the Talos API, and TCP 6443 for Kubernetes API access. In the long run, I’d like to set up Tailscale authentication for the Kubernetes API, which is possible via the Tailscale Kubernetes operator.

Hcloud server and ssh key

# SSH key for the servers
resource "hcloud_ssh_key" "default" {
  name = "${var.cluster_name}-key"
  # TODO: make this configurable
  public_key = file("~/.ssh/id_ed25519.pub")
}

# Control plane nodes
resource "hcloud_server" "control_plane" {
  count            = var.control_plane_count
  name             = "${var.cluster_name}-control-plane-${count.index + 1}"
  server_type      = var.hcloud_control_plane_type
  # Start with a basic image - we'll install Talos via rescue mode
  image            = "debian-12"
  location         = var.hcloud_region
  ssh_keys         = [hcloud_ssh_key.default.id]
  firewall_ids     = [hcloud_firewall.cluster_firewall.id]
  delete_protection = true
  rebuild_protection = true

  network {
    network_id = hcloud_network.network.id
    ip         = cidrhost(var.servers_subnet_cidr, count.index + 1)
  }

  # We're using local provisioners in talos.tf to configure the nodes
  # No need for cloud-init user_data

  depends_on = [hcloud_network_subnet.cloud_subnet]
}
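
The `cidrhost` function computes the nth host address in a subnet, so each node gets a stable, predictable private IP. This is easy to verify in `terraform console`:

```
> cidrhost("10.0.1.0/24", 1)
"10.0.1.1"
> cidrhost("10.0.1.0/24", 2)
"10.0.1.2"
```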

I’ll create a Debian 12 box. It’s only required for the initial Talos bootstrapping; Talos will be installed over it via rescue mode.

Running plan and apply.

I prefer to use a GNUmakefile for the most commonly used commands. This way, I can run everything from the root folder of the project. For tofu plan, I run:

make tf-plan

For tofu apply, I run:

make tf-apply

With auto-approve:

make tf-apply AUTO_APPROVE=true

Check what tf-apply does at tf-apply
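
A sketch of what those targets could look like in the GNUmakefile; the Terraform directory, variable names, and recipes here are assumptions, the real ones are in the repository:

```make
TF_DIR       ?= terraform   # assumed location of the Terraform root
AUTO_APPROVE ?= false
APPROVE_FLAG  = $(if $(filter true,$(AUTO_APPROVE)),-auto-approve,)

.PHONY: tf-plan tf-apply

tf-plan:
	tofu -chdir=$(TF_DIR) plan

tf-apply:
	tofu -chdir=$(TF_DIR) apply $(APPROVE_FLAG)
```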

Phase 1 - installing Talos.

In the next section, I’ll describe how to install Talos OS on the control plane nodes.

Code.

The code used to deploy the cluster is available on GitHub - sashkachan/talos-kubespan-bootstrap. I will use this code for the walkthrough of all phases and the configuration required to make them succeed.