Unprivileged containers

When using instanceType: container, CAPN will launch an LXC container for each cluster node. In order for Kubernetes and the container runtime to work, CAPN launches privileged containers by default.

However, privileged containers can pose security risks, especially in multi-tenant deployments. In such scenarios, if an adversary workload takes control of the kubelet, it can use the privileged capabilities to escape the container boundaries and affect workloads of other tenants or even fully take over the hypervisor.

In order to address these security risks, it is possible to use unprivileged containers instead.

Using unprivileged containers

To use unprivileged containers, use the default cluster template and set PRIVILEGED=false.

Unprivileged containers require extra configuration on the container runtime. This configuration is available in the kubeadm images starting from version v1.32.4.

Running Kubernetes in unprivileged containers

In order for Kubernetes to work inside an unprivileged containers, configuration of containerd, kubelet and kube-proxy is adjusted, in accordance with the upstream project documentation.

In particular, the following configuration adjustments are performed:

kubelet

  • add feature gate KubeletInUserNamespace: true

When using the default cluster template, these are applied on the nodes through a KubeletConfiguration patch.

NOTE: Kubernetes documentation also recommends using cgroupDriver: cgroupfs, but Incus and Canonical LXD both work correctly with the systemd cgroup driver. Further, Kubelet 1.32+ with containerd 2.0+ can query which cgroup driver is used through the CRI API, so no static configuration is required.

containerd

  • set disable_apparmor = true
  • set restrict_oom_score_adj = true
  • set disable_hugetlb_controller = true

NOTE: Kubernetes documentation also recommends setting SystemdCgroup = false, but Incus and Canonical LXD both work correctly with the systemd cgroup driver.

When using the default images, the containerd service will automatically detect that the container is running in unprivileged mode, and set those options before starting. See systemctl status containerd for details.

Support in pre-built kubeadm images

Unprivileged containers are supported with the pre-built kubeadm images starting from version v1.32.4.

Limitations in unprivileged containers

Known limitations apply when using unprivileged containers, e.g. consuming NFS volumes. See Caveats and Caveats and Future work for more details.

Similar limitations might apply for the CNI of the cluster. kube-flannel with the vxlan backend is known to work.

Testing

The above have been tested with Incus 6.10+ on Kernel 6.8 or newer.