Production-ready guide: Proxmox VE + full Kubernetes (kubeadm) step-by-step
A practical, end-to-end walkthrough to deploy a hardened Proxmox VE environment and run a production-grade Kubernetes control plane and worker nodes (kubeadm). Includes HA control-plane, etcd considerations, load-balancing, networking (Calico), storage (Longhorn + option for Ceph), security hardening, backups, and monitoring.
This blog assumes you have basic Linux + virtualization familiarity and a small cluster of physical servers for Proxmox (3+ nodes recommended for HA). Where choices exist I explain trade-offs and give concrete commands and config snippets you can copy.
Outline
- Goals & prerequisites
- Logical architecture (ASCII diagram)
- Prepare Proxmox VE hosts (install & base hardening)
- Proxmox networking and storage planning
- Create cloud-init VM template for Kubernetes nodes
- Deploy VMs for Kubernetes control plane and workers
- Provision HA load balancer for kube-api (HAProxy + keepalived)
- Install Kubernetes control plane (kubeadm) with HA (stacked vs external etcd)
- Join worker nodes & configure CNI (Calico)
- Cluster storage (Longhorn) and persistent volumes
- Ingress, TLS (cert-manager + Traefik/NGINX), and external DNS
- Security hardening (Proxmox + Kubernetes best practices)
- Backups, monitoring, logging, and upgrade strategy
- Checklist & references (commands and file snippets)
1) Goals & prerequisites
Goals
- Production-grade Kubernetes using kubeadm (not a lightweight distro).
- HA control plane (3 masters); HA kube-apiserver fronted by a virtual IP/load balancer.
- Secure Proxmox hosts and VMs with a segregated management network.
- Enterprise features: RBAC, NetworkPolicies, TLS, monitoring, PV storage.
Prerequisites
- 3 (or more) physical servers for Proxmox VE (64-bit CPUs with virtualization support; 32–64 GB+ RAM each recommended).
- Proxmox VE installed on each server (Debian-based).
- A management network (private VLAN) for Proxmox/corosync and node management.
- DNS entries for hostnames, or at least /etc/hosts control, for cluster components.
- SSH access to Proxmox as root or as an admin user with sudo.
- Basic packages and familiarity: curl, jq, ssh, ufw (optional), iptables/nft.
- Optional but recommended: Proxmox Backup Server (PBS) for VM backups.
2) Logical architecture (simple ASCII)
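The original diagram did not survive extraction; here is a minimal sketch of the topology this guide assumes (IP addresses and hostnames are illustrative):

```
              (management VLAN, reachable via VPN only)
   +--------------+   +--------------+   +--------------+
   | Proxmox pve1 |   | Proxmox pve2 |   | Proxmox pve3 |
   |  master-1    |   |  master-2    |   |  master-3    |  <- stacked etcd
   |  worker(s)   |   |  worker(s)   |   |  worker(s)   |
   +------+-------+   +------+-------+   +------+-------+
          |                  |                  |
          +------------------+------------------+
                             |
           VIP 10.10.0.100:6443 (keepalived + HAProxy)
                             |
                 kubectl clients / worker kubelets
```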
3) Prepare Proxmox hosts — install & base hardening
Perform on each physical server.
3.1 Install/Update Proxmox
Follow Proxmox install media; after install, on each node:
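A minimal update sequence for each node. The no-subscription repository line is an assumption for labs without an enterprise license (Proxmox VE 8 on Debian bookworm); skip it if you have a subscription:

```bash
# Optional: enable the no-subscription repo (no enterprise license)
echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" \
  > /etc/apt/sources.list.d/pve-no-subscription.list

# Update all packages; reboot if a new kernel was installed
apt update && apt -y dist-upgrade
reboot
```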
3.2 Secure SSH & root access
Edit /etc/ssh/sshd_config:
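A conservative baseline (adjust to your policy). Note that Proxmox cluster operations (migration, replication) use key-based root SSH between nodes, so `prohibit-password` is usually safer than `no` here:

```
# /etc/ssh/sshd_config
PermitRootLogin prohibit-password   # key-based root only (needed for cluster ops)
PasswordAuthentication no
PubkeyAuthentication yes
AllowUsers admin                    # restrict to named admin accounts
```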
Add admin user, add your SSH key, and grant sudo:
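For example (the user name `admin` is a placeholder; substitute your own):

```bash
adduser admin
mkdir -p /home/admin/.ssh
cp ~/.ssh/authorized_keys /home/admin/.ssh/   # or paste your public key
chown -R admin:admin /home/admin/.ssh
apt install -y sudo && usermod -aG sudo admin
```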
Restart SSH: systemctl restart sshd
3.3 Enable and configure Proxmox firewall
In Proxmox GUI: Datacenter → Firewall → enable. Also enable per-node firewall. Start with conservative rules: allow management subnet only, deny other incoming.
Example iptables rule (if using host-level):
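A sketch that admits the Proxmox web UI (8006) and SSH only from the management subnet (10.10.0.0/24 is the example subnet used throughout this guide):

```bash
# Allow UI + SSH from the management subnet, drop them from everywhere else
iptables -A INPUT -s 10.10.0.0/24 -p tcp -m multiport --dports 22,8006 -j ACCEPT
iptables -A INPUT -p tcp -m multiport --dports 22,8006 -j DROP
```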
Use the GUI for finer-grained VM-level rules.
3.4 Two-factor authentication
Enable TOTP (Datacenter → Permissions → Two-Factor). Require 2FA for admin accounts.
3.5 Monitoring & Auditing
Configure syslog forwarding (Graylog/ELK/SIEM) from Proxmox:
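A minimal rsyslog forwarding fragment; the collector hostname is a placeholder for your Graylog/ELK/SIEM endpoint:

```
# /etc/rsyslog.d/90-forward.conf — forward all logs to a central collector
# "@@" = TCP, single "@" = UDP
*.* @@logs.example.internal:514
```

Then `systemctl restart rsyslog`.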
4) Networking & storage planning in Proxmox
4.1 Network layout
- vmbr0 — management (Proxmox web UI, migrations) — on a private VLAN.
- vmbr1 — VM public/production network.
- vmbr2 — storage network (optional, for Ceph/RBD or Longhorn replication).
Ensure corosync uses a private dedicated interface to avoid interference.
4.2 Storage choices
- For small/medium deployments: Proxmox ZFS on local nodes + Proxmox Backup Server, or Longhorn inside K8s for PVs.
- For enterprise scale: Ceph on separate OSDs integrated with Proxmox (RBD), or an external SAN.

Recommendation: use Longhorn (K8s-native distributed block storage) for simplicity unless you already run Ceph.
5) Create cloud-init VM template for Kubernetes nodes
We use cloud-init to speed provisioning. Create a minimal Ubuntu 22.04/24.04 LTS cloud-init template.
5.1 Example cloud-init userdata
Create user-data for cloud-init:
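A minimal `user-data` sketch. The user name, SSH key, and package list are placeholders to adapt:

```yaml
#cloud-config
hostname: k8s-node
users:
  - name: admin
    groups: [sudo]
    shell: /bin/bash
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... admin@workstation   # your public key
package_update: true
packages: [curl, gnupg, containerd]
runcmd:
  - swapoff -a                                  # kubelet requires swap off
  - sed -i '/ swap / s/^/#/' /etc/fstab
```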
Important: swap must be disabled for kubelet.
Create a Proxmox VM from the Ubuntu cloud image, configure the cloud-init drive via the GUI, convert it to a template, and then clone it for masters/workers. (Reference: the Cloudfleet article on Proxmox cloud-init usage, linked in References.)
6) Deploy VMs for control plane and workers
Suggested VM sizing
- Master: 4 vCPU, 16–32 GB RAM, 50 GB root disk (adjust to workload), 2 NICs (management + data).
- Worker: 4 vCPU, 8–16 GB RAM, variable disk.
Create 3 master VMs: k8s-master-1, k8s-master-2, k8s-master-3. Create N worker VMs as needed.
Set static IPs (via cloud-init or in DNS) and verify SSH connectivity.
7) HA API server: keepalived + HAProxy (or MetalLB for LoadBalancer services if you have routable IPs)
We create an HA pair (or run on separate small VMs) to expose a virtual IP (VIP) for kube-apiserver. The VIP forwards port 6443 to all control-plane endpoints. (See the HA topology guidance in References.)
7.1 Install keepalived + haproxy (example on LB nodes)
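On each LB node:

```bash
apt update && apt install -y keepalived haproxy
```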
keepalived.conf (example for VIP 10.10.0.100):
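A minimal sketch for the primary LB; the interface name and auth password are placeholders. On the secondary node, set `state BACKUP` and a lower `priority`:

```
# /etc/keepalived/keepalived.conf
vrrp_instance VI_1 {
    state MASTER
    interface eth0            # interface that will carry the VIP
    virtual_router_id 51
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass changeme    # replace with a shared secret
    }
    virtual_ipaddress {
        10.10.0.100/24
    }
}
```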
HAProxy config (only show API forwarding):
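A sketch of the API-forwarding section only; master IPs are the illustrative addresses from this guide. On the standby LB, HAProxy must bind an address it does not yet hold, so set `net.ipv4.ip_nonlocal_bind=1` there:

```
# /etc/haproxy/haproxy.cfg (fragment)
frontend kube-apiserver
    bind 10.10.0.100:6443
    mode tcp
    default_backend kube-masters

backend kube-masters
    mode tcp
    balance roundrobin
    option tcp-check
    server k8s-master-1 10.10.0.11:6443 check
    server k8s-master-2 10.10.0.12:6443 check
    server k8s-master-3 10.10.0.13:6443 check
```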
Restart services: systemctl restart keepalived haproxy
Result: VIP (10.10.0.100) becomes the kube-apiserver endpoint.
8) Install Kubernetes control plane with kubeadm (HA)
We’ll use kubeadm to create a stacked control plane (etcd runs on the masters). For production at scale, consider an external etcd cluster. (See the official docs in References.)
8.1 Pre-reqs on each master VM
On each master VM (run as root or sudo):
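A sketch of the standard kubeadm prerequisites (kernel modules, sysctls, swap off, containerd with the systemd cgroup driver, and the pkgs.k8s.io packages). The v1.30 repo is an example; pin the minor version you intend to run:

```bash
# Kernel modules & sysctls required by kubeadm/containerd
cat <<'EOF' >/etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
modprobe overlay && modprobe br_netfilter

cat <<'EOF' >/etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sysctl --system

# Disable swap (kubelet requirement)
swapoff -a && sed -i '/ swap / s/^/#/' /etc/fstab

# containerd with the systemd cgroup driver
apt install -y containerd
mkdir -p /etc/containerd
containerd config default \
  | sed 's/SystemdCgroup = false/SystemdCgroup = true/' \
  > /etc/containerd/config.toml
systemctl restart containerd

# kubeadm / kubelet / kubectl from pkgs.k8s.io
mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key \
  | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /" \
  > /etc/apt/sources.list.d/kubernetes.list
apt update && apt install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl
```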
8.2 kubeadm config for HA (control-plane init)
Create kubeadm-config.yaml (run on the first master):
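A minimal sketch; the Kubernetes version and pod CIDR are examples (192.168.0.0/16 is Calico's default), and the VIP matches the HAProxy/keepalived setup above:

```yaml
# kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.30.0
controlPlaneEndpoint: "10.10.0.100:6443"   # the HA VIP, not a single master
networking:
  podSubnet: "192.168.0.0/16"              # must match your CNI's pod CIDR
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
```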
Run on master1:
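```bash
kubeadm init --config kubeadm-config.yaml --upload-certs
```

`--upload-certs` stores the control-plane certificates in the cluster so additional masters can join without manual cert copying.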
kubeadm init will print:
- The kubeadm join command for additional control-plane nodes (with --control-plane) — save it.
- The kubeadm join command for workers (without --control-plane) — save it.
Set up kubeconfig for admin:
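```bash
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```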
8.3 Join remaining control plane nodes
On master2 and master3 run the kubeadm join … --control-plane command printed earlier. Example (replace tokens and addresses):
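The shape of the command (the token, hash, and certificate key come from your own `kubeadm init` output):

```bash
kubeadm join 10.10.0.100:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane --certificate-key <certificate-key>
```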
After joining all control plane nodes, verify from any master:
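```bash
kubectl get nodes
kubectl get pods -n kube-system
```

All three masters should be listed (they will show NotReady until the CNI is installed in section 9).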
Note about etcd: the above uses stacked etcd (etcd runs as static pods on the masters). For larger enterprise clusters, consider an external etcd cluster or a managed control plane for resilience and easier recovery. (See References.)
9) Join worker nodes & configure CNI (Calico)
9.1 Join workers
On each worker VM, install containerd and kubeadm/kubelet as on the masters (see prereqs). Then run the worker kubeadm join command printed earlier (the one without --control-plane).
Verify node readiness:
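```bash
kubectl get nodes -o wide
```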
9.2 Install a CNI — Calico example
Calico provides networking + NetworkPolicy enforcement and is production-ready.
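A typical manifest-based install (the version is pinned only as an example; check the Calico release notes for the current one). This manifest installs into `kube-system` and matches the 192.168.0.0/16 pod CIDR used in the kubeadm config above:

```bash
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/calico.yaml
```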
(If you cannot fetch remote manifests in a locked-down environment, download and adapt the manifests.)
After CNI install verify pods in kube-system are running and nodes show Ready:
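```bash
kubectl get pods -n kube-system -w   # wait for calico-* pods to reach Running
kubectl get nodes                    # nodes should flip to Ready
```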
10) Cluster storage: Longhorn (recommended for small/medium on-prem)
Longhorn is a Kubernetes-native distributed block storage system.
10.1 Install Longhorn
You can install Longhorn via Helm or the YAML from Longhorn.
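A sketch of both routes (versions are examples; check the Longhorn docs for current ones):

```bash
# Helm route
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn \
  --namespace longhorn-system --create-namespace

# Or the plain-YAML route:
# kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.6.2/deploy/longhorn.yaml
```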
After installation:
- Configure the Longhorn UI (Service type LoadBalancer or NodePort).
- Check Longhorn > Settings for replica count and node scheduling.
Set default StorageClass to Longhorn so PVCs use it by default.
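For example:

```bash
kubectl patch storageclass longhorn \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
```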
11) Ingress, TLS, External DNS
11.1 Install cert-manager
Install via YAML:
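The version below is an example; substitute the current cert-manager release:

```bash
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.15.0/cert-manager.yaml
```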
Create ClusterIssuer for Let’s Encrypt (staging first, then production) or use internal CA.
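A staging sketch (the email is a placeholder; switch `server` to the production ACME URL once validation works):

```yaml
# clusterissuer-staging.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-staging-key
    solvers:
      - http01:
          ingress:
            class: nginx   # matches the NGINX ingress installed below
```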
11.2 Ingress controller (Traefik or NGINX)
Example NGINX ingress via Helm:
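```bash
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx --create-namespace \
  --set controller.service.type=LoadBalancer   # or NodePort on bare metal without MetalLB
```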
Create Ingress resources and use cert-manager Certificate to provision TLS via Let’s Encrypt.
11.3 External DNS
If you have a supported DNS provider, you can use external-dns to automatically create records when you create Services/Ingresses. Configure provider secrets (Cloudflare, AWS Route53, etc.).
12) Security hardening (Proxmox + Kubernetes)
12.1 Proxmox hardening checklist
- Keep Proxmox & kernel updated; test kernel updates in maintenance windows.
- Restrict management access (UI + SSH) to the mgmt subnet or VPN.
- Enable 2FA for GUI accounts.
- Use Proxmox Backup Server for VM backups with encryption.
12.2 Kubernetes hardening checklist
- Ensure RBAC is enabled (default).
- Apply Pod Security Standards (PSA) or use OPA/Gatekeeper policies.
- Use NetworkPolicies to restrict pod-to-pod traffic; default deny per namespace.
- Run non-root containers and read-only root filesystems where possible.
- Limit service account privileges; use kubectl auth can-i to audit.
- Use admission controllers (ResourceQuota, LimitRanger).
- Encrypt etcd at rest — ensure --encryption-provider-config is set.
- Rotate certificates and tokens; use short-lived tokens where possible.
- Use image scanning (Trivy/Clair) in the CI pipeline.
- Integrate with enterprise identity: OIDC for API server auth (e.g., Dex + AD/LDAP), or bind Kubernetes RBAC to groups from your IdP.
Example: enabling encryption at rest for etcd-stored secrets.
Create /etc/kubernetes/encryption-config.yaml:
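A minimal sketch encrypting Secrets with AES-CBC; generate the key with `head -c 32 /dev/urandom | base64`:

```yaml
# /etc/kubernetes/encryption-config.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded-32-byte-key>
      - identity: {}   # fallback for reading pre-existing plaintext data
```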
Refer to the kubeadm docs for adding --encryption-provider-config to the API server manifest (see References).
12.3 Network hardening
- Use Calico network segmentation and egress policies.
- Use firewall rules on Proxmox to limit management networks; restrict the control-plane port (6443) to the LB and admin nets.
13) Backups, monitoring, logging, upgrade plan
13.1 Backups
- Use etcd snapshotting and keep off-site backup copies.
- Back up the kubeadm PKI (certs) and /etc/kubernetes/admin.conf.
- For VMs: use Proxmox Backup Server snapshots & scheduled backups.
- For cluster state: use Velero for namespaced resource backups and PV snapshot integration (Longhorn supports snapshots).
Velero example:
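A sketch against an S3-compatible backend; the MinIO URL, bucket, and plugin version are placeholders to adapt:

```bash
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.10.0 \
  --bucket velero-backups \
  --secret-file ./credentials-velero \
  --backup-location-config region=minio,s3ForcePathStyle=true,s3Url=http://minio.example.internal:9000

# Daily 02:00 backup of everything except kube-system
velero schedule create daily-backup --schedule "0 2 * * *" --exclude-namespaces kube-system
```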
(Configure provider according to environment.)
13.2 Monitoring & logging
- Monitoring: Prometheus + Grafana (e.g., kube-prometheus-stack via Helm).
- Logging: EFK (Elasticsearch/OpenSearch + Fluentd/Fluent Bit + Kibana/Grafana). Consider hosted/central logging.
- Alerting: Alertmanager integrated with Slack/PagerDuty.
13.3 Upgrades & maintenance
- Test upgrades in staging.
- Upgrade the control plane first (one master at a time); drain workers for kubelet/kube-proxy upgrades.
- Use kubeadm upgrade plan and kubeadm upgrade apply.
- Keep the container runtime & OS patched.
14) Quick scripts & useful commands
Create VM from template (pvesh / qm CLI)
Example clone:
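A sketch assuming the cloud-init template has VMID 9000 and the addressing from earlier sections:

```bash
# Full-clone the template into a new VM, set cloud-init values, start it
qm clone 9000 201 --name k8s-master-1 --full
qm set 201 --ipconfig0 ip=10.10.0.11/24,gw=10.10.0.1
qm set 201 --sshkeys ~/.ssh/id_ed25519.pub
qm start 201
```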
kubeadm: view join commands later
On a control plane node:
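```bash
# Worker join command (creates a fresh token if needed)
kubeadm token create --print-join-command

# Fresh certificate key for joining additional control-plane nodes
kubeadm init phase upload-certs --upload-certs
```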
Validate cluster
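```bash
kubectl get nodes -o wide
kubectl get pods -A
kubectl get --raw='/readyz?verbose'
```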
15) Example recovery notes
- If etcd fails: use etcdctl snapshot save backups and etcdctl snapshot restore.
- If you lose the control plane: restore an etcd snapshot to fresh control-plane nodes, then rejoin the workers.
- Document the disaster-recovery procedure; practice restores in staging.
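A snapshot example for the stacked-etcd layout above; the certificate paths are kubeadm's defaults:

```bash
# Run on a control-plane node
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%F).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
```

Copy the resulting snapshot off-site as part of the scheduled backup job.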
16) Example folder of configs (short list)
- kubeadm-config.yaml — cluster config
- haproxy.cfg — API LB config
- keepalived.conf — VIP config
- containerd config — tuned for kube
- encryption-config.yaml — etcd encryption
- networkpolicy-deny-all.yaml — default deny template
- pod-security-standards.yaml — PSA templates
- velero-backup.yaml — Velero schedules
17) Final checklist before “go-live”
- Proxmox nodes patched and backups configured (PBS).
- Management network isolated and firewall rules applied.
- VM templates built with cloud-init; swap disabled.
- HA API VIP + load balancer verified (failover tested).
- 3 control-plane nodes online and kubectl get nodes shows Ready.
- CNI installed and all pods in kube-system are Running.
- Longhorn (or chosen storage) installed and default StorageClass set.
- cert-manager + ingress installed; TLS validated for a test app.
- RBAC policies and NetworkPolicies applied; default deny in test namespaces.
- Monitoring, logging, alerting integrated; alerts tested.
- Backup & restore plan documented and tested (Velero + etcd snapshot).
- Upgrade and node-replacement playbook written and rehearsed.
Conclusion
Running full Kubernetes (kubeadm) on VMs inside Proxmox gives you maximum control and enterprise-grade features while keeping the infrastructure flexible. The recipe above is intentionally conservative: HA API via VIP + HAProxy, 3+ control-plane nodes with stacked etcd, Calico for networking, Longhorn for storage, cert-manager + ingress for TLS, and production safety via backups, monitoring, and hardened Proxmox host.
Links & References
- Official Kubernetes guide: Creating Highly Available Clusters with kubeadm — https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/
- Kubernetes guide: Options for Highly Available Topology — https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/ha-topology/
- Kubernetes guide: Creating a Cluster with kubeadm — https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
- Deploy Kubernetes on Proxmox: A Step-by-Step Tutorial — https://cloudfleet.ai/tutorials/on-premises/deploy-kubernetes-on-proxmox-a-step-by-step-tutorial/
- Kubernetes on Proxmox: A Comprehensive Guide — https://www.plural.sh/blog/kubernetes-on-proxmox-guide/
- Building a production-grade, HA Kubernetes cluster with kubeadm — https://medium.com/%40salwan.mohamed/building-a-production-grade-high-availability-kubernetes-cluster-with-kubeadm-a-platform-3951dea0fa9a