Proxmox¶
In my five-node cluster running Proxmox VE 7, each Intel NUC node has its own OS disk: the NUC12s boot from a 40mm internal NVMe, while the NUC10s boot from NVMe drives in USB 3 enclosures. For data, the nodes carry 2TB SATA SSDs and 800GB-2TB NVMe drives, (mostly) high-DWPD Micron MAX enterprise models, which are dedicated solely to Ceph.
For network connectivity, each node is equipped with a Sonnet Solo 10GbE adapter. Ceph operations run over these 10GbE links, while Proxmox's corosync primarily uses a 1GbE management VLAN, with the 10GbE as a fallback.
Overall I am pleased with this setup and it is the foundation of my k8s cluster.
iGPU Passthrough to VM (Intel Integrated Graphics)¶
First you need to figure out if you are using grub or systemd-boot.
If you are using ZFS on root, then you are definitely using systemd-boot (as of PVE 6 and 7).
If /sys/firmware/efi/ exists and has entries inside, you are probably using systemd-boot.
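A quick way to check (a sketch; proxmox-boot-tool status only exists on newer releases):
findmnt -o FSTYPE /        # "zfs" means ZFS on root
ls /sys/firmware/efi       # present and populated means the host booted via UEFI
proxmox-boot-tool status   # PVE 6.4+: reports the boot mode and configured ESPs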
Add kernel cmdline flags¶
If you are using grub you need to edit /etc/default/grub; if you are using systemd-boot you need to edit /etc/kernel/cmdline.
Add the following kernel cmdline flags after whatever is already there. For grub this is the GRUB_CMDLINE_LINUX_DEFAULT variable; for systemd-boot the file is a bare command line.
intel_iommu=on i915.enable_gvt=1 iommu=pt
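For illustration, the edited files might end up looking like this (a sketch: the quiet flag and the root= entry reflect stock defaults and may differ on your system):
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on i915.enable_gvt=1 iommu=pt"
# /etc/kernel/cmdline (systemd-boot: everything on one line)
root=ZFS=rpool/ROOT/pve-1 boot=zfs intel_iommu=on i915.enable_gvt=1 iommu=pt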
If using grub, update-grub
If using systemd-boot, proxmox-boot-tool refresh
Edit modules¶
Edit /etc/modules
and add:
# Modules required for PCI passthrough
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
# Modules required for Intel GVT-g Split
kvmgt
Reboot.
Confirm GVT-g is working¶
You should have a mdev_supported_types dir under /sys/bus/pci/devices/$PCI_ADDR/, where PCI_ADDR is the PCI address found with lspci -nnv | grep VGA.
Example:
# ls /sys/bus/pci/devices/0000\:00\:02.0/mdev_supported_types/
i915-GVTg_V5_4 i915-GVTg_V5_8
Passthrough the device to your VM¶
In my case the VM is a Talos VM. The iGPU gets attached as a hostpci device with one of the mdev types from above, roughly like this (a sketch; the VM ID and mdev type are examples):
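# sketch: assumes the iGPU at 0000:00:02.0 and VM ID 100; pick one of the mdev types listed above
qm set 100 --hostpci0 0000:00:02.0,mdev=i915-GVTg_V5_4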
Install With External Boot Drive¶
Assumed setup:
- External NVMe-via-USB boot disk
- Internal M.2 NVMe for Ceph
- Internal SATA SSD for Ceph
Ensure the internal disks are wiped; any remnants of a GPT partition table or ZFS will cause problems. See the section about wiping disks.
When installing:
- Choose external disk as the install disk
- Use ZFS RAID-0
- English keyboard layout
- Use the mgmt VLAN for the network config
- Use a fully qualified domain name as the hostname
After install, set up ZFS encryption:
- Use F10 to PXE boot into Ubuntu 22.04
- Follow create an encrypted pool (source):
# Import the old pool
zpool import -f -NR /tmp rpool
# Check status
zpool status
# Make a snapshot of the current one
zfs snapshot -r rpool/ROOT@copy
# Send the snapshot to a temporary root
zfs send -R rpool/ROOT@copy | zfs receive rpool/copyroot
# Destroy the old unencrypted root
zfs destroy -r rpool/ROOT
# Create a new zfs root, with encryption turned on
zfs create -o encryption=aes-256-gcm -o keyformat=passphrase rpool/ROOT
# enter passphrase
# Copy the files from the copy to the new encrypted zfs root
zfs send -R rpool/copyroot/pve-1@copy | zfs receive -o encryption=on rpool/ROOT/pve-1
# Set the mountpoint
zfs set mountpoint=/ rpool/ROOT/pve-1
# Check which ZFS pools are encrypted
zfs get encryption
# Enable autotrim
zpool set autotrim=on rpool
# Enable compression
zfs set recordsize=1M compression=zstd-3 rpool
# Export the pool again, so you can boot from it
zpool export rpool
# Reboot into PVE
# Cleanup old root
zfs destroy -r rpool/copyroot
# Check which ZFS pools are encrypted
zfs get encryption
# Don't forget to run the rmblr.proxmox_setup role to enable zfs decryption at boot over ssh
After boot:
1. Run the setup playbook in bootstrap mode:
ansible-playbook run.yml --tags proxmox-setup --limit peirce.mgmt.socozy.casa -e '{"proxmox_acme_enabled": false, "proxmox_upgrade": true }' --ask-pass
2. systemctl restart networking
3. systemctl reboot
4. Join the node to the cluster (see below)
Wiping disks for clean install¶
- PXE boot into System Rescue CD
- Use gparted to
- delete all partitions
- format with "clear"
- create a fresh GPT partition table
- Reboot and PXE boot into Proxmox VE to continue the install
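If you'd rather wipe from a shell than gparted, something like the following achieves the same result (a sketch; double-check the device name first):
# DANGER: destroys everything on /dev/sdX
sgdisk --zap-all /dev/sdX    # wipe GPT/MBR structures
wipefs --all /dev/sdX        # clear leftover filesystem and ZFS signatures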
Reinstall A Node¶
Goal: Reinstall Proxmox on a node and reintroduce it to the cluster with the same name and network settings.
Relevant docs: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_remove_a_cluster_node
# 1. poweroff node
# 2. delete the node (from another node)
pvecm delnode <NODE NAME>
# 3. reinstall proxmox on the node
# 4. ssh into the node you want to rejoin
pvecm add <ANOTHER-NODE-IP> --use_ssh 1 --link0 <MY-DATA-IP> --link1 <MY-MGMT-IP>
# 5. update certs
pvecm updatecerts
# 6. munge known hosts
# from every node (including the new one), run `ssh <othernode>` and delete the corresponding lines,
# until you don't get any more known host errors.
# 7. If the old node was part of the Ceph cluster then you need to scrub any
# mention of that node from /etc/pve/ceph.conf
Install checklist¶
As of Proxmox VE 7 the install is straightforward: just choose your settings and go.
When prompted for the hostname, use an FQDN.
# yes
mynode.mydomain.com
# no
mynode
Post-install checklist¶
This checklist is automated with my rmblr-proxmox-setup role (it might be more up to date than this list!).
- Install pve-no-subscription repo
- Configure the network interfaces
- Install CPU microcode to mitigate CPU bugs
- Enable backports
- Remove pve-enterprise repo
- Disable IPv6
- Disable Wi-Fi and Bluetooth
- Setup dropbear-initramfs to provide the root ZFS encryption key over ssh at boot time
- Setup encrypted ZFS data storage for guests
- Install my admin tools
- Install acme plugin and cloudflare DNS configuration
- Install borg+borgmatic and configure backups with Healthchecks
Configuring Cluster¶
- Install Proxmox VE on three nodes, using ZFS as the filesystem
- Run the bootstrap ansible role
- From one node, create the cluster (see the sketch after this list)
- Join the other nodes to the cluster
- Run the bootstrap ansible role again; the joined nodes lose their ACME config, and this re-adds it so you have valid certs
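The create/join steps boil down to something like this (a sketch; the cluster name and IP are placeholders):
# on the first node
pvecm create <CLUSTER-NAME>
# on each node that should join
pvecm add <IP-OF-FIRST-NODE>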
Replicating VMs¶
From the Datacenter menu, go to Replication and add manual replication settings for each VM.
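Replication jobs can also be created from the CLI with pvesr (a sketch; the job ID, target node, and schedule are examples):
# replicate VM 100 to node pve2 every 15 minutes
pvesr create-local-job 100-0 pve2 --schedule "*/15"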
Replacing a ZFS Boot Disk¶
You might need to replace a proxmox boot disk that is in the ZFS pool.
You should note that the Proxmox partition convention is:
Partition 1 = BIOS Boot
Partition 2 = EFI Boot
Partition 3 = ZFS
This is the replacement procedure, given /dev/sda and /dev/sdb, where /dev/sdb is the disk you want to replace.
Pull /dev/sdb from the chassis and insert your new disk. Now /dev/sdb is a fresh disk that needs to join the pool.
Copy the partition table from /dev/sda to /dev/sdb and initialize new GUIDs for the partitions.
# WARNING the order of these flags is very important. If not careful you'll wipe your good drive.
sgdisk /dev/sda -R /dev/sdb
sgdisk -G /dev/sdb
Next, replace the bad disk in the ZFS pool. Get the id of the old disk using zpool status; it should be marked as offline.
# Important, use partition 3!
zpool replace -f rpool <OLD DISK> /dev/disk/by-id/<NEW DISK>-part3
Now ZFS will start resilvering. You can check the status of the resilver process with:
zpool status -v rpool
After resilvering is complete, we need to install the boot environment on the EFI partition (partition # 2).
proxmox-boot-tool format /dev/sdb2
proxmox-boot-tool init /dev/sdb2
proxmox-boot-tool refresh
This refreshes the boot environments on all EFI/BIOS boot partitions in the system. All disks are now bootable.
Tailscale in a container¶
To run Tailscale successfully in an LXC container you must add the following to the container's config:
lxc.cgroup.devices.allow: c 10:200 rwm
lxc.mount.entry: /dev/net/tun dev/net/tun none bind,create=file
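After adding those lines, restart the container and bring Tailscale up inside it (a sketch, assuming container ID 105):
pct stop 105 && pct start 105    # restart so the new device config is applied
pct exec 105 -- tailscale up     # then authenticate from inside the container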
Setting up NFS Share¶
Use an NFS share for storing snippets, ISOs, etc.
1. Create a dataset in FreeNAS
2. Create an NFS share in FreeNAS and add authorized IPs
3. Create an nfs user with the dataset as its home dir; set wheel as the user's primary group and disable its password
4. Edit the NFS share: set MapallUser to the nfs user
5. Storage > Pools > NFS Dataset > Edit Perms: owner nfs user, group wheel, remove other permissions
6. In Proxmox: Datacenter > Storage > Add NFS
source: https://www.youtube.com/watch?v=zeOe26fw7lo
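The Proxmox side can also be done from the CLI (a sketch; the storage ID, server, export path, and content types are examples):
pvesm add nfs freenas-nfs --path /mnt/pve/freenas-nfs --server 10.0.0.5 --export /mnt/tank/proxmox --content iso,snippets,backup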
Single NIC on Trunk Port but using VLAN¶
auto lo
iface lo inet loopback
iface enp60s0 inet manual
auto vmbr0
iface vmbr0 inet manual
bridge-ports enp60s0
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
bridge-vids 2-4094
bridge-pvid 1
auto vmbr0.11
iface vmbr0.11 inet static
address 10.9.10.21/23
gateway 10.9.10.1
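After editing /etc/network/interfaces (here and in the next section), the changes can be applied without a reboot, since PVE 7 uses ifupdown2 by default:
ifreload -a    # re-apply the interface configuration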
Multiple VLANs with Single NIC¶
Goals:
- Proxmox web UI and SSH are on VLAN 10
- VMs and LXC containers are assigned addresses on VLAN 20
Edit /etc/network/interfaces
auto lo
iface lo inet loopback
auto eno1
iface eno1 inet static
address 0.0.0.0
netmask 0.0.0.0
auto eno1.10
iface eno1.10 inet static
address 0.0.0.0
netmask 0.0.0.0
auto eno1.20
iface eno1.20 inet static
address 0.0.0.0
netmask 0.0.0.0
auto vmbr10
iface vmbr10 inet static
address 10.8.10.10/24
gateway 10.8.10.1
bridge_ports eno1.10
bridge_stp off
bridge_fd 0
auto vmbr20
iface vmbr20 inet static
address 10.8.20.5/24
bridge_ports eno1.20
bridge_stp off
bridge_fd 0
Installing Proxmox as a VM in FreeNAS¶
When you create the Proxmox virtual machine and boot it, the boot process will fail after obtaining a DHCP lease:
Starting a root shell on tty3
Installation aborted - unable to continue
To fix this, use the provided shell to run:
chmod 1777 /tmp
apt update
apt upgrade
Xorg -configure
mv /xorg.conf.new /etc/X11/xorg.conf
vim /etc/X11/xorg.conf
# change the Screen Driver to "fbdev"
startx
Then the installer will start. Install. Then X will exit. Power off the VM. Remove the cdrom device.
Import a cloudimg¶
This has been wrapped in the ansible role proxmox-template.
# Download the Ubuntu 20.04 (focal) cloud image
wget https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img
# Create a VM shell and import the image as its disk
qm create 101 --memory 1024 --net0 virtio,bridge=vmbr20
qm importdisk 101 ./focal-server-cloudimg-amd64.img local-zfs
qm set 101 --scsihw virtio-scsi-pci --scsi0 local-zfs:vm-101-disk-0
# Add a cloud-init drive and make the imported disk the boot disk
qm set 101 --ide2 local-zfs:cloudinit
qm set 101 --boot c --bootdisk scsi0
qm set 101 --serial0 socket --vga serial0
# Default cloud-init credentials and DHCP networking, then turn the VM into a template
qm set 101 --cipassword test --ciuser ubuntu
qm set 101 --ipconfig0 ip=dhcp
qm set 101 -agent 1
qm template 101
qm clone 101 201 --name dagon
qm set 201 --memory 8192
qm set 201 -agent 1
qm set 201 --ipconfig0 ip=dhcp
qm set 201 --net0 virtio,bridge=vmbr0,tag=10,firewall=1
qm set 201 --net1 virtio,bridge=vmbr0,tag=20,firewall=1
qm set 201 --cicustom "user=snippets:snippets/user-data"
qm resize 201 scsi0 20G
cat << EOF > /etc/pve/firewall/201.fw
[OPTIONS]
enable: 1
[RULES]
GROUP allowssh
EOF
qm clone 101 202 --name hydra
qm set 202 --memory 8192
qm set 202 -agent 1
qm set 202 --ipconfig0 ip=dhcp
qm set 202 --net0 virtio,bridge=vmbr0,tag=10
qm set 202 --cicustom "user=snippets:snippets/user-data"
qm resize 202 scsi0 20G
qm clone 101 203 --name deepone
qm set 203 --memory 8192
qm set 203 -agent 1
qm set 203 --ipconfig0 ip=dhcp
qm set 203 --net0 virtio,bridge=vmbr0,tag=10
qm set 203 --cicustom "user=snippets:snippets/user-data"
qm resize 203 scsi0 20G
qm set 201 --sshkeys ~/casey.pub
https://pve.proxmox.com/wiki/Cloud-Init_Support
Custom cloud init userdata¶
- Go to Storage View -> Storage -> Add -> Directory
- Give it an ID such as snippets, and specify any path on your host such as /srv/snippets
- Under Content choose Snippets and de-select Disk image
- Upload (scp/rsync/whatever) your user-data, meta-data, network-config files to your proxmox server in /srv/snippets/snippets/
qm set XXX --cicustom "user=snippets:snippets/user-data"
OR
qm set XXX --cicustom "user=snippets:snippets/user-data,network=snippets:snippets/network-config,meta=snippets:snippets/meta-data"
qm cloudinit dump 204 user
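The user-data file itself is ordinary cloud-init cloud-config; a minimal hypothetical example (every value is a placeholder):
#cloud-config
hostname: example-vm
users:
  - name: ubuntu
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... user@example
package_update: true
packages:
  - qemu-guest-agent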
If you followed the "Import a cloudimg" section above, the VM should already have a cloud-init drive.
https://gist.github.com/aw/ce460c2100163c38734a83e09ac0439a
Remove all containers¶
for i in $(pct list | awk '/^[0-9]/{print $1}'); do pct destroy "$i" -purge ; done
Deploy fedora coreos template¶
fedora core os on proxmox
on workstation:
cd workspace/
coreos-installer download -s stable -p qemu -f qcow2.xz --decompress -C .
scp fedora-coreos-34.20210904.3.0-qemu.x86_64.qcow2 PROXMOX_HOST:
On the Proxmox host:
1. (In the UI) create a VM and remove the default disks
2. Import the image:
qm importdisk 9996 ./fedora-coreos-34.20210904.3.0-qemu.x86_64.qcow2 local-zfs
qm set 9996 --scsi0 local-zfs:vm-9996-disk-1
qm set 9996 --boot order=scsi0
qm template 9996
vi /etc/pve/qemu-server/VMID.conf
add:
args: -fw_cfg name=opt/com.coreos/config,file=/mnt/pve/mali/snippets/server-1.ign
- make sure the snippet file exists, and edit the file for the corresponding server+client number
ZFS Cannot import rpool at boot¶
source0: https://www.thomas-krenn.com/en/wiki/ZFS_cannot_import_rpool_no_such_pool_available_-_fix_Proxmox_boot_problem
source1: https://forum.proxmox.com/threads/failed-to-import-rpool-on-bootup-after-system-update.37884/
Problem¶
The Proxmox system does not boot: the rpool created by the installer cannot be imported because it is not found.
Command: /sbin/zpool import -N "rpool"
Message: cannot import 'rpool' : no such pool available
Error: 1
Failed to import pool 'rpool'.
Manually import the pool and exit.
Cause¶
The disks are not fully addressable at the time of the ZFS pool import and therefore the rpool cannot be imported.[1]
Solution¶
Manually import the zpool named rpool, then continue booting with exit. Afterwards, change the ZFS defaults so that the system waits 5 seconds before and after importing the ZFS pool.
# ZFS rpool is imported manually
zpool import -N rpool
exit
# ZFS defaults are changed
nano /etc/default/zfs
# ZFS sleep parameters are set to 5
ZFS_INITRD_PRE_MOUNTROOT_SLEEP='5'
ZFS_INITRD_POST_MODPROBE_SLEEP='5'
# initramfs is updated
update-initramfs -u
Afterwards, reboot the system and observe the boot process. The system now waits up to 5 seconds before and after importing the rpool, so it can start properly.
Terraform Provider for Proxmox¶
Configure the Terraform API user:
pveum role add Terraform -privs "Datastore.AllocateSpace Datastore.Audit Pool.Allocate Sys.Audit VM.Allocate VM.Audit VM.Clone VM.Config.CDROM VM.Config.CPU VM.Config.Cloudinit VM.Config.Disk VM.Config.HWType VM.Config.Memory VM.Config.Network VM.Config.Options VM.Monitor VM.PowerMgmt"
pveum user add terraform@pve --password changeme
pveum aclmod / -user terraform@pve -role Terraform
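If you'd rather not give Terraform the password, you can also issue the user an API token (a sketch; the token name is an example, and --privsep 0 makes the token inherit the user's role):
pveum user token add terraform@pve terraform-token --privsep 0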