Fixing graphical error in Proxmox 8+ kernels

In a bid to reduce e-waste (and definitely not because hardware is getting more and more expensive), I often repurpose old hardware, including older laptops, as virtualization servers running Proxmox. The really neat thing is that I can then group several of these devices into one cluster, allowing for high availability and other fun things.

The Problem

A while ago, I decided to upgrade my Proxmox Virtual Environment (PVE) 8 install to the latest release at the time, version 9.1. The upgrade itself went well. Unfortunately, after rebooting into the new kernel, I would see the kernel logs at first, but at some point the screen would flash (as the display switched from the BIOS/UEFI graphics stack to the Intel driver) and never come back. On the previous kernels (up to 6.8.*), the screen would flash in the same way, but the display would then return and the output would continue immediately after.

Given that this device is running ZFS-on-root, I initially suspected something there. However, I was able to get logs from the failed boots, which wouldn’t have been possible if the issue were related to ZFS not loading.
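
For anyone wanting to do the same, one way to pull kernel messages from an earlier boot is through the systemd journal. This is only a rough sketch: it assumes the journal is persistent (so earlier boots are retained). The helper name failed_boot_gpu_logs and the grep pattern are illustrative choices, with i915 assumed to be the Intel driver in play on this kind of hardware.

```shell
# Print kernel messages from an earlier boot that mention the graphics stack.
# $1 is a journalctl boot offset: -1 is the previous boot, -2 the one before.
# The function name and the grep pattern are illustrative assumptions.
failed_boot_gpu_logs() {
    boot_offset="${1:--1}"
    # -k limits output to kernel messages, -b selects the boot to read from
    journalctl --no-pager -k -b "$boot_offset" | grep -iE 'i915|drm|fbcon'
}
```

After booting back into a working kernel, failed_boot_gpu_logs -1 would show the previous (failed) boot. Running journalctl --list-boots lists which offsets are available.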

Others have mentioned similar issues related to Resizable BAR (ReBAR), which needs to be enabled in the BIOS/UEFI. Unfortunately, the BIOS in this laptop doesn’t actually expose a knob to configure it, so I initially ignored that lead. However, I now suspect that some of my issues could be related to it.

The First Workaround

Using the systemd-boot interface, I was able to select the older kernel which booted fine.

Proxmox includes a neat utility that allows an admin to manually configure which kernel is picked by default: proxmox-boot-tool kernel pin version [--next-boot]. Using that, I pinned the kernel from PVE 8 and ignored the problem.
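
As a rough sketch, the pinning step could be wrapped in a small helper that refuses to pin a kernel whose image isn't installed. The function name pin_kernel and the BOOT_DIR override are hypothetical and exist only for this illustration. The version string is whatever proxmox-boot-tool kernel list reports on your system.

```shell
# Pin a kernel only if its image is actually present on disk.
# pin_kernel and BOOT_DIR are hypothetical names used for illustration;
# on a real system kernel images live in /boot.
pin_kernel() {
    version="$1"   # full version string, as shown by `proxmox-boot-tool kernel list`
    shift
    if [ ! -e "${BOOT_DIR:-/boot}/vmlinuz-$version" ]; then
        echo "kernel $version is not installed" >&2
        return 1
    fi
    # Any remaining arguments (such as --next-boot) are passed through unchanged.
    proxmox-boot-tool kernel pin "$version" "$@"
}
```

Should the pin no longer be needed, proxmox-boot-tool kernel unpin removes it and returns to the default kernel selection.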

The Next Update

Then Proxmox released the 7.0 series kernels, and I figured I couldn’t keep dragging the kernel from PVE 8.2 along with me as I upgraded.

The Current (Final?) Workaround

While I haven’t fully fixed the issue, I discovered a workaround (that I suspect will be permanent). Setting nomodeset in the kernel command line (in /etc/kernel/cmdline) fixed the issue, allowing the newer kernels to be used. The setting forces the kernel to use standard BIOS/UEFI/VESA graphics (as are used at the beginning of the boot sequence) instead of switching to the native drivers.

N.B.: make sure to run proxmox-boot-tool refresh after changing the contents of the cmdline file, otherwise it won’t be used.
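
Put together, the change might look something like the sketch below. It assumes /etc/kernel/cmdline is a single line of options (as it is on a systemd-boot Proxmox install) and that GNU sed is available. The helper name add_nomodeset is hypothetical.

```shell
# Append "nomodeset" to a single-line kernel command line file,
# unless it is already there. add_nomodeset is a hypothetical helper name.
add_nomodeset() {
    cmdline_file="$1"
    if grep -qw 'nomodeset' "$cmdline_file"; then
        echo "nomodeset already present in $cmdline_file"
        return 0
    fi
    # Keep a backup next to the original before editing it in place.
    cp "$cmdline_file" "$cmdline_file.bak"
    # The file must stay a single line, so append to the end of line 1.
    sed -i '1s/[[:space:]]*$/ nomodeset/' "$cmdline_file"
}

# Intended usage, as root on the Proxmox host:
#   add_nomodeset /etc/kernel/cmdline && proxmox-boot-tool refresh
```

After the next reboot, cat /proc/cmdline should include nomodeset if the refreshed configuration was picked up.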

Down the road, it would be worthwhile to further debug why the native Intel graphics driver causes this hang, preventing the system from completing its boot-up sequence. But that day isn’t today.