In recent builds of vSphere 7 and vSphere 8, my team has experienced spontaneous reboots of virtual machines while rekeying them. In our case, we were rekeying these machines against a new key provider.
EDIT: Broadcom support has now published KB 387897 documenting this issue. The issue is a kind of race condition between the rekey task and some other activity that touches the virtual machine's changed block tracking (CBT) file. Under some conditions, that other activity fails to open the CBT file, and vSphere HA then reboots the virtual machine.
The reboots appear to be unpredictable. Although we use CBT for backup, no backup job was in flight at the time (a running backup job would hold a snapshot, and you cannot rekey a virtual machine that has snapshots). At times as few as 1% of the rekeyed machines rebooted spontaneously, but at other times as many as 20% were affected.
We understand that Broadcom will fix this race condition in a future release. In the meantime, if you plan to rekey a virtual machine that uses CBT for backup or replication, you should either:
- Perform an orderly shutdown of the virtual machine if you cannot tolerate a spontaneous reboot, or
- Disable CBT for the duration of the rekey. You will need to evaluate whether your BCDR software can tolerate this, or whether you must perform a full backup or replication to recover from the loss of CBT.
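When rekeying a large batch of machines, the choice between these two workarounds can be sketched as a small planning function. This is a minimal Python sketch, not a vSphere tool: `VmInfo`, its fields, and `plan_rekey` are hypothetical names, and the returned strings only stand in for the real shutdown, rekey, and CBT operations you would perform through your own tooling. It also assumes, based on the KB's description of the race, that virtual machines without CBT enabled are not exposed to it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VmInfo:
    # Hypothetical inventory record; these field names are illustrative
    # and do not come from any vSphere API.
    name: str
    has_snapshots: bool
    cbt_enabled: bool
    downtime_ok: bool                 # can the VM be shut down for the rekey?
    needs_full_after_cbt_reset: bool  # does BCDR need a full run after CBT loss?


def plan_rekey(vm: VmInfo) -> List[str]:
    """Return an ordered list of high-level steps for rekeying one VM."""
    if vm.has_snapshots:
        # A virtual machine with snapshots cannot be rekeyed.
        raise ValueError(f"{vm.name}: remove snapshots before rekeying")
    if not vm.cbt_enabled:
        # Assumption: the race involves the CBT file, so a live rekey is treated as safe.
        return ["rekey"]
    if vm.downtime_ok:
        # Workaround 1: orderly shutdown, rekey while powered off, power back on.
        return ["shut down guest", "rekey", "power on"]
    # Workaround 2: disable CBT only for the duration of the rekey.
    steps = ["disable CBT", "rekey", "re-enable CBT"]
    if vm.needs_full_after_cbt_reset:
        # Some BCDR products must re-baseline after CBT has been reset.
        steps.append("run full backup/replication")
    return steps
```

For example, `plan_rekey(VmInfo("app01", False, True, False, True))` returns the CBT workaround followed by a full backup step. Keep in mind that VMware's CBT guidance generally notes that a CBT change on a powered-on virtual machine only takes effect after a stun/unstun cycle (such as creating and deleting a snapshot), so check how your tooling applies the "disable CBT" step on your build.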