
High-performance culture, redux



In recent builds of vSphere 7 and vSphere 8, my team has seen virtual machines reboot spontaneously while they were being rekeyed. In our case we were rekeying these machines against a new key provider.
EDIT: Broadcom support has now published KB 387897 documenting this issue. The issue is a kind of race condition between the rekey task and some other activity that is touching the changed block tracking (CBT) file for the virtual machine. Under some conditions the latter activity fails to open the CBT file, and vSphere HA reboots the virtual machine.
The reboots seem unpredictable. Although we are using CBT for backup, we had no in-flight backup job running at the time (since you cannot rekey a virtual machine with snapshots). At times as few as 1% of the rekeyed machines rebooted spontaneously, but at other times as many as 20% were affected.
We understand that Broadcom will fix this race condition in a future release, but in the meantime if you plan to rekey a virtual machine that is using CBT for backup or replication, you should either:
Here I collect some blog posts with vCenter key provider configuration recommendations:
And here are some additional VMware encryption resources:

I’ve seen a number of cases where vCenter issues intermittent KMS connectivity alarms. This often happens in environments where the network or KMS latency is relatively high. One tip provided by VMware / Broadcom support is to remove expired KMS certificates from the vCenter trust store. This is only my impression, but as best I can tell, these expired certificates do not prevent successful connectivity. They can, however, add processing delay, which makes health alarms more likely to trigger.
If you are experiencing one of the following alarms intermittently, you should consider a cleanup of expired CA certificates:
Broadcom support referred us to the following Knowledge Base articles to view and remove certificates from the vCenter trust store:
In particular, for KMS-related alarms, you want to evaluate the certificates in the KMS_ENCRYPTION trust store.
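If the trust store holds many entries, it can help to script the review rather than read through each certificate by eye. The following is only a minimal sketch in Python: it assumes you have already saved a text listing of the KMS_ENCRYPTION store to a string, and that the listing contains an `Alias :` line for each entry followed by an OpenSSL-style `Not After :` line. That layout, the sample aliases, and the helper name `find_expired` are my own assumptions for illustration; follow the Knowledge Base articles for the actual commands to list and remove entries.

```python
import re
from datetime import datetime, timezone

# Assumed listing layout: an "Alias :" line per entry, followed later by an
# OpenSSL-style "Not After :" line. Adjust the patterns if your output differs.
ALIAS_RE = re.compile(r"^\s*Alias\s*:\s*(.+?)\s*$")
NOT_AFTER_RE = re.compile(r"^\s*Not After\s*:\s*(.+?)\s*$")


def find_expired(listing, now):
    """Return (alias, not_after) pairs for entries that expired before `now`."""
    expired = []
    alias = None
    for line in listing.splitlines():
        match = ALIAS_RE.match(line)
        if match:
            alias = match.group(1)
            continue
        match = NOT_AFTER_RE.match(line)
        if match and alias is not None:
            # OpenSSL text dates look like "Jan  1 00:00:00 2020 GMT".
            not_after = datetime.strptime(
                match.group(1), "%b %d %H:%M:%S %Y GMT"
            ).replace(tzinfo=timezone.utc)
            if not_after < now:
                expired.append((alias, not_after))
            alias = None  # wait for the next entry's alias
    return expired


# Hypothetical sample listing; the aliases and dates are made up.
SAMPLE = """\
Alias :\told-kms-ca
Entry type :\tTrusted Cert
            Not Before: Jan  1 00:00:00 2015 GMT
            Not After : Jan  1 00:00:00 2020 GMT
Alias :\tcurrent-kms-ca
Entry type :\tTrusted Cert
            Not Before: Jan  1 00:00:00 2023 GMT
            Not After : Jan  1 00:00:00 2033 GMT
"""

if __name__ == "__main__":
    for alias, not_after in find_expired(SAMPLE, datetime.now(timezone.utc)):
        print(f"expired: {alias} (Not After {not_after:%Y-%m-%d})")
```

The script only reports candidates. Confirm that an expired certificate is no longer needed before you remove it from the trust store.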
