PowerCLI native key management capabilities, continued

I mentioned previously that PowerCLI allows you to rekey VM and VMHost objects natively without needing to use community-supported extensions. As far as I can tell, rekeying vSAN clusters still requires you to work in the UI or to use the community-supported extensions.

Examining the code for these extensions, I was able to put together a brief way to display the current key provider in use by each object. This way you can verify that your rekeying was successful! Here is an example:

# Virtual machines: the provider column is blank for unencrypted VMs
$vmlist = @()
foreach($vm in Get-VM) {
  $vmlist += [pscustomobject]@{ vm = $vm.name; provider = $vm.ExtensionData.Config.KeyId.ProviderId.Id}
}
$vmlist | Format-Table

# Hosts: the key provider protecting each host key
$hostlist = @()
foreach($vmhost in Get-VMHost) {
  $vmhostview = Get-View $vmhost
  $hostlist += [pscustomobject]@{ host = $vmhost.name; provider = $vmhostview.Runtime.CryptoKeyId.ProviderId.Id}
}
$hostlist | Format-Table

# vSAN clusters: the key provider in each cluster's data encryption configuration
$clusterlist = @()
$vsanclusterconfig = Get-VsanView -Id "VsanVcClusterConfigSystem-vsan-cluster-config-system"
foreach($cluster in Get-Cluster) {
  $encryption = $vsanclusterconfig.VsanClusterGetConfig($cluster.ExtensionData.MoRef).DataEncryptionConfig
  $clusterlist += [pscustomobject]@{ cluster = $cluster.name; provider = $encryption.KmsProviderId.Id }
}
$clusterlist | Format-Table

Deleting all of the keys in your Key Protect or HPCS instance

A common problem: you want to delete an IBM Cloud Key Protect instance, but some keys still remain in it. For your protection, Key Protect and Hyper Protect Crypto Services require you to take action to delete those keys rather than allowing you to delete them as a side effect of deleting the instance itself.

This is challenging if you have a large number of keys. That may be the case if you have a development or test environment that you are cleaning up, or if you have migrated your keys to another key provider.

It’s possible to script this using the Key Protect CLI.

First, log in to IBM Cloud and install the Key Protect plugin if necessary:

$ ibmcloud login --sso
$ ibmcloud plugin install key-protect -r "IBM Cloud"

If your Key Protect instance is private-only, you may need to export the KP_PRIVATE_ADDR environment variable to point to the service endpoint or VPE for Key Protect in your region. Next you need to identify the instance id for your Key Protect instance, which you can find in the instance details tab in the IBM Cloud UI, or by using the following command if you know the instance name:

$ ibmcloud resource service-instance smoonenKPmadrid --id
Retrieving service instance smoonenKPmadrid in all resource groups under account Development Account as smoonen@us.ibm.com...
crn:v1:bluemix:public:kms:eu-es:a/3f1b08d9abdc5d98ffeb0d3bdc279c04:1f8011c9-7fd9-4fe9-af5e-2fefcfda8cfc:: 1f8011c9-7fd9-4fe9-af5e-2fefcfda8cfc

You can save typing or pasting in subsequent commands by exporting the instance id:

$ export KP_INSTANCE_ID=1f8011c9-7fd9-4fe9-af5e-2fefcfda8cfc
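
If you would rather not copy the id by hand, both the instance id and the region can be read out of the CRN shown above. The following is a minimal sketch: the crn_field helper is hypothetical, and the private endpoint pattern in the final comment is an assumption about the hostname format that you should check against the Key Protect documentation for your region.

```shell
# Hypothetical helper: print the Nth colon-separated field of a CRN.
crn_field() { printf '%s\n' "$1" | cut -d: -f"$2"; }

# CRN as printed by `ibmcloud resource service-instance <name> --id`
crn='crn:v1:bluemix:public:kms:eu-es:a/3f1b08d9abdc5d98ffeb0d3bdc279c04:1f8011c9-7fd9-4fe9-af5e-2fefcfda8cfc::'

region=$(crn_field "$crn" 6)          # location segment, here eu-es
KP_INSTANCE_ID=$(crn_field "$crn" 8)  # service instance segment
export KP_INSTANCE_ID

# Assumed private endpoint pattern; only export this for a private-only instance:
# export KP_PRIVATE_ADDR="https://private.${region}.kms.cloud.ibm.com"
```

Splitting on colons works here because the instance id always occupies the same segment of the CRN; the account segment before it contains a slash but no colon.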

The following command displays all of the key ids and names in your instance:

$ ibmcloud kp keys

You can adjust this command to display only the key ids:

$ ibmcloud kp keys --output json | jq -r '.[] | .id'

If you are confident that all of these keys can be safely deleted, and you have the appropriate permissions to do so, in your shell session you can loop through these and issue a delete command for each of them:

$ for key in $(ibmcloud kp keys --output json | jq -r '.[] | .id')
> do
>   ibmcloud kp key delete "$key"
> done

If any of the keys is known to be in use by a resource, you will receive an error. You may also receive other errors, for example, if you do not have sufficient permission to delete the key. You’ll have to rectify these issues before you can successfully delete the key and the Key Protect instance. For example, the following key was a root key that was in use by a Key Protect KMIP adapter:

Targeting endpoint: https://eu-es.kms.cloud.ibm.com
Deleting key: 'b262754c-f30d-4b5f-984c-f9c21b7ae13a', from instance: '1f8011c9-7fd9-4fe9-af5e-2fefcfda8cfc'...
FAILED
ASSOCIATED_KMIP_ADAPTER_ERR
The key cannot be deleted because it is associated with 1 KMIP adapter(s) in the instance
Correlation-ID:ef7ae793-945f-4b10-aa4b-f24b340bb3e1
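
With many keys, it helps to collect the ids that failed so you can investigate them afterwards instead of scrolling back through the output. Here is a hedged sketch of one way to do that. The delete_keys function and the KP_CLI override are hypothetical names of my own, not part of the Key Protect CLI; the override exists only so that you can try the loop against a stand-in command before pointing it at real keys.

```shell
# Delete every key id read from stdin, one per line.
# The CLI's own messages go to stderr; the ids that could not be deleted
# are printed on stdout, and the function returns non-zero if any failed.
# KP_CLI is a hypothetical override for dry runs; it defaults to ibmcloud.
delete_keys() {
  local key failed=0
  while IFS= read -r key; do
    [ -n "$key" ] || continue
    # </dev/null keeps the CLI from consuming the list of ids on stdin
    if ! "${KP_CLI:-ibmcloud}" kp key delete "$key" </dev/null >&2; then
      printf '%s\n' "$key"
      failed=1
    fi
  done
  return "$failed"
}

# Usage with the real CLI (requires a logged-in session and KP_INSTANCE_ID):
#   ibmcloud kp keys --output json | jq -r '.[] | .id' | delete_keys > failed-keys.txt
```

Any id left in failed-keys.txt corresponds to an error such as the one above, which you will need to resolve before trying again.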

Using vSphere Trust Authority to geofence workloads

IBM and Kyndryl have in the past used Entrust BoundaryControl to accomplish geofencing. This worked using a combination of their CloudControl and KeyControl products. The CloudControl product was used by security administrators to install cryptographically signed tags into known trusted host TPMs, and then to describe policies for virtual machines that required them to run on hosts with particular tags. In addition to CloudControl enforcing virtual machine placement, the KeyControl product further integrated with this configuration to ensure that virtual machines running on unapproved hosts could not be successfully decrypted and run. Customers could devise tagging schemes according to their needs, such as prod/nonprod, tier1/tier2, and US/EU.

You can accomplish a similar kind of exclusion or geofencing capability using VMware’s vSphere Trust Authority. Although vTA is designed primarily as a means of ensuring that workloads run on hosts with known trusted firmware and software levels, it also has the capability to trust hosts individually. Rather than trusting the vendor TPM CA certificate, you can trust individual host TPM certificates. This allows you to vet the hosts one by one in your environment, and mark them as trusted only if they meet your criteria, including their geographic location. vTA will then help to ensure that the virtual machines in your environment cannot be successfully decrypted and run on hosts outside of your trusted set.

Like any security solution, attestation and geofencing solutions like BoundaryControl and vTA require extra effort to configure and to administer. In exchange for this effort, however, you can create compelling sovereign cloud solutions.

Fixed: Unexpected reboot when rekeying a virtual machine

vSphere 8.0u3e (see release notes) fixes the issue where a rekeyed VM may experience an unexpected reboot:

PR 3477772: Encrypted virtual machines with active Change Block Tracking (CBT) might intermittently power off after a rekey operation

Due to a race condition between the VMX and hostd services for a specific VMcrypt-related reconfiguration, encrypted VMs with active CBT might unexpectedly power off during a rekey operation.

This issue is resolved in this release.

From what I can tell, this issue is not currently fixed in the vSphere 7 stream.

Unexpected reboot when rekeying a virtual machine

In recent builds of vSphere 7 and vSphere 8, my team has experienced unexpected spontaneous reboots of virtual machines while rekeying them. In our case we were rekeying these machines against a new key provider.

EDIT: Broadcom support has now published KB 387897 documenting this issue. The issue is a kind of race condition between the rekey task and some other activity that is touching the changed block tracking (CBT) file for the virtual machine. Under some conditions the latter activity fails to open the CBT file, and vSphere HA reboots the virtual machine.

The reboots seem unpredictable. Although we are using CBT for backup, we had no in-flight backup job running at the time (since you cannot rekey a virtual machine with snapshots). At times as few as 1% of the rekeyed machines were spontaneously rebooted, but at other times as many as 20% were affected.

We understand that Broadcom will fix this race condition in a future release, but in the meantime if you plan to rekey a virtual machine that is using CBT for backup or replication, you should either:

  1. Perform an orderly shutdown of the virtual machine if you cannot tolerate a spontaneous reboot, or
  2. Disable CBT for the duration of the rekey. You need to evaluate whether your BCDR software can tolerate this, or if you need to perform a full backup or replication to recover from the loss of CBT.

Common vCenter KMS problems and optimizations

Here I collect some blog posts with vCenter key provider configuration recommendations:

And here are some additional VMware encryption resources:

Intermittent vCenter KMS connectivity alarms

I’ve seen a number of cases where vCenter raises intermittent KMS connectivity alarms. This often happens in environments where the network or KMS latency is relatively high. One tip provided by VMware / Broadcom support is to remove expired KMS certificates from the vCenter trust store. As best I can tell, these expired certificates do not prevent successful connectivity, but they can add processing delay, which makes health alarms more likely to trigger.

If you are experiencing one of the following alarms intermittently, you should consider a cleanup of expired CA certificates:

  • Certificate Status
  • Key Management Server Health Status Alarm
  • KMS Server Certificate Status

Broadcom support referred us to the following Knowledge Base articles to view and remove certificates from the vCenter trust store:

In particular, for KMS related alarms, you want to evaluate the certificates in the KMS_ENCRYPTION trust store.

Migrating to the IBM Cloud native KMIP provider

The IBM Cloud Key Protect key management offering has introduced a native KMIP provider to replace the existing “KMIP for VMware” KMIP provider. This new native provider has the following advantages:

  • Improved performance because KMIP-to-key-provider calls are closer in network distance and no longer cross service-to-service authorization boundaries.
  • Improved visibility and management for the KMIP keys.

You can find documentation here: Using the key management interoperability protocol (KMIP)

IBM Cloud’s Hyper Protect Crypto Services (HPCS) offering is exploring the possibility of supporting native KMIP providers as well. Stay tuned if you are a user of HPCS.

If you already use the KMIP for VMware provider with Key Protect, you should switch to the new native provider for improved performance. Here’s how you can migrate to the new provider.

First, navigate to your Key Protect instance:

Create a KMIP adapter:

You don’t need to upload a vCenter certificate immediately; in fact, remember that vCenter generates a new certificate with each connection attempt.

Click the Endpoints tab to identify the KMIP endpoint you need to configure in vCenter:

Note that, unlike the KMIP for VMware offering, there is only one endpoint. This single hostname is load balanced and is highly available in each region. Now go to vCenter, select the vCenter object, select Configure | Key Providers, then add a standard key provider:

Examine and trust the certificate:

Now select the new key provider, and select the single server in the provider. Click Establish Trust | Make KMS trust vCenter. I prefer to use the vCenter Certificate option which will generate a new certificate just for this connection.

Remember to wait a few seconds before copying the certificate because it may change. Then copy the certificate and click Done:

Importantly, at this step you need to follow my instructions to reconfigure vCenter to trust the KMIP CA certificate instead of the end-entity certificate. You should do this for two reasons. First, you won’t have to re-trust the certificate every time it is rotated. More importantly, in some cases the native KMIP provider serves alternate certificates on the private connection, and this can confuse vSAN encryption. (The alternate certificates both include the private hostname among their alternate names, so they are valid. The underlying reason for this difference is that VMware is in the process of adding SNI support to their KMIP connections, and the server behavior differs depending on whether the client sends SNI.) Trusting the CA certificate ensures that the connection is trusted even if the alternate certificate is served on the connection.

Then return to the IBM Cloud and view the details of your KMIP adapter:

Select the SSL certificates tab and click Add certificate:

Paste in the certificate you copied from vCenter:

Back in vCenter, it may take several minutes before the key provider status changes to healthy:

First we need to ensure that any new encrypted objects leverage the new key provider. Select the new provider and click Set as Default. You will be prompted to confirm:

Next we need to migrate all existing objects to the new key provider.

I previously wrote about how you can accomplish this using PowerCLI. You have to combine the techniques for connecting to multiple key providers and for rekeying all objects, adding the key provider parameter to each command. After importing the VMEncryption and VsanEncryption modules and connecting to vCenter, this would look something like the following.

WARNING: Since first publishing this, I have learned that in some configurations, vSphere HA may reboot a virtual machine that is encrypted with vSphere encryption and which is being rekeyed. Please read that linked post for information on how you can work around this problem.

# Rekey host keys used for core dumps
# In almost all cases hosts in the same cluster are protected by the same provider and key,
# but this process ensures they are protected by the new key provider
# It is assumed here that all hosts are already in clusters enabled for encryption.
# Beware: If not, this command will initialize hosts and clusters for encryption.
foreach($myhost in Get-VMHost) {
  echo $myhost.name
  Set-VMHostCryptoKey -VMHost $myhost -KMSClusterId new-key-provider
}

# Display host key providers to verify result
Get-VMhost | Select Name,KMSserver

# Rekey a vSAN cluster
# It is assumed here that the cluster is already enabled for encryption.
# Beware: If not, this command will enable encryption for an unencrypted cluster.
Set-VsanEncryptionKms -Cluster cluster1 -KMSCluster new-key-provider

# Display cluster key provider to verify result
Get-VsanEncryptionKms -Cluster cluster1

# Rekey all encrypted virtual machines
# Each rekey operation starts a task which may take a brief time to complete for each encrypted VM
# Note that this will fail for any virtual machine that has snapshots; you must remove snapshots first
foreach($myvm in Get-VM) {
  if($myvm.KMSserver){
    echo $myvm.name
    Set-VMEncryptionKey -VM $myvm -KMSClusterId new-key-provider
  }
}

# Display all virtual machines' key providers (some are unencrypted) to verify result
Get-VM | Select Name,KMSserver

Note: currently the Set-VsanEncryptionKms function does not appear to work with vCenter 8. Until the bug I reported is fixed, you will have to use the vCenter UI for the vSAN step. For your cluster, go to Configure | vSAN | Services. Under Data Services, click Edit. Choose the new key provider and click Apply:

Unfortunately, it is not possible to make all of these changes in the vCenter UI. You can rekey an individual VM against the new key provider, and as we’ve done above, you can rekey your vSAN cluster against the new key provider. And if you have vSAN encryption enabled, reconfiguring vSAN will also rekey your cluster encryption against the new key provider. But if you are not using vSAN, or if you do not have vSAN encryption enabled, I don’t know of a way to rekey your hosts against the new provider in the UI. (In fact, the cluster configuration UI is somewhat misleading as it indicates you have a choice of key provider, and you can even select the new key provider. But this will only influence the creation of new VMs; it will not rekey the hosts against the new provider.) As a result, you should use PowerCLI to rekey your hosts, and I recommend using it for your VMs as well.

After you have rekeyed all objects, you can remove the original key provider from vCenter:

Now you can delete your KMIP for VMware resource from the cloud UI:

For completeness, you should also delete all of the original keys created by the KMIP for VMware adapter. Recall that VMware leaks keys; if you have many keys to delete, you may wish to use the Key Protect CLI to remove them. You can identify these keys by name; they will have a vmware_kmip prefix:
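
To pick out only those keys from the command line, you can filter the key list by name before deleting anything. This is a sketch under assumptions: ids_with_name_prefix is a hypothetical helper, and the usage comment assumes that each key in the JSON output carries a name field alongside id. The helper expects tab-separated id and name pairs on standard input and prints the ids whose name starts with the given prefix.

```shell
# Hypothetical helper: read "id<TAB>name" lines and print the ids whose
# name begins with the prefix given as the first argument.
ids_with_name_prefix() {
  awk -F '\t' -v prefix="$1" 'index($2, prefix) == 1 { print $1 }'
}

# Usage with the real CLI (assumes the JSON output has .id and .name):
#   ibmcloud kp keys --output json \
#     | jq -r '.[] | [.id, .name] | @tsv' \
#     | ids_with_name_prefix vmware_kmip
```

Review the printed ids before passing them to ibmcloud kp key delete, since a name prefix is only a convention and another key could share it.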

You may notice that there are no standard keys representing the KMIP keys created by the new native adapter. Instead, its keys are visible within the KMIP symmetric keys tab of your KMIP adapter:

KMIP test client

I created a simple test client for the KMIP protocol leveraging the pykmip Python package.

My purpose in doing so was to enable some simple performance testing; the script generates a continuous series of connections and requests to a KMS.

In the process I discovered that pykmip has not shipped a new release in several years, and is not compatible with recent versions of Python. I included a simple monkey patch which adapts to recent changes in the SSL implementation.