Migrating to the IBM Cloud native KMIP provider

The IBM Cloud Key Protect key management offering has introduced a native KMIP provider to replace the existing “KMIP for VMware” KMIP provider. This new native provider has the following advantages:

Improved performance because KMIP-to-key-provider calls are closer in network distance and no longer cross service-to-service authorization boundaries.
Improved visibility and management for the KMIP keys.

You can find documentation here: Using the key management interoperability protocol (KMIP)

IBM Cloud’s Hyper Protect Cryptographic Services (HPCS) offering is exploring the possibility of supporting native KMIP providers as well. Stay tuned if you are a user of HPCS.

If you already use the KMIP for VMware provider with Key Protect, you should switch to the new native provider for improved performance. Here’s how you can migrate to the new provider.

First, navigate to your Key Protect instance:

Create a KMIP adapter:

You don’t need to upload a vCenter certificate immediately; in fact, remember that vCenter generates a new certificate with each connection attempt.

Click the Endpoints tab to identify the KMIP endpoint you need to configure in vCenter:

Note that, unlike the KMIP for VMware offering, there is only one endpoint. This single hostname is load balanced and is highly available in each region. Now go to vCenter, select the vCenter object, select Configure | Key Providers, then add a standard key provider:

Examine and trust the certificate:

Now select the new key provider, and select the single server in the provider. Click Establish Trust | Make KMS trust vCenter. I prefer to use the vCenter Certificate option, which will generate a new certificate just for this connection.

Remember to wait a few seconds before copying the certificate because it may change. Then copy the certificate and click Done:

Importantly, at this step you need to follow my instructions to reconfigure vCenter to trust the KMIP CA certificate instead of the end-entity certificate. You should do this for two reasons: first, you won’t have to re-trust the certificate every time it is rotated. More importantly, in some cases the native KMIP provider serves alternate certificates on the private connection, and this can confuse vSAN encryption. (The alternate certificates both include the private hostname among their alternate names, so they are valid. The underlying reason for this difference is that VMware is in the process of adding SNI support to their KMIP connections, and the server behavior differs depending on whether the client sends SNI.) Trusting the CA certificate ensures that the connection is trusted even if the alternate certificate is served on the connection.

Then return to the IBM Cloud and view the details of your KMIP adapter:

Select the SSL certificates tab and click Add certificate:

Paste in the certificate you copied from vCenter:

Back in vCenter, it may take several minutes before the key provider status changes to healthy:

First we need to ensure that any new encrypted objects leverage the new key provider. Select the new provider and click Set as Default. You will be prompted to confirm:

Next we need to migrate all existing objects to the new key provider.

I previously wrote about how you can accomplish this using PowerCLI. You would have to combine techniques from connecting to multiple key providers with rekeying all objects, by adding the key provider parameter to each command. After importing the VMEncryption and VsanEncryption modules and connecting to vCenter, this would look something like the following.

WARNING: Since first publishing this, I have learned that in some configurations, vSphere HA may reboot a virtual machine that is encrypted with vSphere encryption and which is being rekeyed. Please read that linked post for information on how you can work around this problem.

# Rekey host keys used for core dumps
# In almost all cases hosts in the same cluster are protected by the same provider and key,
# but this process ensures they are protected by the new key provider
# It is assumed here that all hosts are already in clusters enabled for encryption.
# Beware: If not, this command will initialize hosts and clusters for encryption.
foreach($myhost in Get-VMHost) {
  echo $myhost.name
  Set-VMHostCryptoKey -VMHost $myhost -KMSClusterId new-key-provider
}

# Display host key providers to verify result
Get-VMhost | Select Name,KMSserver

# Rekey a vSAN cluster
# It is assumed here that the cluster is already enabled for encryption.
# Beware: If not, this command will enable encryption for an unencrypted cluster.
Set-VsanEncryptionKms -Cluster cluster1 -KMSCluster new-key-provider

# Display cluster key provider to verify result
Get-VsanEncryptionKms -Cluster cluster1

# Rekey all encrypted virtual machines
# Each rekey operation starts a task which may take a brief time to complete for each encrypted VM
# Note that this will fail for any virtual machine that has snapshots; you must remove snapshots first
foreach($myvm in Get-VM) {
  if($myvm.KMSserver){
    echo $myvm.name
    Set-VMEncryptionKey -VM $myvm -KMSClusterId new-key-provider
  }
}

# Display all virtual machines' key providers (some are unencrypted) to verify result
Get-VM | Select Name,KMSserver

Note: currently the Set-VsanEncryptionKms function does not appear to work with vCenter 8. Until the bug I reported is fixed, you will have to use the vCenter UI for the vSAN step. For your cluster, go to Configuration | vSAN | Services. Under Data Services, click Edit. Choose the new key provider and click Apply:

Unfortunately, it is not possible to make all of these changes in the vCenter UI. You can rekey an individual VM against the new key provider, and as we’ve done above, you can rekey your vSAN cluster against the new key provider. And if you have vSAN encryption enabled, reconfiguring vSAN will also rekey your cluster encryption against the new key provider. But if you are not using vSAN, or if you do not have vSAN encryption enabled, I don’t know of a way to rekey your hosts against the new provider in the UI. (In fact, the cluster configuration UI is somewhat misleading as it indicates you have a choice of key provider, and you can even select the new key provider. But this will only influence the creation of new VMs; it will not rekey the hosts against the new provider.) As a result, you should use PowerCLI to rekey your hosts, and I recommend using it for your VMs as well.

After you have rekeyed all objects, you can remove the original key provider from vCenter:

Now you can delete your KMIP for VMware resource from the cloud UI:

For completeness, you should also delete all of the original keys created by the KMIP for VMware adapter. Recall that VMware leaks keys; if you have many keys to delete, you may wish to use the Key Protect CLI to remove them. You can identify these keys by name; they will have a vmware_kmip prefix:
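As a rough sketch of that cleanup, the following Python filters a key listing by the vmware_kmip name prefix and prints one delete command per key so you can review the list before running anything. It assumes you have saved a JSON key listing (for example from `ibmcloud kp keys --output json`) in which each entry has `id` and `name` fields. The sample records, the INSTANCE_ID placeholder, and the exact CLI flags are assumptions that you should check against your CLI version.

```python
import json

# Name prefix used by the KMIP for VMware adapter, per the text above
PREFIX = "vmware_kmip"

def keys_to_delete(listing_json, prefix=PREFIX):
    """Return (id, name) pairs for keys whose name starts with the prefix."""
    keys = json.loads(listing_json)
    return [
        (k["id"], k["name"])
        for k in keys
        if (k.get("name") or "").startswith(prefix)
    ]

def delete_commands(pairs, instance_id):
    """Build command lines for review only; nothing is executed here."""
    # Assumed CLI syntax: verify the flags before running these commands
    return [f"ibmcloud kp key delete {key_id} -i {instance_id}" for key_id, _ in pairs]

# Hypothetical sample listing; real output will contain more fields
SAMPLE = json.dumps([
    {"id": "id-1", "name": "vmware_kmip_aaa"},
    {"id": "id-2", "name": "my-root-key"},
    {"id": "id-3", "name": "vmware_kmip_bbb"},
])

pairs = keys_to_delete(SAMPLE)
for line in delete_commands(pairs, "INSTANCE_ID"):
    print(line)
```

Printing the commands rather than executing them keeps the destructive step manual, which seems prudent when you are deleting keys.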

You may notice that there are no standard keys representing the KMIP keys created by the new native adapter. Instead, its keys are visible within the KMIP symmetric keys tab of your KMIP adapter:

KMIP test client

I created a simple test client for the KMIP protocol leveraging the pykmip Python package.

My purpose in doing so was to enable some simple performance testing; the script generates a continuous series of connections and requests to a KMS.

In the process I discovered that pykmip has not shipped a new release in several years, and is not compatible with recent versions of Python. I included a simple monkey patch which adapts to recent changes in the SSL implementation.
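The general shape of such a test loop can be sketched as follows. This is not the script described above: `fake_kmip_request` is a hypothetical stand-in so that the sketch runs without a KMS. In a real test you would replace it with a function that opens a pykmip client connection and issues a single request.

```python
import statistics
import time

def run_load(request_fn, iterations):
    """Call request_fn repeatedly and return per-call latencies in seconds."""
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        request_fn()  # one connection plus one request
        latencies.append(time.perf_counter() - start)
    return latencies

def summarize(latencies):
    """Reduce the latencies to a few headline numbers."""
    return {
        "count": len(latencies),
        "mean_s": statistics.mean(latencies),
        "max_s": max(latencies),
    }

def fake_kmip_request():
    """Hypothetical stand-in for connect + request against a KMS."""
    time.sleep(0.001)

results = summarize(run_load(fake_kmip_request, 20))
print(results)
```

Timing each call separately, rather than only the whole run, makes it easier to spot occasional slow connections.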

VMware Cloud Director HTTP error 431, part 2

Previously I posted an improved NSX LB configuration for use with VMware Cloud Director that can help to restrict unnecessary cookies and avoid errors with excessively large headers.

If instead you are using VMware Avi Load Balancer in front of your Director cells, I want to highlight a recommended DataScript that you can use to accomplish the same result. My colleague Fahad Ladhani posted this in the comments of Tomas Fojta’s blog, but I’m highlighting it here for greater awareness:

-- HTTP_REQUEST
-- get cookies
cookies, count = avi.http.get_cookie_names()
avi.vs.log("cookies_count_before=" .. count)
-- if cookie(s) exists, validate cookie(s) name
if count >= 1 then
  for cookie_num= 1, #cookies do
    -- only keep cookies: JSESSIONID, rstd, vcloud_session_id, vcloud_jwt, sso-preferred, sso_redirect_org, xxxxx.redirectTo and xxxxx.state
    cookie_name = cookies[cookie_num]
    if cookie_name == "JSESSIONID" or  cookie_name == "rstd" or cookie_name == "vcloud_session_id" or cookie_name == "vcloud_jwt" or cookie_name == "sso-preferred" or cookie_name == "sso_redirect_org" then
      avi.vs.log("keep_cookie=" .. cookie_name)
    elseif string.endswith(cookie_name, ".redirectTo") or string.endswith(cookie_name, ".state") then
      avi.vs.log("keep_cookie=" .. cookie_name)
    else
      -- avi.vs.log("delete_cookie=" .. cookie_name)  -- not logging this because log gets truncated
      avi.http.remove_cookie(cookie_name)
    end
  end
end
-- get cookies
cookies, count = avi.http.get_cookie_names()
avi.vs.log("cookies_count_after=" .. count)
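If you want to check the allowlist logic offline before deploying it, the same decision can be sketched in Python. This only illustrates the filter applied to a raw Cookie header value. It does not use any Avi API, and the sample cookie values are made up.

```python
# Cookie names that Director needs, taken from the DataScript above
KEEP_EXACT = {
    "JSESSIONID", "rstd", "vcloud_session_id",
    "vcloud_jwt", "sso-preferred", "sso_redirect_org",
}
KEEP_SUFFIXES = (".redirectTo", ".state")

def keep_cookie(name):
    """True if the cookie name is on the allowlist."""
    return name in KEEP_EXACT or name.endswith(KEEP_SUFFIXES)

def filter_cookie_header(value):
    """Drop every cookie that is not on the allowlist."""
    kept = []
    for pair in value.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        name = pair.split("=", 1)[0]
        if keep_cookie(name):
            kept.append(pair)
    return "; ".join(kept)

# Made-up header with one oversized cookie that Director does not need
header = (
    "JSESSIONID=abc; com.ibm.cloud.iam.iamcookie.prod=" + "x" * 4000
    + "; org1.state=s1; vcloud_jwt=tok"
)
filtered = filter_cookie_header(header)
print(len(header), "->", len(filtered))
print(filtered)
```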

VMware Cloud Director HTTP error 431: Request Header Fields Too Large

Tomas Fojta wrote previously about issues with Cloud Director errors having to do with excessively large cookies. This is a common problem for cloud providers where there may be multiple web applications, some of which fail to properly limit their cookie scope. At the moment I am writing this, my browser is sending about 6.7kB of cookie data when visiting cloud.ibm.com. This is close to the limit supported by Cloud Director, and sometimes it goes over that limit.

Tomas suggested an approach using the NSX load balancer haproxy configuration to filter cookies. Unfortunately, Tomas’s approach does not cover all possible cases. For example, it does not cover the case where only one of these two cookies is present, and it does not cover the case where there are additional cookies in the header after these two cookies. Furthermore, there are additional cookies used by Cloud Director; at a minimum this includes the following:

JSESSIONID
rstd
vcloud_session_id
vcloud_jwt
sso-preferred
sso_redirect_org
*.redirectTo
*.state

If you have a known limited list of cookies (or cookie name patterns) like this that you want to pass to your application, it is relatively easy to program a positive cookie filter with an advanced load balancer such as VMware Avi Load Balancer. But if you are using the NSX embedded load balancer and are limited to the haproxy approach of using reqirep with regular expressions, it is an intractable problem. Therefore, instead of using reqirep to selectively include the cookies that Director needs, I recommend the approach of using reqirep to selectively and iteratively delete cookies that you know are likely to be large and to overflow Director’s supported limit. It may take some iterative experimentation over a period of time for you to identify all of the offending cookies.

For example, we can use the following four rules to remove two of the larger cookies for cloud.ibm.com, neither of which is needed by Director. For each cookie I am removing, I have written a pair of rules: the first rule removes the cookie if it appears anywhere other than the end of the cookie list, and the second removes it if it is at the end of the list:

reqirep ^(Cookie:.*)com\.ibm\.cloud\.iam\.iamcookie\.prod=[^;]*;(.*)$ \1\ \2
reqirep ^(Cookie:.*)com\.ibm\.cloud\.iam\.iamcookie\.prod=[^;]*$ \1
reqirep ^(Cookie:.*)com\.ibm\.cloud\.iam\.Identity\.prod=[^;]*;(.*)$ \1\ \2
reqirep ^(Cookie:.*)com\.ibm\.cloud\.iam\.Identity\.prod=[^;]*$ \1
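Before putting rules like these on a load balancer, it can help to try the pair against sample headers. The following is a minimal sketch that mimics the first two rules with Python's `re` module. It assumes that Python's regex behaviour matches haproxy's closely enough for these simple patterns, and the sample headers are invented.

```python
import re

NAME = r"com\.ibm\.cloud\.iam\.iamcookie\.prod"

# Rule 1: cookie is followed by more cookies. Rule 2: cookie is last in the list.
MID_RULE = (re.compile(r"^(Cookie:.*)" + NAME + r"=[^;]*;(.*)$", re.IGNORECASE), r"\1 \2")
END_RULE = (re.compile(r"^(Cookie:.*)" + NAME + r"=[^;]*$", re.IGNORECASE), r"\1")

def apply_rules(line):
    """Apply both rules in order, as the load balancer would."""
    for pattern, replacement in (MID_RULE, END_RULE):
        line = pattern.sub(replacement, line)
    return line

def cookie_names(line):
    """List the cookie names left in a Cookie header line."""
    value = line.split(":", 1)[1]
    return [p.strip().split("=", 1)[0] for p in value.split(";") if p.strip()]

mid = apply_rules("Cookie: a=1; com.ibm.cloud.iam.iamcookie.prod=BIG; b=2")
end = apply_rules("Cookie: a=1; com.ibm.cloud.iam.iamcookie.prod=BIG")
print(cookie_names(mid))
print(cookie_names(end))
```

The rewritten headers can keep some extra whitespace or a trailing separator, which is why the check compares cookie names rather than exact strings.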

vCenter key provider server certificates

I’ve written a couple of posts on vCenter key provider client certificates and caveats related to configuring them. In this post I shift to discussing server certificates.

When you connect to a key provider, vCenter only offers you the option of trusting the provider’s end-entity certificate:

Typically an end-entity certificate has a lifetime of a year or less. This means that you will be revisiting the provider configuration to verify the certificate on at least an annual basis.

However, after you have trusted this certificate, vCenter gives you the option of configuring an alternate certificate to be trusted. You can use this to establish trust with one of your key provider’s CA certificates instead of the end-entity certificate. Typically these have longer lifetimes, so your key provider connectivity will be interrupted much less frequently.

You may have to work with your security admin to obtain the CA certificate, or depending on how your key provider is configured you may be able to obtain the certificate directly from the KMIP connection using a tool like openssl:

root@smoonen-vc [ ~ ]# openssl s_client -connect private.eu-de.kms.cloud.ibm.com:5696 -showcerts
CONNECTED(00000003)
depth=2 C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Global Root G2
verify return:1
depth=1 C = US, O = DigiCert Inc, CN = DigiCert Global G2 TLS RSA SHA256 2020 CA1
verify return:1
depth=0 C = US, ST = New York, L = Armonk, O = International Business Machines Corporation, CN = private.eu-de.kms.cloud.ibm.com
verify return:1
---
Certificate chain
 0 s:C = US, ST = New York, L = Armonk, O = International Business Machines Corporation, CN = private.eu-de.kms.cloud.ibm.com
   i:C = US, O = DigiCert Inc, CN = DigiCert Global G2 TLS RSA SHA256 2020 CA1
   a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA256
   v:NotBefore: Jun 18 00:00:00 2024 GMT; NotAfter: Jun 17 23:59:59 2025 GMT
-----BEGIN CERTIFICATE-----
MIIHWDCCBkCgAwIBAgIQCK1qBW4aHA51Yl6cJVq96TANBgkqhkiG9w0BAQsFADBZ
. . .

You can then paste this certificate directly into the vCenter UI:

After doing this, vCenter will still display the lifetime validity of the end-entity certificate, rather than the CA certificate. But it will now be trusting the CA certificate, and so this trust will extend to the next version of the end-entity certificate, as long as it is signed by the same CA.
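If you capture the openssl output to a file, a small helper can separate the end-entity certificate from the issuer certificates that follow it. This is a minimal sketch. It assumes the server sends its own certificate first and its issuers afterwards, which is the usual TLS ordering. The sample text uses placeholder bodies rather than real certificates.

```python
import re

PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)

def split_chain(s_client_output):
    """Return (end_entity, issuer_certs) from `openssl s_client -showcerts` text."""
    blocks = PEM_BLOCK.findall(s_client_output)
    if not blocks:
        raise ValueError("no certificates found in input")
    # Assumption: block 0 is the end-entity certificate, the rest are issuers
    return blocks[0], blocks[1:]

# Placeholder output; real certificate bodies are base64 and much longer
SAMPLE = """Certificate chain
 0 s:CN = kms.example
-----BEGIN CERTIFICATE-----
LEAFLEAFLEAF
-----END CERTIFICATE-----
 1 s:CN = Example Issuing CA
-----BEGIN CERTIFICATE-----
ISSUERISSUER
-----END CERTIFICATE-----
"""

leaf, issuers = split_chain(SAMPLE)
print(len(issuers), "issuer certificate(s) found")
print(issuers[0])
```

Whether you trust the intermediate or the root certificate is a policy decision. The helper only separates the blocks so that you can paste the one you choose.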

vCenter key provider client certificates, part 2

Previously I explained how vCenter creates a new client certificate with each key provider connection. This is a good thing; it enables you to connect vCenter to the same provider multiple times as a different identity, which can be valuable in certain multitenant use cases.

However, there is also a bug in the vCenter UI that generates this certificate. For a split second, the UI presents one certificate, but then switches to a new value. If you click the copy button too quickly, you will copy the wrong certificate:

Be sure to wait for the screen to refresh before copying your certificate!

Customizing root login for VMware Cloud Director

Your Linux VM running in VMware Cloud Director might be preconfigured with the best-practice configuration to disable root password login. This might prevent you from using the root password that you set with Director’s Guest OS Customization:

#PermitRootLogin prohibit-password

You can override this behavior using a Guest OS Customization script in a couple of ways. The simplest approach is to use your customization script to set the sshd configuration to allow root password logins:

#!/bin/bash
sed -i -e "s/#PermitRootLogin prohibit-password/PermitRootLogin yes/" /etc/ssh/sshd_config

Or, if you prefer, you can use the customization script to insert an SSH public key for the root user:

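#!/bin/bash
# Create root's .ssh directory in case the image does not already have one
mkdir -p /root/.ssh && chmod 700 /root/.ssh
echo "ssh-rsa AAAAB3...DswrcTw==" >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys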

vSAN sizer

I find that there are several commonly overlooked considerations when sizing a vSAN environment:

Operating a vSAN environment above 70% capacity is not recommended.
If you use a resilience strategy of FTT=1, you should plan to perform a full evacuation during host maintenance or else you will be at risk of data loss due to drive failure during maintenance. Depending on your configuration and usage, the time required for a full evacuation can easily take 24 hours or more. In addition, a maintenance strategy of full evacuation requires you to leave one host’s worth of capacity empty.
Because of these considerations, I recommend a resilience strategy of FTT=2. With this strategy you have the option of performing host maintenance using ensure-accessibility rather than full evacuation, which is much faster but is still resilient to one failure during maintenance.
If you size your environment strictly to the minimum number of nodes for your configuration, then you will fail to create virtual machines or snapshots during host maintenance—including any snapshots used for backups or replication—unless you force provisioning of the object contrary to the storage policy. For this reason, you should consider provisioning at least one more host than is strictly required.

Many of these considerations are summarized in this VMware blog post, which includes a helpful table documenting host minimums and RAID ratios: Adaptive RAID-5 Erasure Coding with the Express Storage Architecture in vSAN 8.

I’ve taken these considerations and created a vSAN sizer Excel workbook, to help both with planning and sizing a vSAN environment.
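The arithmetic behind these considerations can be sketched in a few lines of Python. This is not the workbook's formula set, only a rough illustration. The RAID multipliers are commonly cited ratios that I am assuming here, and the host count and per-host capacity are made-up inputs.

```python
# Raw capacity consumed per unit of usable data.
# Assumed ratios: verify them against the table in the VMware post.
RAID_OVERHEAD = {
    "RAID-1 FTT=1": 2.0,
    "RAID-1 FTT=2": 3.0,
    "RAID-5 (4+1)": 1.25,
    "RAID-6 (4+2)": 1.5,
}

def usable_tb(hosts, raw_tb_per_host, policy, max_fill=0.70, spare_hosts=1):
    """Estimate usable capacity after spare-host, fill-limit, and RAID overhead."""
    if hosts <= spare_hosts:
        raise ValueError("need more hosts than spare_hosts")
    # Leave spare_hosts worth of capacity empty for maintenance
    raw = (hosts - spare_hosts) * raw_tb_per_host
    # Stay under the fill limit, then account for the storage policy overhead
    return raw * max_fill / RAID_OVERHEAD[policy]

# Hypothetical cluster: 6 hosts with 20 TB of raw capacity each
for policy in RAID_OVERHEAD:
    print(f"{policy}: {usable_tb(6, 20.0, policy):.1f} TB usable")
```

Set `spare_hosts=0` to see how much capacity the spare host costs you.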

GitHub printability

Certain complex GitHub Markdown documents don’t render well—or especially, print well—with the whitespace gutters on either side of the screen. This is especially true when there are complex tables in a document.

I use the following userContent.css stylesheet in my browser to override the use of gutters. This renders complex tables much better:

@-moz-document domain(github.com) {
 .container-lg { max-width: none !important; }
 .container-xl { max-width: none !important; }
}

Some tables are so complex that they still spill off the edge of the page even in portrait mode. To counteract this I adjust the print scaling, or print in landscape mode, or both.

Migration from VMware Shared to VMWaaS

IBM Cloud recently announced the end of sale of new VMware Shared environments, with a planned end of support date for all VMware Shared environments of January 2025.

VMware Shared is IBM Cloud’s first-generation IBM-managed VMware Cloud Director offering. It was based on VMware’s legacy NSX-V network virtualization technology. VMware-as-a-Service (VMWaaS) is IBM Cloud’s next-generation IBM-managed VMware Cloud Director offering. It leverages NSX-T network virtualization and provides other advanced capabilities such as the ability to create dedicated environments whose hosts are not shared with other tenants.

Refer to IBM’s announcement for details on how to migrate between these environments and for minor changes in pricing.
