OpenShift Virtualization on IBM Cloud, part 5: Migrating a virtual machine

See all blog posts in this series:

  1. OpenShift Virtualization on IBM Cloud, part 1: Introduction
  2. OpenShift Virtualization on IBM Cloud, part 2: Becoming familiar with VPC
  3. OpenShift Virtualization on IBM Cloud, part 3: Deploying ROKS, ODF, and OCP Virt
  4. OpenShift Virtualization on IBM Cloud, part 4: Creating a virtual machine
  5. OpenShift Virtualization on IBM Cloud, part 5: Migrating a virtual machine
  6. OpenShift Virtualization on IBM Cloud, part 6: Backup and restore
  7. OpenShift Virtualization on IBM Cloud, part 7: Dynamic resource scheduling

In this post, we will install the OpenShift Migration Toolkit for Virtualization (MTV) and use it to migrate a VMware virtual machine to OpenShift Virtualization.

Install the migration toolkit

In the OpenShift web UI, navigate to Operators | OperatorHub and search for “migration.” Select the “Migration Toolkit for Virtualization Operator” and then click “Install.” I didn’t customize any of the parameters.

Afterwards, the operator prompted me to create a custom resource for the ForkliftController.

After a short while, a Migration for Virtualization menu item appears in the web UI.

Preparation

I deployed an Ubuntu VM into an overlay network in an IBM Cloud “VCS” instance (AKA “VCF on Classic Automated”) and connected my classic account to my VPC using an IBM Cloud Transit Gateway. This particular VCS instance was leveraging NFS storage.

Interestingly, VMware disables Changed Block Tracking (CBT) by default for virtual machines. Later in my testing I found that the migration provider warned me that CBT was disabled. I followed Broadcom’s instructions to enable it manually, although this required me to reboot my VM.

In order to create a migration provider, RedHat recommends you create a “VDDK image” containing the VMware Virtual Disk Development Kit (VDDK). Recent versions of the Migration operator will build this image for you; all you need to do is provide the VDDK package downloaded from Broadcom. See RedHat’s instructions.

Although the migration provider is able to connect to vCenter by IP address rather than hostname, the final migration itself will attempt to connect to the vSphere host by its hostname. Therefore we need to prepare the environment to delegate the VCS instance domain to its domain controllers. I followed the RedHat instructions to configure a forwarding zone in the cluster DNS operator. Here is the clause that I added to its spec.

  servers:
  - forwardPlugin:
      policy: Random
      upstreams:
      - 10.50.200.3
      - 10.50.200.4
    name: vcs-resolver
    zones:
    - smoonen.example.com
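
This clause goes under spec in the cluster DNS operator configuration. As a sketch of one way to apply it (note that a merge patch like this replaces any existing servers list; use oc edit dns.operator/default instead if you already have forwarding zones):

oc patch dns.operator/default --type=merge -p '{"spec":{"servers":[{"name":"vcs-resolver","zones":["smoonen.example.com"],"forwardPlugin":{"policy":"Random","upstreams":["10.50.200.3","10.50.200.4"]}}]}}'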

Create the migration provider

I then went into the Providers view in the OCP web UI and created a VMware provider. Be sure to add /sdk to the end of your vCenter URL as shown below. Note also that the migration operator automatically creates a “host” provider for you, representing your OCP cluster, in the openshift-mtv project. In order to meaningfully migrate your VMs to this provider, it is best to create your VMware provider in the same project.
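
The screenshot is omitted here, but the URL takes this shape (the hostname is hypothetical, reusing the domain from the DNS zone above):

https://vcenter.smoonen.example.com/sdk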

Create the migration plan

In the OpenShift web console I created a migration plan.

Then I selected my virtual machine.

Then I created a network mapping. The only currently supported network mapping in IBM Cloud ROKS is the pod network.

Then I created a storage mapping, being sure to select the ODF storage.

Then I chose a warm migration.

The preservation of static IPs is not currently supported in ROKS with the Calico provider.

I chose not to create migration hooks. You could use these, for example, to reconfigure the network configuration.

In my migration plan I chose to migrate the VM to the default project. My migration plan actually failed to initialize because it could not retrieve the VDDK image that had been built for me. Either before or after creating the migration plan, run the following command to ensure that the default service account in the target project can access the cluster’s image registry:

oc adm policy add-cluster-role-to-user registry-viewer system:serviceaccount:default:default

Then I clicked to start the migration.

The migration created a snapshot and left my VM running.

After this completed, the VM remained running on the VMware side and was not yet instantiated on the ROKS side. The migration plan appeared in a “paused” state.

Next I performed the cutover. I had a choice to run it immediately or schedule it for a future time.

The cutover resulted in the stopping of my VM on the VMware side, the removal of the snapshot, and the creation and removal of an additional snapshot; I presume this represented the replication of the remaining data as signaled by CBT.

It then created and started a VM on the ROKS side.

In order to establish network connectivity for this VM, it was necessary to reconfigure its networking. The static IP must be exchanged for DHCP. In my case I also found that the device name changed.
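
On Ubuntu this amounts to a small netplan change. Here is a minimal sketch, assuming netplan is in use and that the new device name is enp1s0 (the file name and device name are my assumptions; check yours with ip link):

# /etc/netplan/99-dhcp.yaml (hypothetical file name)
network:
  version: 2
  ethernets:
    enp1s0:          # device name observed after migration; yours may differ
      dhcp4: true

After writing the file, sudo netplan apply activates it.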

For completeness I also installed qemu-guest-agent but it appears this is not strictly necessary. I then edited /boot/efi/loader/loader.conf to force the loading of virtio modules per Ubuntu instructions. After doing so, it appears that they are in use.

In theory, MTV should have triggered the installation of both qemu-guest-agent and the virtio drivers. I observed that on first boot it did attempt to install the agent, but understandably failed because the network connection was not yet established.

OpenShift Virtualization on IBM Cloud, part 4: Creating a virtual machine

See all blog posts in this series:

  1. OpenShift Virtualization on IBM Cloud, part 1: Introduction
  2. OpenShift Virtualization on IBM Cloud, part 2: Becoming familiar with VPC
  3. OpenShift Virtualization on IBM Cloud, part 3: Deploying ROKS, ODF, and OCP Virt
  4. OpenShift Virtualization on IBM Cloud, part 4: Creating a virtual machine
  5. OpenShift Virtualization on IBM Cloud, part 5: Migrating a virtual machine
  6. OpenShift Virtualization on IBM Cloud, part 6: Backup and restore
  7. OpenShift Virtualization on IBM Cloud, part 7: Dynamic resource scheduling

I had some initial difficulties creating a virtual machine from the OpenShift web console UI in the Virtualization | Catalog page, but later this worked okay. Here is a screenshot of that page, but in this post I will document a command-line approach.

For my command-line approach, I first used ssh-keygen to create an SSH key pair, and then created a secret based on the public key:

oc create secret generic smoonen-rsakey --from-file=rhel-key.pub -n=default
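
For reference, the key pair itself can be generated along these lines (the exact flags I used are not shown; this is an assumed example producing the rhel-key files referenced here):

ssh-keygen -t rsa -b 4096 -f rhel-key -N ''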

I then created a YAML file referencing this secret, with the help of the example YAML generated by the OpenShift console UI. Here is my configuration:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: rhel-10-smoonen5
  namespace: default
spec:
  dataVolumeTemplates:
    - metadata:
        name: rhel-10-smoonen5-volume
      spec:
        sourceRef:
          kind: DataSource
          name: rhel10
          namespace: openshift-virtualization-os-images
        storage:
          resources:
            requests:
              storage: 30Gi
  instancetype:
    name: u1.large
  preference:
    name: rhel.10
  runStrategy: Always
  template:
    metadata:
      labels:
        network.kubevirt.io/headlessService: headless
    spec:
      domain:
        devices:
          autoattachPodInterface: false
          disks: []
          interfaces:
            - masquerade: {}
              name: default
      networks:
        - name: default
          pod: {}
      subdomain: headless
      volumes:
        - dataVolume:
            name: rhel-10-smoonen5-volume
          name: rootdisk
        - cloudInitNoCloud:
            userData: |
              #cloud-config
              chpasswd:
                expire: false
              password: xxxx-xxxx-xxxx
              user: rhel
              runcmd: []
          name: cloudinitdisk
      accessCredentials:
        - sshPublicKey:
            propagationMethod:
              noCloud: {}
            source:
              secret:
                secretName: smoonen-rsakey

I applied this by running the command oc apply -f virtual-machine.yaml.
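
You can then watch the VM come up using the KubeVirt resources; for example:

oc get vm rhel-10-smoonen5 -n default    # the VirtualMachine object and its status
oc get vmi rhel-10-smoonen5 -n default   # the running instance, once it has started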

Connecting to the virtual machine

I relied on this blog post which describes several methods for connecting to a virtual machine.

I chose to use virtctl/SSH. Steps:

  1. Log in to the OpenShift web console.
  2. Click the question mark icon in the top right and select Command Line Tools.
  3. Scroll down and download virtctl for your platform.
  4. If you are on a Mac, follow the same steps performed earlier with oc to allow virtctl to run.

Here you can see me connecting to my virtual machine.
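
Screenshot aside, the connection looks something like this (the namespace and key file match the earlier steps; your flags may vary):

virtctl ssh -n default -i rhel-key rhel@vm/rhel-10-smoonen5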

Performance

Be sure to read Neil Taylor’s blog posts referenced in the first post in this series, which explain why the VM has an address of 10.0.2.2.

As it stands, the VM can reach the public network, since I configured a public gateway on the worker nodes’ subnet. Although I believe I have entitlement to run RHEL on these workers, the VM is not initially connected to a Satellite server or to any repositories. I wanted to run a quick iperf3 test, but this meant it was not as simple as a yum install. Eventually I was able to obtain the libsctp and iperf3 RPMs and ran a simple test. Compared to a VMware VM running on VPC bare metal, the ROKS VM gets comparable throughput on iperf3 tests to public servers.
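
The test itself was nothing fancy; something like the following, where the server hostname here is hypothetical:

iperf3 -c iperf.example.com -t 30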

As I receive more insight into the RHEL entitlement I will document this.

Inbound connectivity to VM

NLB (layer 4) does not currently support bare metal members. Therefore we need to create an ALB (layer 7). I created a public one just to see how that works. I’m reasoning through what I need to build based on Neil’s blog and IBM Cloud documentation.

Here is the YAML I constructed:

apiVersion: v1
kind: Service
metadata:
  name: smoonen-rhel-vpc-alb-3
  annotations:
    service.kubernetes.io/ibm-load-balancer-cloud-provider-ip-type: "public"
    # Restrict inbound to my IPs
    service.kubernetes.io/ibm-load-balancer-cloud-provider-vpc-security-group: "smoonen-jump-sg"
spec:
  type: LoadBalancer
  selector:
    vm.kubevirt.io/name: rhel-10-smoonen5
  ports:
  - port: 22
    protocol: TCP
    targetPort: 22

Importantly, you should not specify the service.kubernetes.io/ibm-load-balancer-cloud-provider-vpc-lb-name annotation, which creates what IBM Cloud calls a persistent load balancer. A persistent load balancer reuses an existing load balancer of the given name if one exists. So, for example, if you are testing the restore of an application into a new, temporary namespace, it will hijack the load balancer for your running application.

After provisioning this, I was able to successfully SSH into my VM with the load balancer resource that was created.
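
As a sketch, you can retrieve the ALB hostname from the service and connect through it (this assumes the service was created in the default project):

oc get service smoonen-rhel-vpc-alb-3 -n default   # the EXTERNAL-IP column shows the ALB hostname
ssh rhel@<ALB hostname>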

OpenShift Virtualization on IBM Cloud, part 3: Deploying and configuring ROKS, ODF, and OCP Virt

See all blog posts in this series:

  1. OpenShift Virtualization on IBM Cloud, part 1: Introduction
  2. OpenShift Virtualization on IBM Cloud, part 2: Becoming familiar with VPC
  3. OpenShift Virtualization on IBM Cloud, part 3: Deploying ROKS, ODF, and OCP Virt
  4. OpenShift Virtualization on IBM Cloud, part 4: Creating a virtual machine
  5. OpenShift Virtualization on IBM Cloud, part 5: Migrating a virtual machine
  6. OpenShift Virtualization on IBM Cloud, part 6: Backup and restore
  7. OpenShift Virtualization on IBM Cloud, part 7: Dynamic resource scheduling

In this article we will work through the steps of creating a ROKS cluster, deploying and configuring prerequisites for OpenShift Virtualization, and installing OpenShift Virtualization.

Create ROKS instance

Click on the IBM Cloud hamburger menu and select Containers | Clusters. Click Create cluster. Ensure that RedHat OpenShift and VPC are selected. Choose your VPC and select the region and zone(s) of interest. For the purpose of my testing I am creating a single-zone cluster.

Select OCP licensing as you require. In my case I needed to purchase a license.

Take care in your selection of worker nodes. Currently virtualization is supported only with bare metal worker nodes. In my case I selected three bare metals each with some amount of extra storage which I will use for Ceph/ODF software-defined storage.

If you wish, encrypt your worker node storage using Key Protect.

I chose to attach a Cloud Object Storage instance for image registry.

I thought at first that I would enable outbound traffic protection to learn how to make use of it. However, the OpenShift Virtualization operator documentation indicates that you should disable it.

I selected cluster encryption as well.

At present I chose not to leverage ingress secrets management or custom security groups.

Enable activity tracking, logging, and monitoring as needed, then click Create.

Note: it is wise to open a ticket to ask for assistance from IBM Cloud support to check for bare metal capacity in your chosen VPC region and zone. In my case my first deployment attempt failed because insufficient bare metal servers of my selected flavor were available in the zone; this is why I have a jump server in zone 1 but workers in zone 3. Although my second deployment had one host fail, this was not due to capacity but apparently to an incidental error. Redeploying a new worker in its place worked fine. It’s difficult to assess the total deployment time in light of these errors, but I would guess it was somewhere between 2 and 3 hours.

Check NVMe disks

At the time of this writing, recent CoreOS kernel versions appear to have a bug where several NVMe drives are not properly mounted. After the cluster is provisioned, log in to the OpenShift web console and use the Terminal feature on each host to check whether the system has all of its NVMe disks. For example, the profile I deployed should have 8 disks. If there are missing disks, follow the steps in the screenshot below to rediscover them, using the ids from the error messages.
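
I won’t reproduce the screenshot here, but as a rough sketch of the kind of recovery involved (these commands are my assumption, not a transcription of the screenshot; the controller id comes from the error messages):

nvme list                                          # show which NVMe devices are currently visible
echo 1 > /sys/class/nvme/nvme3/rescan_controller   # rescan a missing controller by its id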

Once your drives are all present, you can proceed to install OpenShift Data Foundation (ODF), which is currently a requirement for OpenShift Virtualization.

Install OpenShift Data Foundation (ODF)

ODF is a convenient wrapper for software-defined storage based on Ceph. It is the OpenShift equivalent of VMware vSAN. In this case I’m deploying a single zone / failure domain, with a default configuration of 3-way mirroring, but ODF is able to provide other configurations including multiple zonal fault domains.

Because it must be licensed and in order to provide other custom integrations with IBM Cloud, the ODF installation is driven from the IBM Cloud UI rather than from the OpenShift OperatorHub. In the IBM Cloud UI, on your cluster’s Overview tab, scroll down and click Install on the OpenShift Data Foundation card.

Below is an example of the input parameters I used. Note that I did not enable volume encryption because the integration with Key Protect (KP) and Hyper Protect Crypto Services (HPCS) was not clear to me. Most importantly, be careful with the pod configuration. For local storage, ignore the fact that the pod size appears to be 1GiB; this simply indicates the minimum claim that ODF will attempt. In reality it will be greedy and will make use of your entire NVMe drive. For the number of pods, specify the number of NVMe disks on each host that you want to consume. Although I have three hosts, I have 8 NVMe disks on each host and wish to use all of them, so I specified a pod count of 8.

Note that it takes some time to install, deploy, and configure all components.
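
You can watch the rollout with the usual commands; for example, assuming the standard openshift-storage namespace:

oc get storagecluster -n openshift-storage   # wait for the Ready phase
oc get pods -n openshift-storage             # OSD pods should roughly match your per-host pod count across hosts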

Install OpenShift Virtualization

After the ODF installation completes, you need to install the OpenShift Virtualization operator using the OpenShift CLI (oc). Although the IBM Cloud CLI has an “oc” command, this is not a proxy for the oc CLI but rather an alias for IBM’s ks plugin. I performed the following steps:

First, in the IBM Cloud UI, click through to the OpenShift web console. In the top-right corner, click the ? icon and choose Command Line Tools. Download the tool appropriate to your workstation.

In my case, on macOS, I had to override the security checks for downloaded software. I attempted to run oc and received an error. I then opened the System Settings app, selected Privacy & Security, scrolled to the bottom, and selected “Open Anyway” for oc.

Then, in the IBM Cloud UI, I clicked through to the OpenShift web console. In the top-right corner I clicked on my userid and then selected Copy login command. Then I ran the login command on my workstation.

Finally, I followed the IBM Cloud instructions for installing the OpenShift Virtualization operator. Because I intend to use ODF/Ceph storage rather than block or file, I performed the step to mark block as non-default, but I did not install or configure file storage.
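
For reference, marking the block storage class as non-default is a one-line patch; a sketch, assuming the usual ROKS VPC default class name:

oc patch storageclass ibmc-vpc-block-10iops-tier -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'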

I have some thoughts on what the upgrade process might look like for ODF / Ceph when upgrading my cluster and worker nodes. I’m waiting for a new supported release of ODF to test these out and will post my experience once I’ve had a chance to test it.

OpenShift Virtualization on IBM Cloud, part 2: Becoming familiar with VPC

See all blog posts in this series:

  1. OpenShift Virtualization on IBM Cloud, part 1: Introduction
  2. OpenShift Virtualization on IBM Cloud, part 2: Becoming familiar with VPC
  3. OpenShift Virtualization on IBM Cloud, part 3: Deploying ROKS, ODF, and OCP Virt
  4. OpenShift Virtualization on IBM Cloud, part 4: Creating a virtual machine
  5. OpenShift Virtualization on IBM Cloud, part 5: Migrating a virtual machine
  6. OpenShift Virtualization on IBM Cloud, part 6: Backup and restore
  7. OpenShift Virtualization on IBM Cloud, part 7: Dynamic resource scheduling

Introduction

IBM Cloud offers the opportunity to create virtual private clouds, which are software-defined network bubbles where you provision cloud resources and infrastructure into a network address space allocated and managed by you. For some more background, read and watch “What is a virtual private cloud?”

Our OpenShift resources will be provisioned into this VPC space. So first we need to create a VPC, and choose the network addressing. In addition, because this is a private network space, we will need to gain access to it. There are two common modes of access: VPN, and jump server. For the purposes of my experiment I created a jump server, which will also help to introduce us to some VPC concepts.

In this article I show you how to create an IBM Cloud VPC and jump server VSI (virtual server instance; i.e., virtual machine) using the IBM Cloud UI. Of course, you can also use the IBM Cloud CLI, APIs, or SDKs to do this. I have samples of Python code on GitHub to create a VPC and to create a jump server.

Create a VPC

After logging in to your IBM Cloud account, click the “hamburger menu” button in the top-left, then select Infrastructure | Network | VPCs.

From the Region drop-down, select the region of your choice, and then click Create.

As it works currently, if you allow the VPC to create a default address prefix for you, the prefix is selected automatically without giving you the opportunity to modify it. I prefer to choose my own address prefix, and therefore I deselect this checkbox before clicking the Create button.

After creating your VPC, view the list of VPCs and click on your new VPC to display its details. Select the Address prefixes tab. For each zone where you plan to create resources or run workloads, create an address prefix. For example, I created a VSI in zone 1 and OpenShift worker nodes in zone 3, so I have address prefixes created in these two zones.

Interestingly, the address prefix is not itself a usable subnet in a zone. Instead, it is a broader construct that represents an address range out of which you can create one or more usable subnets in that zone. Therefore, you need to go to Infrastructure | Network | Subnets and create a subnet in each zone where you will be creating resources or running workloads. Note carefully that you choose the region and name of your subnet before you choose the VPC in which to create it. At that point you can choose which address prefix it should draw from. In my case I used up the entire address prefix for each of my subnets.

For your convenience, I also recommend that you choose to attach a public gateway to your subnet. The public gateway allows resources on the subnet to communicate with public networks, but only in the outbound direction.

Create a jump server

First you should create a security group to restrict access to the jump server. Navigate to Infrastructure | Network | Security groups and click Create. Ensure that your new VPC is selected, create one or more rules to represent the allowed inbound connections, and then create an outbound rule allowing all traffic.

Next, navigate to Infrastructure | Compute | Virtual server instances and click Create.

Select the zone and your new VPC. Note that the VPC selection is far down the page so it is easy to miss this. Choose your preferred operating system image; e.g., Windows Server 2025. Customize the VSI profile if you need more or different horsepower for your VM.

Unless you already have an SSH key, create a new one as part of this flow. The UI will save the private key to your system. Be sure to hold on to this for later.

It is fine to take most of the default settings for network and storage unless you prefer to select a specific IP from your subnet. However, you do need to edit the network attachment and select the security group you created above instead of the VPC default group. You’ll notice that the creation of your VSI results in the creation of something called a virtual network interface, or VNI. The VNI is an independent object that mediates the VSI’s attachment to an IP address in your subnet. VNIs serve as an abstract model for such attachments and can be attached to other resources such as file storage and bare metal servers. You could elect to allow spoofing on your VNI (which would be necessary if you wanted your VSI to share a VIP with other VSIs or to route traffic for additional IPs and networks), and also to allow the VNI to continue to exist even after the VSI is deleted.

Click Create virtual server.

Jump server authentication

If you created a Linux jump server, you can use the SSH private key created earlier to connect to your jump server using SSH. However, if you created a Windows jump server, the Administrator password is encrypted using the SSH key you created earlier. Here is how you can decrypt the Administrator password using this key. Select your VSI. On the instance details panel, copy the VSI instance id.

Click the IBM Cloud Shell icon in the top right corner of the IBM Cloud UI. This will open a new tab in your browser. Ensure that your region of choice is selected.

Within the IBM Cloud Shell in your browser, run a common editor to create a new privkey.txt file in the cloud shell; e.g., vi privkey.txt or nano privkey.txt. Locate the private key file that was downloaded to your system, copy its contents, paste them into the cloud shell editor, and save the file. Then run the following command in the Cloud Shell, substituting the VSI instance ID which is visible in the VSI details page:

ibmcloud is instance-initialization-values 0717_368f7ea8-0879-465f-9ab3-02ede6549b6c --private-key @privkey.txt

Public IP address

The last thing we need to do is assign a public IP to our jump server. Navigate to Infrastructure | Network | Floating IPs and click Reserve.

Select the appropriate zone, then select the jump server as the resource to bind to. Click Reserve. Note that we did not have to apply our security group at this point because it was already applied to the VSI’s network interface.

Note the IP that was created for you. You can now connect to your jump server using this IP and either the SSH key or password from earlier in this procedure.

OpenShift Virtualization on IBM Cloud, part 1: Introduction

See all blog posts in this series:

  1. OpenShift Virtualization on IBM Cloud, part 1: Introduction
  2. OpenShift Virtualization on IBM Cloud, part 2: Becoming familiar with VPC
  3. OpenShift Virtualization on IBM Cloud, part 3: Deploying ROKS, ODF, and OCP Virt
  4. OpenShift Virtualization on IBM Cloud, part 4: Creating a virtual machine
  5. OpenShift Virtualization on IBM Cloud, part 5: Migrating a virtual machine
  6. OpenShift Virtualization on IBM Cloud, part 6: Backup and restore
  7. OpenShift Virtualization on IBM Cloud, part 7: Dynamic resource scheduling

In the VMware world, there is presently a lot of interest in alternative virtualization solutions such as RedHat’s OpenShift Virtualization. In the past I’ve used RedHat Virtualization, or RHEV. RedHat has discontinued their RHEV offering and is focusing their virtualization efforts and investment on OpenShift Virtualization instead. In order to become familiar with OpenShift Virtualization I resolved to experiment with it via IBM Cloud’s managed OpenShift offering, RedHat OpenShift on IBM Cloud, affectionately known as “ROKS” (RedHat OpenShift Kubernetes Service) in my circles.

My colleague Neil Taylor was tremendously helpful in providing background information to help me familiarize myself with the technology for the purposes of my experiment. He has written a series of blog posts with the purpose of familiarizing VMware administrators like myself with OpenShift Virtualization, and specifically the form it takes in IBM Cloud’s managed offering. If you are interested in following along with my experiment, you should read his articles first:

  1. OpenShift Virtualization on IBM Cloud ROKS: a VMware administrator’s guide to storage
  2. OpenShift Virtualization on IBM Cloud ROKS: a VMware administrator’s guide to networking
  3. OpenShift Virtualization on IBM Cloud ROKS: a VMware administrator’s guide to migrating VMware VMs to OpenShift Virtualization
  4. OpenShift Virtualization on IBM Cloud ROKS: Advanced Networking – A VMware Administrator’s Guide

I expect that in the future we will see IBM Cloud ROKS adopting the new user-defined networking capabilities that are coming to OpenShift Virtualization soon, but I expect it will take some time to operationalize these capabilities in the IBM Cloud virtual private cloud (VPC) environment. In the meantime I’m content to experiment with virtualization within the limits of Calico networking.

Migrating to the IBM Cloud native KMIP provider

The IBM Cloud Key Protect key management offering has introduced a native KMIP provider to replace the existing “KMIP for VMware” provider. This new native provider has the following advantages:

  • Improved performance because KMIP-to-key-provider calls are closer in network distance and no longer cross service-to-service authorization boundaries.
  • Improved visibility and management for the KMIP keys.

You can find documentation here: Using the key management interoperability protocol (KMIP)

IBM Cloud’s Hyper Protect Cryptographic Services (HPCS) offering is exploring the possibility of supporting native KMIP providers as well. Stay tuned if you are a user of HPCS.

If you already use the KMIP for VMware provider with Key Protect, you should switch to the new native provider for improved performance. Here’s how you can migrate to the new provider.

First, navigate to your Key Protect instance:

Create a KMIP adapter:

You don’t need to upload a vCenter certificate immediately; in fact, remember that vCenter generates a new client certificate with each connection attempt.

Click the Endpoints tab to identify the KMIP endpoint you need to configure in vCenter:

Note that, unlike the KMIP for VMware offering, there is only one endpoint. This single hostname is load balanced and is highly available in each region. Now go to vCenter, select the vCenter object, select Configure | Key Providers, then add a standard key provider:

Examine and trust the certificate:

Now select the new key provider, and select the single server in the provider. Click Establish Trust | Make KMS trust vCenter. I prefer to use the vCenter Certificate option which will generate a new certificate just for this connection.

Remember to wait a few seconds before copying the certificate because it may change. Then copy the certificate and click Done:

Importantly, at this step you need to follow my instructions to reconfigure vCenter to trust the KMIP CA certificate instead of the end-entity certificate. You should do this for two reasons: first, you won’t have to re-trust the certificate every time it is rotated. More importantly, in some cases the native KMIP provider serves alternate certificates on the private connection, and this can confuse vSAN encryption. (The alternate certificates both include the private hostname among their alternate names, so they are valid. The underlying reason for this difference is that VMware is in the process of adding SNI support to their KMIP connections, and the server behavior differs depending on whether the client sends SNI.) Trusting the CA certificate ensures that the connection is trusted even if an alternate certificate is served on the connection.

Then return to the IBM Cloud and view the details of your KMIP adapter:

Select the SSL certificates tab and click Add certificate:

Paste in the certificate you copied from vCenter:

Back in vCenter, it may take several minutes before the key provider status changes to healthy:

First we need to ensure that any new encrypted objects leverage the new key provider. Select the new provider and click Set as Default. You will be prompted to confirm:

Next we need to migrate all existing objects to the new key provider.

I previously wrote how you can accomplish this using PowerCLI. You would have to combine techniques from connecting to multiple key providers with rekeying all objects, by adding the key provider parameter to each command. After importing the VMEncryption and VsanEncryption modules and connecting to vCenter, this would look something like the following.

WARNING: Since first publishing this, I have learned that in some configurations, vSphere HA may reboot a virtual machine that is encrypted with vSphere encryption and which is being rekeyed. Please read that linked post for information on how you can work around this problem.
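
The setup step itself, sketched here with hypothetical module paths and vCenter hostname:

# Import the community encryption modules and connect to vCenter
Import-Module ./VMEncryption.psm1
Import-Module ./VsanEncryption.psm1
Connect-VIServer -Server vcenter.example.com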

# Rekey host keys used for core dumps
# In almost all cases hosts in the same cluster are protected by the same provider and key,
# but this process ensures they are protected by the new key provider
# It is assumed here that all hosts are already in clusters enabled for encryption.
# Beware: If not, this command will initialize hosts and clusters for encryption.
foreach($myhost in Get-VMHost) {
  echo $myhost.name
  Set-VMHostCryptoKey -VMHost $myhost -KMSClusterId new-key-provider
}

# Display host key providers to verify result
Get-VMHost | Select Name,KMSserver

# Rekey a vSAN cluster
# It is assumed here that the cluster is already enabled for encryption.
# Beware: If not, this command will enable encryption for an unencrypted cluster.
Set-VsanEncryptionKms -Cluster cluster1 -KMSCluster new-key-provider

# Display cluster key provider to verify result
Get-VsanEncryptionKms -Cluster cluster1

# Rekey all encrypted virtual machines
# Each rekey operation starts a task which may take a brief time to complete for each encrypted VM
# Note that this will fail for any virtual machine that has snapshots; you must remove snapshots first
foreach($myvm in Get-VM) {
  if($myvm.KMSserver){
    echo $myvm.name
    Set-VMEncryptionKey -VM $myvm -KMSClusterId new-key-provider
  }
}

# Display all virtual machines' key providers (some are unencrypted) to verify result
Get-VM | Select Name,KMSserver

Note: currently the Set-VsanEncryptionKms function does not appear to work with vCenter 8. Until my bug report is fixed, you will have to use the vCenter UI for the vSAN step. For your cluster, go to Configuration | vSAN | Services. Under Data Services, click Edit. Choose the new key provider and click Apply:

Unfortunately, it is not possible to make all of these changes in the vCenter UI. You can rekey an individual VM against the new key provider, and as we’ve done above, you can rekey your vSAN cluster against the new key provider. And if you have vSAN encryption enabled, reconfiguring vSAN will also rekey your cluster encryption against the new key provider. But if you are not using vSAN, or if you do not have vSAN encryption enabled, I don’t know of a way to rekey your hosts against the new provider in the UI. (In fact, the cluster configuration UI is somewhat misleading as it indicates you have a choice of key provider, and you can even select the new key provider. But this will only influence the creation of new VMs; it will not rekey the hosts against the new provider.) As a result, you should use PowerCLI to rekey your hosts, and I recommend using it for your VMs as well.

After you have rekeyed all objects, you can remove the original key provider from vCenter:

Now you can delete your KMIP for VMware resource from the cloud UI:

For completeness, you should also delete all of the original keys created by the KMIP for VMware adapter. Recall that VMware leaks keys; if you have many keys to delete, you may wish to use the Key Protect CLI to remove them. You can identify these keys by name; they will have a vmware_kmip prefix:
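
A sketch of that cleanup with the Key Protect CLI plugin (the instance ID variable is a placeholder of mine):

ibmcloud kp keys --instance-id $KP_INSTANCE_ID | grep vmware_kmip   # list keys from the old adapter
ibmcloud kp key delete KEY_ID --instance-id $KP_INSTANCE_ID         # delete a key by its ID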

You may notice that there are no standard keys representing the KMIP keys created by the new native adapter. Instead, its keys are visible within the KMIP symmetric keys tab of your KMIP adapter:

VMware Cloud Director HTTP error 431, part 2

Previously I posted an improved NSX LB configuration for use with VMware Cloud Director that can help to restrict unnecessary cookies and avoid errors with excessively large headers.

If instead you are using VMware Avi Load Balancer in front of your Director cells, I want to highlight a recommended DataScript that you can use to accomplish the same result. My colleague Fahad Ladhani posted this in the comments of Tomas Fojta’s blog, but I’m highlighting it here for greater awareness:

-- HTTP_REQUEST
-- get cookies
cookies, count = avi.http.get_cookie_names()
avi.vs.log("cookies_count_before=" .. count)
-- if cookie(s) exists, validate cookie(s) name
if count >= 1 then
  for cookie_num = 1, #cookies do
    -- only keep cookies: JSESSIONID, rstd, vcloud_session_id, vcloud_jwt, sso-preferred, sso_redirect_org, xxxxx.redirectTo and xxxxx.state
    cookie_name = cookies[cookie_num]
    if cookie_name == "JSESSIONID" or cookie_name == "rstd" or cookie_name == "vcloud_session_id" or cookie_name == "vcloud_jwt" or cookie_name == "sso-preferred" or cookie_name == "sso_redirect_org" then
      avi.vs.log("keep_cookie=" .. cookie_name)
    elseif string.endswith(cookie_name, ".redirectTo") or string.endswith(cookie_name, ".state") then
      avi.vs.log("keep_cookie=" .. cookie_name)
    else
      -- avi.vs.log("delete_cookie=" .. cookie_name)  -- not logging this because log gets truncated
      avi.http.remove_cookie(cookie_name)
    end
  end
end
-- get cookies
cookies, count = avi.http.get_cookie_names()
avi.vs.log("cookies_count_after=" .. count)

VMware Cloud Director HTTP error 431: Request Header Fields Too Large

Tomas Fojta wrote previously about issues with Cloud Director errors having to do with excessively large cookies. This is a common problem for cloud providers where there may be multiple web applications, some of which fail to properly limit their cookie scope. At the moment I am writing this, my browser is sending about 6.7kB of cookie data when visiting cloud.ibm.com. This is close to the limit supported by Cloud Director, and sometimes it goes over that limit.

Tomas suggested an approach using the NSX load balancer haproxy configuration to filter cookies. Unfortunately, Tomas’s approach does not cover all possible cases. For example, it does not cover the case where only one of these two cookies is present, and it does not cover the case where there are additional cookies in the header after these two cookies. Furthermore, there are additional cookies used by Cloud Director; at a minimum this includes the following:

  • JSESSIONID
  • rstd
  • vcloud_session_id
  • vcloud_jwt
  • sso-preferred
  • sso_redirect_org
  • *.redirectTo
  • *.state

If you have a known limited list of cookies (or cookie name patterns) like this that you want to pass to your application, it is relatively easy to program a positive cookie filter with an advanced load balancer such as VMware Avi Load Balancer. But if you are using the NSX embedded load balancer and are limited to the haproxy approach of using reqirep with regular expressions, it is an intractable problem. Therefore, instead of using reqirep to selectively include the cookies that Director needs, I recommend the approach of using reqirep to selectively and iteratively delete cookies that you know are likely to be large and to overflow Director’s supported limit. It may take some iterative experimentation over a period of time for you to identify all of the offending cookies.

For example, we can use the following four rules to remove two of the larger cookies for cloud.ibm.com, neither of which are needed by Director. For each cookie I am removing, I have written a pair of rules: the first rule removes the cookie if it appears anywhere other than the end of the cookie list, and the second removes it if it is at the end of the list:

reqirep ^(Cookie:.*)com\.ibm\.cloud\.iam\.iamcookie\.prod=[^;]*;(.*)$ \1\ \2
reqirep ^(Cookie:.*)com\.ibm\.cloud\.iam\.iamcookie\.prod=[^;]*$ \1
reqirep ^(Cookie:.*)com\.ibm\.cloud\.iam\.Identity\.prod=[^;]*;(.*)$ \1\ \2
reqirep ^(Cookie:.*)com\.ibm\.cloud\.iam\.Identity\.prod=[^;]*$ \1

vCenter key provider server certificates

I’ve written a couple of posts on vCenter key provider client certificates and caveats related to configuring them. In this post I shift to discussing server certificates.

When you connect to a key provider, vCenter only offers you the option of trusting the provider’s end-entity certificate:

Typically an end-entity certificate has a lifetime of a year or less. This means that you will be revisiting the provider configuration to verify the certificate on at least an annual basis.

However, after you have trusted this certificate, vCenter gives you the option of configuring an alternate certificate to be trusted. You can use this to establish trust with one of your key provider’s CA certificates instead of the end-entity certificate. Typically these have longer lifetimes, so your key provider connectivity will be interrupted much less frequently.

You may have to work with your security admin to obtain the CA certificate, or, depending on how your key provider is configured, you may be able to obtain the certificate directly from the KMIP connection using a tool like openssl:

root@smoonen-vc [ ~ ]# openssl s_client -connect private.eu-de.kms.cloud.ibm.com:5696 -showcerts
CONNECTED(00000003)
depth=2 C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Global Root G2
verify return:1
depth=1 C = US, O = DigiCert Inc, CN = DigiCert Global G2 TLS RSA SHA256 2020 CA1
verify return:1
depth=0 C = US, ST = New York, L = Armonk, O = International Business Machines Corporation, CN = private.eu-de.kms.cloud.ibm.com
verify return:1
---
Certificate chain
 0 s:C = US, ST = New York, L = Armonk, O = International Business Machines Corporation, CN = private.eu-de.kms.cloud.ibm.com
   i:C = US, O = DigiCert Inc, CN = DigiCert Global G2 TLS RSA SHA256 2020 CA1
   a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA256
   v:NotBefore: Jun 18 00:00:00 2024 GMT; NotAfter: Jun 17 23:59:59 2025 GMT
-----BEGIN CERTIFICATE-----
MIIHWDCCBkCgAwIBAgIQCK1qBW4aHA51Yl6cJVq96TANBgkqhkiG9w0BAQsFADBZ
. . .

You can then paste this certificate directly into the vCenter UI:

After doing this, vCenter will still display the validity lifetime of the end-entity certificate rather than that of the CA certificate. But it will now be trusting the CA certificate, and so this trust will extend to the next version of the end-entity certificate, as long as it is signed by the same CA.

vCenter key provider client certificates, part 2

Previously I explained how vCenter creates a new client certificate with each key provider connection. This is a good thing; it enables you to connect vCenter to the same provider multiple times as a different identity, which can be valuable in certain multitenant use cases.

However, there is also a bug in the vCenter UI that generates this certificate. For a split second, the UI presents one certificate, but then switches to a new value. If you click the copy button too quickly, you will copy the wrong certificate:

Be sure to wait for the screen to refresh before copying your certificate!