Planning for VMware vSAN ESA

I wrote previously about some considerations for migrating from VMware vSAN Original Storage Architecture (OSA) to Express Storage Architecture (ESA). There are some additional important planning considerations for your hardware choice for vSAN ESA. Even if you are already leveraging NVMe drives with vSAN OSA, your existing hardware may not be supported for ESA. Here are some important considerations, with a small configuration-check sketch after the list:

  • Although OSA was certified at the component level, ESA is certified at the node level using vSAN ESA ReadyNode configurations.
  • These ReadyNode configurations are limited to newer processors.
  • The minimum ReadyNode configuration for compute is 32 cores and 512GB of memory.
  • Although vSAN ESA does not use cache drives, the minimum storage configuration for ESA is four NVMe devices per host. The minimum capacity required for each drive is 1.6TB. At the time of this writing, the largest certified drives are 6.4TB.
  • The minimum network configuration for ESA is 25GbE.
  • The use of TPM 2.0 is recommended.
  • With a RAID-5 configuration (erasure coding, FTT=1) you can now deploy as few as three hosts using ESA. All other configurations have the same fixed and recommended minimums as with OSA. As always, with any FTT=1 configuration, you must perform a “full data migration” during host maintenance if you want your storage to remain resilient against host or drive loss during the maintenance window.
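
To make these minimums concrete, here is a minimal sketch that checks a candidate host description against the figures above. The dictionary layout and key names are hypothetical; the authoritative source remains the vSAN ESA ReadyNode listings in the VMware Compatibility Guide.

```python
# Minimal sketch: check a candidate host against the vSAN ESA ReadyNode
# minimums listed above. The dictionary layout is hypothetical; confirm any
# real configuration against the VMware Compatibility Guide.

ESA_MINIMUMS = {
    "cpu_cores": 32,        # minimum cores per node
    "memory_gb": 512,       # minimum memory per node
    "nvme_devices": 4,      # minimum NVMe devices per host
    "nvme_device_tb": 1.6,  # minimum capacity per device
    "nic_gbe": 25,          # minimum network speed
}

def check_esa_minimums(host: dict) -> list[str]:
    """Return a list of gaps; an empty list means the host meets the minimums."""
    gaps = []
    for key, minimum in ESA_MINIMUMS.items():
        if host.get(key, 0) < minimum:
            gaps.append(f"{key}: have {host.get(key, 0)}, need >= {minimum}")
    return gaps

# Example: a host that meets compute minimums but has undersized drives.
print(check_esa_minimums({
    "cpu_cores": 32, "memory_gb": 512,
    "nvme_devices": 4, "nvme_device_tb": 0.96, "nic_gbe": 25,
}))
```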

Authenticating with the SoftLayer API using IBM Cloud IAM

Traditionally you authenticate with the IBM Cloud SoftLayer “classic infrastructure” API using a SoftLayer or “classic infrastructure” API key. However, IBM Cloud has introduced support to authenticate with these APIs using the standardized IAM API keys and identities. At one point IBM implemented a method to exchange IAM credentials for an IMS token, but IBM’s Martin Smolny writes more recently that the classic APIs now “support IAM tokens directly.”

I’ve written a brief script to demonstrate this approach. The script first calls the IAM token API to exchange an API key for an IAM token. It then constructs a SoftLayer API client object that uses this token for authentication. Note that for the Python SDK, some code paths that create an API client default to the XML-RPC endpoint rather than the REST endpoint, and the XML-RPC API does not fully support IAM-based authentication. The method used in this script leverages the REST API endpoint and transport, which does support IAM-based authentication.
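
A condensed sketch of that flow follows. It is hedged: the IAM token endpoint is standard, but the RestTransport and BearerAuthentication class names should be verified against your version of the softlayer-python SDK.

```python
import requests
from SoftLayer import API, auth, transports

# Step 1: exchange an IBM Cloud IAM API key for a short-lived IAM access
# token using the standard IAM token endpoint.
def get_iam_token(api_key: str) -> str:
    resp = requests.post(
        "https://iam.cloud.ibm.com/identity/token",
        data={
            "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
            "apikey": api_key,
        },
        headers={"Accept": "application/json"},
    )
    resp.raise_for_status()
    return resp.json()["access_token"]

# Step 2: build a SoftLayer client pinned to the REST transport, since the
# XML-RPC transport does not fully support IAM-based authentication.
# BearerAuthentication is assumed to place the token in an
# "Authorization: Bearer" header; verify the class name in your SDK version.
def make_client(iam_token: str) -> API.BaseClient:
    transport = transports.RestTransport(
        endpoint_url="https://api.softlayer.com/rest/v3.1/",
    )
    return API.BaseClient(
        transport=transport,
        auth=auth.BearerAuthentication("", iam_token),
    )

if __name__ == "__main__":
    client = make_client(get_iam_token("YOUR_IAM_API_KEY"))
    # Verify authentication by fetching the account record.
    print(client.call("SoftLayer_Account", "getObject"))
```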

VMware NFS resiliency considerations

Here are some important resiliency considerations if you are using NFS datastores for your VMware vSphere cluster. You should be aware of these considerations so that you can evaluate the tradeoffs of your NFS version choice in planning your storage architecture.

NFSv3 considerations

For NFSv3 datastores, ESXi supports storage I/O control (SIOC), which allows you to enable congestion control for your NFS datastore. This helps ensure that your hosts do not overrun the storage array’s IOPS allocation for the datastore. Hosts that detect congestion will adaptively back off the operations they are driving. You should test your congestion thresholds to ensure that they are sufficient to detect and react to problems.

However, NFSv3 does not support multipathing. This is not just a limitation on possible throughput, but a limitation on resiliency. You cannot configure multiple IP addresses for your datastore, and even if your datastore is known by a hostname, ESXi does not allow you to leverage DNS-based load balancing to redirect hosts to a new IP address during interface maintenance at your storage array; ESXi will not re-resolve the hostname after a connection failure. Thus, with NFSv3 you risk losing the connection to your datastore during interface maintenance on your storage array.

NFSv4.1 considerations

NFSv4.1 datastores have the opposite characteristics for the above issues:

NFSv4.1 supports multipathing, so you can configure multiple IP addresses for your datastore connection. This can improve network throughput, but more importantly it helps ensure that your connection to the datastore remains available if one of those paths is lost.
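
As an illustration, here is a hedged pyVmomi sketch of mounting an NFSv4.1 datastore over two addresses. The host object, addresses, and export path are placeholders, and you should verify the NAS volume specification fields against your pyVmomi version.

```python
from pyVmomi import vim

# Hedged sketch: mount an NFSv4.1 datastore over two server addresses on one
# ESXi host. "host" is assumed to be a vim.HostSystem obtained through an
# existing pyVim.connect.SmartConnect session; addresses and paths below are
# placeholders.
def mount_nfs41_datastore(host: vim.HostSystem) -> vim.Datastore:
    spec = vim.host.NasVolume.Specification(
        type="NFS41",
        # Multiple server addresses provide the multipathing that NFSv3 lacks.
        remoteHostNames=["192.0.2.10", "192.0.2.11"],
        remoteHost="192.0.2.10",
        remotePath="/export/datastore1",
        localPath="nfs41-datastore1",
        accessMode="readWrite",
        securityType="AUTH_SYS",
    )
    return host.configManager.datastoreSystem.CreateNasDatastore(spec)
```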

However, at this time NFSv4.1 does not support SIOC congestion control. Therefore, with NFSv4.1 you run the risk of a datastore disconnection if one host (or worse, several hosts at once) exceeds your storage array’s IOPS allocation for the datastore.

VMware vSAN ESA migration and licensing considerations

With the new vSAN Express Storage Architecture (ESA), you may need to carefully plan your migration path from vSAN 7 to vSAN 8. At the moment, VMware only supports greenfield deployments of vSAN ESA. As a result, even if you have a vSAN cluster with NVMe storage, you will need to migrate your workloads to a new cluster to reach vSAN ESA. Furthermore, if you are moving from SSD to NVMe, you’ll need to ensure your order of operations is correct.

The following graph illustrates your possible migration and upgrade paths:

Your fastest path to ESA is to leave your existing cluster at the vSphere 7 level and create a vSphere 8 ESA cluster after upgrading to vCenter 8.

It’s important to consider both your vSphere and vSAN licensing during this process. For one, you will incur dual licensing costs for the duration of the migration. You should also be aware that your vSAN license is tied to your vCenter version rather than your vSphere version. KB 80691 documents that after upgrading to vCenter 8, your vSAN cluster will operate under an evaluation license until you obtain vSAN 8 licenses. You should work with VMware to ensure proper vSphere and vSAN licensing throughout this transition.

VMware’s vExpert program

VMware maintains and supports an evangelism and advocacy program for technologists who have made demonstrated contributions to the VMware community, whom they call vExperts. VMware makes a significant investment in the vExpert program, providing opportunities such as webinars and evaluation licenses. vExperts are appointed for an annual term, and reappointments require demonstrated merit. I’ve been appointed as a vExpert now for three years, and I’m very honored to have received this recognition.

If you’re actively involved in the VMware user community, you should apply!

Updated VMware Solutions API samples

It’s been a while since I first posted sample IBM Cloud for VMware Solutions API calls. Since then, our offering has moved from NSX-V to NSX-T, and to vSphere 7.0. This results in some changes to the structure of the API calls you need to make for ordering instances, clusters, and hosts.

I’ve updated the sample ordering calls on GitHub. This includes the following changes:

  • Migrate some of the utility APIs to version 2
  • Send transaction IDs with each request to aid in problem diagnosis (see the sketch after this list)
  • Order NSX-T and vSphere 7.0 instead of NSX-V and vSphere 6.7
  • Restructure some of the ordering code to order a concrete example instead of a random one
  • Use the new price check parameter to obtain price estimates
  • Ensure that the add-cluster path leverages an existing cluster’s location and VLANs
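
As a hedged illustration of the transaction-ID pattern, the sketch below generates a unique ID per request and sends it as a header; the endpoint URL and header name here are assumptions, and the GitHub samples show the exact calls.

```python
import uuid
import requests

# Hedged sketch of the transaction-ID pattern: generate a unique ID per
# request and send it as a header so that support can correlate your call
# with server-side logs. The URL and header name below are illustrative;
# see the GitHub samples for the exact requests.
def call_with_transaction_id(url: str, token: str, payload: dict) -> requests.Response:
    transaction_id = str(uuid.uuid4())
    print(f"transaction id: {transaction_id}")  # record this for problem diagnosis
    return requests.post(
        url,
        json=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "X-Global-Transaction-Id": transaction_id,
        },
    )
```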

Highly available key management in IBM Cloud for VMware Solutions

IBM Cloud’s KMIP for VMware offering provides the foundation for cloud-based key management when using VMware vSphere encryption or vSAN encryption. KMIP for VMware is highly available within a single region:

  • KMIP for VMware and Key Protect are highly available when you configure vCenter connections to both regional endpoints. If any one of the three zones in that region fails entirely, key management remains available to your VMware workloads.
  • KMIP for VMware and Hyper Protect Crypto Services (HPCS) are highly available if you deploy two or more crypto units for your HPCS instance. In that case, if any one of the three zones in that region fails entirely, key management remains available to your VMware workloads.

If you need to migrate or fail over your workloads outside of a region, your plan depends on whether you are using vSAN encryption or vSphere encryption:

When you are using vSAN encryption, each site is protected by its own key provider. If you are using vSAN encryption to protect workloads that you replicate between multiple sites, you must create a separate KMIP for VMware instance in each site, each connected to a separate Key Protect or HPCS instance in that site. You must connect the vCenter Server in each site to its local KMIP for VMware instance as its key provider.

When you are using vSphere encryption, most VMware replication and migration techniques today (for example, cross-vCenter vMotion and vSphere Replication) rely on a common key manager between the two sites. This topology is not supported by KMIP for VMware. Instead, you must create a separate KMIP for VMware instance in each site, each connected to a separate Key Protect or HPCS instance in that site. You must connect the vCenter Server in each site to its local KMIP for VMware instance as its key provider, and then use a replication technology that supports the attachment and replication of decrypted disks.

Veeam Backup & Replication supports this replication technique. To implement it correctly, see the required steps in the Veeam documentation.

Note that this technique currently does not support the replication of virtual machines with a vTPM device.
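
In both topologies, each vCenter Server registers only its site-local KMIP for VMware endpoint as its key provider. The following hedged pyVmomi sketch shows roughly what that per-site registration looks like; the provider name, address, and port are placeholders, and the trust-establishment step is environment-specific and omitted.

```python
from pyVmomi import vim

# Hedged sketch: register a site-local KMIP for VMware endpoint as a key
# provider in this site's vCenter Server and mark it as the default.
# "si" is assumed to be a pyVim.connect.SmartConnect service instance for
# the local vCenter; the provider name, address, and port are placeholders.
def register_local_kmip(si, provider_name="kmip-site-a",
                        address="kmip.example.local", port=5696):
    crypto_mgr = si.content.cryptoManager  # CryptoManagerKmip on vCenter
    cluster_id = vim.encryption.KeyProviderId(id=provider_name)
    spec = vim.encryption.KmipServerSpec(
        clusterId=cluster_id,
        info=vim.encryption.KmipServerInfo(
            name=provider_name,
            address=address,
            port=port,
        ),
    )
    crypto_mgr.RegisterKmipServer(server=spec)
    # Trust between vCenter and the KMIP endpoint must also be established
    # before use; that step is environment-specific and omitted here.
    crypto_mgr.MarkDefault(clusterId=cluster_id)
```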