Here are some important resiliency considerations if you are using NFS datastores for your VMware vSphere cluster. You should be aware of these considerations so that you can evaluate the tradeoffs of your NFS version choice in planning your storage architecture.
For NFSv3 datastores, ESXi supports Storage I/O Control (SIOC), which lets you enable congestion control for your NFS datastore. This helps ensure that your hosts do not overrun the storage array’s IOPS allocation for the datastore: hosts that detect congestion adaptively back off the I/O they are driving. You should test your congestion thresholds to confirm that they detect and react to problems quickly enough.
However, NFSv3 does not support multipathing. This is not just a limitation on possible throughput, but a limitation on resiliency. You cannot configure multiple IP addresses for your datastore. Even if your datastore is known by a hostname, ESXi does not let you leverage DNS-based load balancing to redirect hosts to a new IP address during interface maintenance at your storage array, because ESXi will not re-resolve the hostname after a connection failure. NFSv3 therefore leaves you exposed to losing the connection to your datastore when you perform interface maintenance on your storage array.
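To make the single-path limitation concrete, here is a sketch of mounting an NFSv3 datastore with esxcli; the address, export path, and volume name are placeholders:

```shell
# NFSv3: the datastore is reachable through exactly one server address.
# If 192.0.2.10 becomes unavailable (for example, during interface
# maintenance), ESXi has no alternate path to fail over to.
esxcli storage nfs add --host=192.0.2.10 --share=/export/datastore1 --volume-name=nfs-ds01

# Verify the mount and its (single) server address
esxcli storage nfs list
```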
NFSv4.1 datastores have the opposite characteristics on both counts:
NFSv4.1 supports multipathing, so you can configure multiple IP addresses for your datastore connection. This can improve network throughput, but more importantly it helps ensure that your connection to the datastore survives the loss of one of those paths.
However, at this time NFSv4.1 does not support SIOC congestion control. Therefore, if you are using NFSv4.1, you run the risk of triggering a disconnection from your datastore if one host, or worse, several hosts, exceed your storage array’s IOPS allocation for the datastore.
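By contrast, an NFSv4.1 mount accepts multiple server addresses for a single datastore. A sketch, again with placeholder addresses and names (your array must expose the export on both interfaces):

```shell
# NFSv4.1: multiple server addresses can be supplied for one datastore,
# giving ESXi an alternate path if one interface goes down.
esxcli storage nfs41 add --hosts=192.0.2.10,192.0.2.11 --share=/export/datastore1 --volume-name=nfs41-ds01

# Verify the mount and both configured paths
esxcli storage nfs41 list
```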
With the new vSAN Express Storage Architecture (ESA), you may need to plan your migration path from vSAN 7 to vSAN 8 carefully. At the moment, VMware supports only greenfield deployments of vSAN ESA. As a result, even if you already have a vSAN cluster with NVMe storage, you will need to migrate your workloads to a new cluster to adopt vSAN ESA. Furthermore, if you are moving from SSD to NVMe, you’ll need to get your order of operations right.
The following diagram illustrates your possible migration and upgrade paths:
Your fastest path to ESA is to leave your existing cluster at the vSphere 7 level and create a vSphere 8 ESA cluster after upgrading to vCenter 8.
It’s important to consider both your vSphere and vSAN licensing during this process. For one, you will incur dual licensing for the duration of the migration. You should also be aware that your vSAN license is tied to your vCenter version rather than your vSphere version: KB 80691 documents that after you upgrade to vCenter 8, your vSAN cluster operates under an evaluation license until you obtain vSAN 8 licenses. Work with VMware to ensure proper vSphere and vSAN licensing throughout this transition.
VMware maintains and supports an evangelism and advocacy program, called vExpert, for technologists who have made demonstrated contributions to the VMware community. VMware invests significantly in the vExpert program, providing opportunities such as webinars and evaluation licenses. vExperts are appointed for an annual term, and reappointment requires continued demonstrated contributions. I’ve now been appointed a vExpert for three years, and I’m honored to have received this recognition.
If you’re actively involved in the VMware user community, you should apply! Here are a few testimonials to encourage you:
It’s been a while since I first posted sample IBM Cloud for VMware Solutions API calls. Since then, our offering has moved from NSX-V to NSX-T, and to vSphere 7.0. This results in some changes to the structure of the API calls you need to make for ordering instances, clusters, and hosts.
IBM Cloud’s KMIP for VMware offering provides the foundation for cloud-based key management when using VMware vSphere encryption or vSAN encryption. KMIP for VMware is highly available within a single region:
KMIP for VMware and Key Protect are highly available when you configure vCenter connections to both regional endpoints. If any one of the three zones in that region fails entirely, key management continues to be available to your VMware workloads.
KMIP for VMware and Hyper Protect Crypto Services (HPCS) are highly available if you deploy two or more crypto units for your HPCS instance. If you do so and any one of the three zones in that region fails entirely, key management continues to be available to your VMware workloads.
If you need to migrate or fail over your workloads outside of a region, your plan depends on whether you are using vSAN encryption or vSphere encryption:
When you are using vSAN encryption, each site is protected by its own key provider. If you are using vSAN encryption to protect workloads that you replicate between multiple sites, you must create a separate KMIP for VMware instance in each site, connected to a separate Key Protect or HPCS instance in that site. You must connect the vCenter Server in each site to its local KMIP for VMware instance as its key provider.
When you are using vSphere encryption, most VMware replication and migration techniques today (for example, cross-vCenter vMotion and vSphere Replication) rely on a common key manager between the two sites. This topology is not supported by KMIP for VMware. Instead, you must create a separate KMIP for VMware instance in each site, connected to a separate Key Protect or HPCS instance in that site. Connect the vCenter Server in each site to its local KMIP for VMware instance as its key provider, and then use a replication technology that supports the attachment and replication of decrypted disks.
Veeam Backup &amp; Replication supports this replication technique. To implement it correctly, follow the steps indicated in the Veeam documentation.
Note that this technique does not currently support replicating virtual machines with a vTPM device.
I often think of my job as being a translator between executives, managers, architects, developers, testers, customers, writers, and so on. My favorite work projects have been those we conducted war-room style or in an open landscape, yet it is now almost two years since I’ve been in the office. We’ve filled the gap a little with some team outings. Today I went into the office to collect my belongings, before my vaxx-leper status kicks in and my physical access is deactivated. This is such a stark contrast with my experience at church, where we worked hard to find some way to meet, at times even with our fussy government’s disapproval. What a joy and encouragement that fellowship was, and what a missed opportunity these two years have been for camaraderie at so many anxious companies and churches!