Here are some important resiliency considerations if you are using NFS datastores for your VMware vSphere cluster. You should be aware of these considerations so that you can evaluate the tradeoffs of your NFS version choice in planning your storage architecture.
For NFSv3 datastores, ESXi supports storage I/O control (SIOC), which allows you to enable congestion control for your NFS datastore. This helps ensure that your hosts do not overrun the storage array’s IOPS allocation for the datastore. Hosts that detect congestion will adaptively back off the operations they are driving. You should test your congestion thresholds to ensure that they are sufficient to detect and react to problems.
However, NFSv3 does not support multipathing. This is not just a limitation on possible throughput, but a limitation on resiliency. You cannot configure multiple IP addresses for your datastore, and even if your datastore is known by a hostname, ESXi does not allow you to leverage DNS based load balancing to redirect hosts to a new IP address in case of interface maintenance at your storage array; ESXi will not reattempt to resolve the hostname in case of connection failure. Thus, NFSv3 is subject to the possibility that you lose the connection to your datastore in case of interface maintenance on your storage array.
NFSv4.1 datastores have the opposite characteristics for the above issues:
NFSv4.1 supports multipathing, so you are able to configure multiple IP addresses for your datastore connection. This possibly allows you to obtain better network throughput, but more importantly it helps to ensure that your connection to the datastore is resilient in case one of those paths is lost.
However, at this time NFSv4.1 does not support SIOC congestion control. Therefore, if you are using NFSv4.1 you run the risk of triggering a disconnection from your datastore if your host—or especially if multiple hosts—exceeds your storage array’s IOPS allocation for the datastore.