Customizing root login for VMware Cloud Director

Your Linux VM running in VMware Cloud Director might come preconfigured with the security best practice of disabling root password login. This can prevent you from using the root password that you set with Director’s Guest OS Customization:

#PermitRootLogin prohibit-password

You can override this behavior using a Guest OS Customization script in a couple of ways. The simplest approach is to use your customization script to set the sshd configuration to allow root password logins:

#!/bin/bash
sed -i -e "s/#PermitRootLogin prohibit-password/PermitRootLogin yes/" /etc/ssh/sshd_config

Or, if you prefer, you can use the customization script to insert an SSH public key for the root user:

#!/bin/bash
mkdir -p /root/.ssh && chmod 700 /root/.ssh
echo "ssh-rsa AAAAB3...DswrcTw==" >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys
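Either way, you can verify the effective sshd setting afterward. `sshd -T` prints the effective configuration and requires root:

```shell
# Print the effective PermitRootLogin setting (run as root on the VM)
sshd -T | grep -i permitrootlogin
```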

Updated instructions for multipath iSCSI in IBM Cloud

Several years ago I blogged detailed instructions to configure multipath iSCSI in IBM Cloud’s classic infrastructure using Endurance block storage.

Since then I’ve learned that VMware documents that you should not use port binding in this topology. I was skeptical of this, since I wasn’t confident that an HBA rescan would attempt connections on all vmkernel ports. However, I’ve retested my instructions without port binding, and I can confirm that I’m able to achieve MPIO connectivity to the storage without it.

I’ve updated my instructions to remove the port binding step.
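If you want to verify the result in your own environment, you can check from an ESXi shell that each device still has multiple working paths. A quick sketch; the exact output format varies by ESXi version:

```shell
# List each storage device with its multipathing policy and working paths
esxcli storage nmp device list
# Count active paths across all devices
esxcli storage core path list | grep -c "State: active"
```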

VMware encryption: leaked keys and key inventory

VMware vSphere generally leaks keys when objects are deleted or decrypted. The reason, I believe, is that VMware supposes you might have a backup copy of the object and may need the key in the future to restore it. For example, consider the case of a VM that is removed from inventory but remains on disk. VMware cannot know whether you will permanently delete this VM or add it back to inventory, so it allows the key to remain in your key provider.

Over time this results in the growth of unused keys in your key provider. In order to clean up unused keys, you first need to inventory the keys that are in use by active objects. The following PowerCLI script uses the VMware.VMEncryption and VMware.VsanEncryption modules in VMware’s PowerCLI community repository. It will inventory all keys in use by your hosts (for core dumps), in use by vSAN clusters (for vSAN disk encryption), and in use by VMs and disks (for vSphere encryption).

$keydata = @()

# Collect host keys
foreach($myhost in Get-VMHost) {
  if($myhost.CryptoSafe) {
    $hostdata = [PSCustomObject]@{
      type        = "host"
      name        = $myhost.Name
      keyprovider = $myhost.KMSserver
      keyid       = $myhost.ExtensionData.Runtime.CryptoKeyId.KeyId
    }
    $keydata += $hostdata
  }
}

# Collect vSAN keys
foreach($mycluster in Get-Cluster) {
  $vsanClusterConfig = Get-VsanView -Id "VsanVcClusterConfigSystem-vsan-cluster-config-system"
  $vsanEncryption    = $vsanClusterConfig.VsanClusterGetConfig($mycluster.ExtensionData.MoRef).DataEncryptionConfig

  if($mycluster.vSanEnabled -and $vsanEncryption.EncryptionEnabled) {
    $clusterdata = [PSCustomObject]@{
      type        = "cluster"
      name        = $mycluster.Name
      keyprovider = $vsanEncryption.kmsProviderId.Id
      keyid       = $vsanEncryption.kekId
    }
    $keydata += $clusterdata
  }
}

# Collect VM and disk keys
foreach($myvm in Get-VM) {
  if($myvm.encrypted) {
    $vmdata = [PSCustomObject]@{
      type        = "vm"
      name        = $myvm.Name
      keyprovider = $myvm.KMSserver
      keyid       = $myvm.EncryptionKeyId.KeyId
    }
    $keydata += $vmdata
  }

  foreach($mydisk in Get-HardDisk -vm $myvm) {
    if($mydisk.encrypted) {
      $diskdata = [PSCustomObject]@{
        type        = "harddisk"
        name        = $myvm.Name + " | " + $mydisk.Name
        keyprovider = $mydisk.EncryptionKeyId.ProviderId.Id
        keyid       = $mydisk.EncryptionKeyId.KeyId
      }
      $keydata += $diskdata
    }
  }
}

$keydata | Export-CSV -Path keys.csv -NoTypeInformation 

There are some important caveats to note:

  1. This script is over-zealous; it may report that a key is in use multiple times (e.g., host encryption keys shared by multiple hosts, or VM encryption keys shared by the disks of a VM).
  2. Your vCenter may be connected to multiple key providers. Before deleting any keys, take care to identify which keys are in use for each key provider.
  3. You may have multiple vCenters connected to the same key provider. Before deleting any keys, take care to collect inventory across all vCenters and any other clients connected to each key provider.
  4. As noted above, you may have VM backups or other resources that are still dependent on an encryption key, even after that resource has been deleted. Before deleting any keys, take care to ensure you have identified which keys may still be in use for your backups.
  5. This script does not address the case of environments using VMware vSphere Trust Authority (vTA).
  6. Importantly, this script does not address the case of “first-class disks,” or what VMware Cloud Director calls “named disks.”
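To illustrate caveats 1 and 3: once you have a CSV export from each vCenter, standard text tools are enough to merge the inventories and reduce them to unique provider/key pairs. The file names and key IDs below are made-up sample data:

```shell
# Sample exports from two vCenters (made-up data; note the duplicate
# key ID shared by a VM and its disk, as described in caveat 1)
cat > keys-vc1.csv <<'EOF'
"type","name","keyprovider","keyid"
"vm","app01","KP1","aaaa-1111"
"harddisk","app01 | Hard disk 1","KP1","aaaa-1111"
EOF
cat > keys-vc2.csv <<'EOF'
"type","name","keyprovider","keyid"
"host","esx02","KP1","bbbb-2222"
EOF

# Merge both files (skipping headers) and list each unique
# key provider / key ID pair exactly once
tail -q -n +2 keys-vc1.csv keys-vc2.csv | cut -d, -f3,4 | sort -u
```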

Converting from VUM to vLCM

IBM Cloud vCenter Server (VCS) instances are currently deployed with VUM baselines enabled. After customizing my VCS environment, here is how I switched to vLCM images:

  1. Depending on the vSphere version, it is possible that the drivers provided by the default image are not at the version level you need. Consult the VMware HCL for your ESXi version and hardware, and identify the driver version you need. Typically for 10GbE you need i40en, for 25GbE you need icen, and for the RAID controller you need lsi_mr3.
  2. Locate the needed driver version for your vSphere release at VMware Customer Connect, or work with IBM Cloud support to obtain it. Download the ZIP file and expand it to find the ZIP file it contains.
  3. In vCenter, navigate in the main menu to Lifecycle Manager. Select Actions | Import Updates and upload the ZIP file(s) you obtained in step 2 above.
  4. Navigate to vCenter inventory, select your cluster, and select Updates | Image. Then click Setup Image Manually.
  5. In Step 1, choose the vSphere version you desire for your image. Display the details for Components and click Add Components. Change the filter to show “Independent Components and Vendor Addon Components,” then review the drivers you identified earlier in steps 1-2. If the default version differs from the one you need, add it to your image. For your convenience you may want to include vmware-storcli. Then save the image you defined.
  6. In Step 2, after the compliance check completes, review the compliance of each of your hosts and resolve any issues.
  7. Click Finish Image Setup and confirm.
  8. At this point you can remediate your cluster to the new image!
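After remediation, you can confirm from an ESXi shell which driver versions are actually installed. A sketch, assuming the typical VIB names (they may differ on your hardware):

```shell
# List installed driver VIBs and their versions
esxcli software vib list | grep -E 'i40en|icen|lsi-mr3'
```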

Customizing your VMware on IBM Cloud environment

I like to apply the following customizations to my vCenter Server (VCS) on IBM Cloud instance after deploying it:

SSH customizations

Out of the box, the vCenter customerroot user does not have the bash shell enabled. After logging in as customerroot, you can enable shell access for yourself by running:

shell.set --enabled true

Note that in a future release, IBM expects to remove the need for the customerroot user and will provide the root credentials directly to you for newly deployed instances.

Additionally, I like to install my SSH key into vCenter so that I don’t need to provide a password to log in. This involves two steps:

  1. Copy my SSH public key to either /root/.ssh/authorized_keys or /home/customerroot/.ssh/authorized_keys. Note that if you create the folder you should set its permissions to 700, and if you create the file you should set its permissions to 600.
  2. vCenter will only allow you to use key-based login if you set your login shell to bash:
    chsh -s /bin/bash

Note that your authorized key will persist across a major release upgrade of vCenter, but your choice of default shell will not. You will have to perform step 2 again after upgrading vCenter to the next major release.
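Step 1 can be sketched as follows. The sketch runs against a scratch directory so it is safe to try anywhere; on the appliance you would target /root or /home/customerroot instead, and the key string here is a placeholder:

```shell
TARGET=$(mktemp -d)                         # stand-in for /root or /home/customerroot
PUBKEY="ssh-rsa AAAAexamplekey me@example"  # placeholder key, not a real one
install -d -m 700 "$TARGET/.ssh"            # create the folder with mode 700
printf '%s\n' "$PUBKEY" >> "$TARGET/.ssh/authorized_keys"
chmod 600 "$TARGET/.ssh/authorized_keys"    # set the file to mode 600
# Step 2, run on the appliance itself (and again after each major upgrade):
#   chsh -s /bin/bash
```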

Although SSH is initially disabled on the hosts, I also add my key to each host’s authorized keys list. For ESXi, the file you should edit is /etc/ssh/keys-<username>/authorized_keys as noted in KB 1002866.

Public connectivity

Some of your activities in vCenter benefit from public connectivity. For example, vCenter is able to refresh the vSAN hardware compatibility list proactively.

vCenter supports the use of proxy servers for some of its internet connectivity. Since I have access only to an HTTP proxy, not an HTTPS one, I point both settings at it by manually editing /etc/sysconfig/proxy as follows:

PROXY_ENABLED="yes"
HTTP_PROXY="http://10.11.12.13:3128/"
HTTPS_PROXY="http://10.11.12.13:3128/"

Alternately, if your instance has public connectivity enabled, you can configure vCenter routes to use your services NSX edge to SNAT to the public network. This involves the following steps:

  1. Log in to NSX Manager and select Security | Gateway Firewall, then manage the firewall for the T0 gateway with “service” in its name. Add a new policy named “vCenter internet” and add a rule with the same name to this policy, set to allow traffic. The source IP for this rule should be your vCenter appliance IP; the destination and allowed services can be Any. Publish your changes. Note that these changes may later be overwritten by IBM Cloud automation in some cases, such as when you deploy or remove add-on services like Zerto and Veeam.
  2. Still in NSX manager, select Networking | NAT. Verify that there is already an SNAT configured for the T0 service gateway that allows all 10.0.0.0/8 traffic to SNAT to the public internet.
  3. Identify the NSX edge’s private IP so that we can configure a route to it later. Still in NSX manager, navigate to Networking | Tier-0 Gateways, and expand the gateway with “service” in its name. Click the number next to “HA VIP configuration” and note the IP address associated with the private uplinks, for example, 10.20.21.22.
  4. Log in to the vCenter appliance shell (or run appliancesh from the bash prompt). Run the following command to identify the IBM Cloud private router IP address. It will be the Gateway address associated with the 0.0.0.0 destination, for example, 10.30.31.1:
    com.vmware.appliance.version1.networking.routes.list
  5. Now we need to configure three static routes to direct all private network traffic to the private router, substituting the address you learned in step 4 above. IBM Cloud uses the following IP networks on its private network:
    com.vmware.appliance.version1.networking.routes.add --destination 10.0.0.0 --prefix 8 --gateway 10.30.31.1 --interface nic0
    com.vmware.appliance.version1.networking.routes.add --destination 161.26.0.0 --prefix 16 --gateway 10.30.31.1 --interface nic0
    com.vmware.appliance.version1.networking.routes.add --destination 166.8.0.0 --prefix 14 --gateway 10.30.31.1 --interface nic0
  6. Finally we can reconfigure the default gateway. First display the nic0 configuration:
    com.vmware.appliance.version1.networking.ipv4.list
  7. In this configuration we want to modify only the default gateway address. Keeping all the other details we learned from step 6, and substituting the edge private IP address we learned in step 3, run the following command:
    com.vmware.appliance.version1.networking.ipv4.set --interface nic0 --mode static --address 10.1.2.3 --prefix 26 --defaultGateway 10.20.21.22

Note: If you follow the approach of setting up SNAT and customizing routes, in my experience this can cause problems when you upgrade vCenter to the next major release. The static routes configured in step 5 do not appear to persist across the upgrade, with the result that no traffic is routed to the private network. Before starting a major release upgrade, set the vCenter default route that you configured in step 7 back to the IBM Cloud private router. After the release upgrade, reintroduce the three routes you added in step 5, and update the default route from step 7 to point back to the NSX edge.

vSAN configuration

I customize my vSAN configuration as follows:

  1. In vCenter, navigate to the cluster’s Configuration | vSAN | Services and edit the Performance Service; set it to Enabled.
  2. Navigate to the cluster’s Configuration | vSAN | Services and edit the Data Services; enable Data-In-Transit encryption.

Firmware updates

Your host may be provisioned with optional firmware updates pending, and additional firmware updates may be issued by IBM Cloud at any time thereafter. Available firmware updates will be displayed on the Firmware tab of your bare metal server resource in the IBM Cloud console. You can update firmware for a host with the following steps:

  1. In vCenter, place the host in maintenance mode and wait for it to enter successfully.
  2. In the IBM Cloud console, perform Actions | Power off and wait for the host to power off.
  3. In the IBM Cloud console, perform Actions | Update firmware. This action may take several hours to complete.
  4. In vCenter, remove the host from maintenance mode.

Occasionally I have found that either the firmware update fails, or it succeeds but the success is not reflected in the IBM Cloud console and an update still appears to be available. In cases like this you can resolve the issue by opening an IBM Cloud support ticket.

IPMI

At deploy time, your bare metal servers have IPMI interfaces enabled. Although these interfaces are on your dedicated private VLAN, it is still a best practice to disable them to reduce the internal management surface area. You can do this using the SoftLayer CLI and providing the bare metal server ID that is displayed in the server details page in the IBM Cloud console:

slcli hardware toggle-ipmi --disable 1234567
slcli hardware toggle-ipmi --disable 3456789
. . .
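If you have many hosts, you can loop over every bare metal server ID rather than listing them by hand. A sketch, assuming the SoftLayer CLI is configured for your account and jq is installed:

```shell
# Disable IPMI on every bare metal server in the account
slcli --format json hardware list | jq -r '.[].id' | while read -r id; do
  slcli hardware toggle-ipmi --disable "$id"
done
```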

Authenticating with the SoftLayer API using IBM Cloud IAM

Traditionally you authenticate with the IBM Cloud SoftLayer “classic infrastructure” API using a SoftLayer or “classic infrastructure” API key. However, IBM Cloud has introduced support to authenticate with these APIs using the standardized IAM API keys and identities. At one point IBM implemented a method to exchange IAM credentials for an IMS token, but IBM’s Martin Smolny writes more recently that the classic APIs now “support IAM tokens directly.”

I’ve written a brief script to demonstrate this approach. The script first calls the IAM token API to exchange an API key for an IAM token. Then it constructs a SoftLayer API client object that leverages this token for authentication. Note that in the Python SDK, some code paths that create an API client default to the XMLRPC API endpoint rather than the REST API endpoint, and the XMLRPC API does not fully support IAM-based authentication. The method used in this script leverages the REST API endpoint and transport, which does support IAM-based authentication.
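The same flow can also be sketched with curl: exchange the IAM API key for a token at the IAM token endpoint, then pass that token as a bearer token to the SoftLayer REST API. In this sketch, $IBMCLOUD_API_KEY is assumed to hold your IAM API key and jq is assumed to be installed:

```shell
# Exchange an IAM API key for an IAM access token
TOKEN=$(curl -s -X POST 'https://iam.cloud.ibm.com/identity/token' \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -d 'grant_type=urn:ibm:params:oauth:grant-type:apikey' \
  -d "apikey=$IBMCLOUD_API_KEY" | jq -r .access_token)

# Call the SoftLayer REST API directly with the IAM token
curl -s -H "Authorization: Bearer $TOKEN" \
  'https://api.softlayer.com/rest/v3.1/SoftLayer_Account/getObject.json'
```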

VMware NFS resiliency considerations

Here are some important resiliency considerations if you are using NFS datastores for your VMware vSphere cluster. You should be aware of these considerations so that you can evaluate the tradeoffs of your NFS version choice in planning your storage architecture.

NFSv3 considerations

For NFSv3 datastores, ESXi supports storage I/O control (SIOC), which allows you to enable congestion control for your NFS datastore. This helps ensure that your hosts do not overrun the storage array’s IOPS allocation for the datastore. Hosts that detect congestion will adaptively back off the operations they are driving. You should test your congestion thresholds to ensure that they are sufficient to detect and react to problems.

However, NFSv3 does not support multipathing. This is not just a limitation on possible throughput, but a limitation on resiliency. You cannot configure multiple IP addresses for your datastore, and even if your datastore is known by a hostname, ESXi does not allow you to leverage DNS-based load balancing to redirect hosts to a new IP address during interface maintenance at your storage array: ESXi will not re-resolve the hostname after a connection failure. Thus, with NFSv3 you risk losing the connection to your datastore during interface maintenance on your storage array.

NFSv4.1 considerations

NFSv4.1 datastores have the opposite characteristics for the above issues:

NFSv4.1 supports multipathing, so you are able to configure multiple IP addresses for your datastore connection. This can yield better network throughput, but more importantly it helps ensure that your connection to the datastore remains resilient if one of those paths is lost.

However, at this time NFSv4.1 does not support SIOC congestion control. Therefore, if you are using NFSv4.1, you run the risk of triggering a disconnection from your datastore if your host (or, worse, multiple hosts) exceeds your storage array’s IOPS allocation for the datastore.
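The multipathing difference shows up at mount time: an NFSv4.1 datastore can be mounted with several server addresses, while NFSv3 accepts only one. A sketch with placeholder addresses and share names:

```shell
# NFSv4.1: multiple server IPs for one datastore (comma-separated)
esxcli storage nfs41 add -H 10.0.0.11,10.0.0.12 -s /export/ds1 -v nfs41-ds1
# NFSv3: a single server address only
esxcli storage nfs add -H 10.0.0.11 -s /export/ds1 -v nfs3-ds1
```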

VMware vSAN ESA migration and licensing considerations

With the new vSAN Express Storage Architecture (ESA), you may need to carefully plan your migration path from vSAN 7 to vSAN 8. At the moment, VMware only supports greenfield deployments of vSAN ESA. As a result, even if you have a vSAN cluster with NVMe storage, you will need to migrate your workloads to a new cluster to reach vSAN ESA. Furthermore, if you are moving from SSD to NVMe, you’ll need to ensure your order of operations is correct.

The following graph illustrates your possible migration and upgrade paths:

Your fastest path to ESA is to leave your existing cluster at the vSphere 7 level and create a vSphere 8 ESA cluster after upgrading to vCenter 8.

It’s important to consider both your vSphere and vSAN licensing during this process. For one, you will incur dual licensing for the duration of the migration. You should also be aware that your vSAN license is tied to your vCenter version rather than your vSphere version. KB 80691 documents that after upgrading to vCenter 8, your vSAN cluster will operate under an evaluation license until you obtain vSAN 8 licenses. You should work with VMware to ensure proper vSphere and vSAN licensing throughout this transition.