Updated VMware Solutions API samples

It’s been awhile since I first posted sample IBM Cloud for VMware Solutions API calls. Since then, our offering has moved from NSX–V to NSX–T, and to vSphere 7.0. This results in some changes to the structure of the API calls you need to make for ordering instances, clusters, and hosts.

I’ve updated the sample ordering calls on Github. This includes the following changes:

  • Migrate some of the utility APIs to version 2
  • Send transaction ids with each request to aid in problem diagnosis
  • Order NSX–T and vSphere 7.0 instead of NSX–V and vSphere 6.7
  • Restructure some of the ordering code to order a concrete example instead of a random one
  • Use the new price check parameter to obtain price estimates
  • Ensure that the add–cluster path leverages an existing cluster’s location and VLANs

Active Directory and SSO integration for VMware Solutions in IBM Cloud

VMware Solutions instances in IBM Cloud are deployed with a built-in Active Directory domain with one or two directory controllers. Recently IBM Cloud changed the domain name requirements to require three qualifiers (e.g., cloud.example.com) rather than two (e.g., example.com). The reason for this is that we want to ensure you can integrate with your existing domain and forest without experiencing conflict. The domain controllers are configured as SSO provider for vCenter and NSX, and also as DNS provider for the infrastructure components. IBM Cloud creates an administrator userid in this domain which it uses for subsequent operations, such as logging into vCenter to add a new host, updating DNS records for that host, and creating utility accounts for add-on services like Veeam.

This Active Directory domain is your responsibility to secure and manage, including backup, patching, group policy, etc.

In order of integration from loosest to tightest coupling:

1. No integration

You are free to leverage your instance domain directly for user management within the instance. You can point additional components to the instance’s domain controllers for SSO; for example, the IBM Cloud automation does this for you when it deploys and configures HyTrust Cloud Control. You can join other devices to the domain and also use this for DNS management beyond the instance infrastructure.

2. Additional SSO provider

This option and all of the following options each entail some kind of integration with your instance and your existing Active Directory forest. You will first need to establish network connectivity between your instance and your existing Active Directory forest. You might accomplish this with either a VPN connection or a direct link between IBM Cloud and your on-premises environment. As always, you should take great care to secure your domain controllers, so you should explore security measures such as the use of read-only directory controllers, session recording, bastion servers, and gateway firewalls.

You can leverage your own Active Directory domain for SSO purposes by configuring your directory controllers as additional SSO providers for vCenter and NSX manager and by granting your users and groups appropriate permissions. You will need to determine how you configure DNS; some customers manually duplicate the DNS records from their instance domain into their existing Active Directory domain, but it is also possible to establish mutual DNS delegation between the two Active Directory domains.

This approach may allow you to limit the cloud connections to your directory controllers so that you are only opening up LDAPS and DNS ports.

3. One-way trust

You can establish one-way trust from your instance’s Active Directory domain controllers to your existing Active Directory domain. This will enable you to expose and authorize your existing users and groups to vCenter and NSX manager without having to add these directly as SSO providers. You may need to make additional provision for DNS updates, either copying them to your existing domain or establishing DNS delegation to the instance’s domain.

4. Two-way trust

This option requires your existing domain to establish mutual trust with your instance’s domain. If you are comfortable doing this, it could simplify your DNS management between the two domains.

5. Forest merge

I am not aware of any IBM Cloud customers who have done this, and I do not recommend it since it is a disruptive and potentially risky operation. The idea here is to merge the instance’s forest with your existing forest and to configure the instance’s domain as a child domain of your existing domain.

6. Rebuild

IBM Cloud’s VMware Solutions Shared offering implements a variation of the forest merge. It deploys VCS instances and builds VMware Cloud Director environments on top of them. This solution leverages an existing internal Active Directory forest and domain. After each new VCS instance is deployed, our process removes the VCS instance from its domain and reconfigures it to point to the existing domain.

A variation of this option is to create a new child domain in your existing forest for your VCS instance, and leverage the controllers for this child domain for use with your VCS instance.

There are a few important points to observe:

  1. You should either deploy your instance with the same domain name that you intend to convert it to, or else you should accept the fact that your infrastructure components will have host names in a different DNS domain from your Active Directory domain. Changing the DNS domain of infrastructure components is not supported by IBM Cloud automation.
  2. You will need to re-create the IBM Cloud automation user in your existing domain as an administrator and ensure that this user has administrative permissions in vCenter and NSX manager. This user may in the future create additional users or DNS entries. After performing the reconfiguration, you should open a support ticket to the VMware Solutions team asking them to update the automation user’s password in the IBM Cloud database for your instance, and provide the updated password.

Because this process is complex it is error prone, and you should consider this option only if the options above do not work for you. Additionally, you should practice this with a non-production or pre-production VCS deployment, including the test of adding a new host to the environment, before you implement it in production.

Connecting VMware Cloud Director with IBM Cloud VPC

IBM Cloud offers IBM–managed VMware Cloud Director through its VMware Solutions Shared offering. This offering is currently available in IBM Cloud’s Dallas and Frankfurt multi-zone regions, enabling you to deploy VMware virtual machines across three availability zones in those regions.

IBM Cloud also offers a virtual private cloud (VPC) for deployment of virtual machine and container workloads. Although VMware Cloud Director is operated in IBM Cloud’s “classic infrastructure,” it is still possible to interconnect your Cloud Director workload with your VPC workload using private network endpoints (PNEs) that are visible to your VPC.

In this article we’ll discuss how to implement this solution. This solution allows for bidirectional connectivity, but for illustrative purposes consider the use case of hosting an application in IBM Cloud VPC and a database in VMware Cloud Director:

Reviewing this topology from the top down:

  • Incoming traffic is handled by an IBM Cloud Load Balancer
  • The load balancer distributes connections to applications running on virtual server instances (VSIs) in our example, or optionally to kubernetes services. The application is deployed in two zones for high availability.
  • Each zone in the VPC has a router that will tunnel traffic to and from Cloud Director using BGP over IPsec. For the purposes of this exercise we used a RedHat Enterprise Linux 8 VSI, but you could deploy virtual gateway appliances from a vendor of your choice.
  • The VPC routers connect over the private IBM Cloud network through private network endpoints (PNEs) to edge appliances in Cloud Director.
  • The Cloud Director workload is distributed across three virtual datacenters (VDCs), one in each availability zone. Two edge services gateways (ESGs), one in each of two zones, serve as the ingress and egress points. These operate in active–standby state so that a stateful firewall can be used.
  • The database is deployed across three zones for high availability.

Caveats

The solution described here uses the IBM Cloud private network. This is a nice feature of the solution, but for reasons that may not be initially obvious, it is also required at the moment. If you wish to connect a single availability zone between VCD and VPC, you could do so using a public VPN connection between your VCD edge and the IBM Cloud VPC VPN gateway service. However, the VPC VPN service currently does not support BGP peering, so it is not possible to create a highly available connection that is able to failover to a different VCD edge endpoint.

Also, the solution outlined here deploys only a single router device in each VPC zone. For high availability, you likely want to deploy multiple virtual router appliances, and for routing purposes share a virtual IP address which you reserve in your VPC subnet. At this time, IBM Cloud VPC does not support multicast or protocols other than ICMP, TCP, and UDP. These limitations exclude protocols like HSRP and VRRP; you should ensure that your router’s approach to HA is able to operate using unicast ICMP, TCP, or UDP.

Deploy your VPC resources

Create a VPC in Dallas or Frankfurt. The VPC will automatically generate address prefixes and subnets for you; I recommend you de-select “Create a default prefix for each zone” so that you can choose your own later:

Next, navigate to your VPC and create address prefixes of your choice:

In order to create subnets, you must navigate away from the VPC to the subnet page. In our case, since we are hosting workloads in only two zones, we had a need only for two subnets:

Next, create four virtual server instances (VSIs), two in each zone. Within each zone, one VSI will serve as the application and the other will serve as a virtual router. For the purposes of this example we use RedHat Enterprise Linux 8.

You need to modify the router VSI network interfaces, either when you create it or afterwards, to enable IP spoofing. This will allow the routers to route traffic other than their own IP address:

Be sure to update the operating system packages and reboot each VSI.

Finally, create an IBM Cloud load balancer instance pointing to each of your application VSIs. Because this is a multi-zone load balancer you must use the DNS-based application load balancer:

Deploy your Cloud Director resources

Next create three VMware Solutions Shared virtual data centers (VDCs). Note that while VPC availability zones are named 1, 2, and 3, VDC availability zones are named according to the IBM Cloud classic infrastructure data center names. Thus, we will deploy to Dallas 10, 12, and 13, which correspond to the three VDC zones:

After creating your three virtual data centers, you need to view any one of these VDCs and reset the administrator password to gain access to the single Cloud Director organization for your account. Using this administrator account you can create additional users and optionally integrate with your own SSO provider:

Next, use these credentials to login to the Cloud Director console. We will create a Data Center Group and assign all three of our VDCs to it so that they have a shared stretch network and network egress. Navigate to Data Centers | Data Center Groups and create a new data center group. Ensure that you select the “Create Local Group” option; although the VDCs are actually in different availability zones, they are designated in the same fault domain from a Cloud Director perspective and we will use active-standby routing. There is only one network pool available for you to use:

After creating the data center group, create a stretched network that will be shared by all three VDCs:

Add your DAL10 edge as the active egress point, and your DAL12 edge as the passive egress point:

Next, navigate to each of your VDCs, view the stretched network, and create an IP pool for each VDC that is a subset of your stretched network:

Next, configure your DAL10 and DAL12 edges (see IBM Cloud docs for details) to allow and to SNAT egress traffic from your VPC to the IBM Cloud service network (e.g., for DNS and RedHat Satellite) and to the public network. If you wish to DNAT traffic from the public internet to reach your virtual machines, keep in mind that the DAL10 edge is the active edge and you should not use DAL12 for ingress except in case of DAL10 failure.

Minimally you want your workload to reach the IBM private service network which includes 52.117.132.0/24 and 161.26.0.0/16. Because we are using private network endpoints (PNEs) you also need to permit 166.9.0.0/15; this address range is also used by any other IBM Cloud services offering private endpoints. For this example I simply configured the edge firewalls to permit all outbound traffic to both private and public:

You must configure an SNAT rule for the private service network (note that this rule is created on the service interface):

and, if needed, an SNAT rule for the public network (note that this rule is created on the external interface):

Next, create the virtual machines that will serve as your database, one in each VDC. For the purposes of this example, we deployed RHEL 8 virtual machines from the provided templates and connected them to IBM Cloud’s Satellite server following the directions in the /etc/motd file. There are a few caveats to the deployment:

  • You should connect the virtual machine interfaces to the stretched network before starting them so that the network customization configures their IP address. Choose an IP address from the pool you created earlier.
  • At first power-on, you should “power on and force recustomization;” afterwards you can view the root password from the customization properties.
  • When using a stretched network, customization does not set the DNS settings for your virtual machines. For RHEL we entered the IBM Cloud DNS servers into /etc/sysconfig/network-scripts/ifcfg-ens192 as follows:
DNS1=161.26.0.10
DNS2=161.26.0.11

Configure BGP over IPsec connectivity between VCD and VPC

In order to expose your Cloud Director edges to your VPC using the IBM Cloud private network, you must create private network endpoints (PNEs) for your DAL10 and DAL12 VDCs. First, in the IBM Cloud console, view your VPC details. A panel on that page lists the “Cloud Service Endpoint service addresses” which are addresses not visible to your VPC but which are the addresses representing your VPC that you will need to permit to access your PNEs. Take note of these addresses:

Now, navigate to your DAL10 and DAL12 VDCs in the IBM Cloud console and click “Create a private network endpoint.” Select the device type of your choice and enter the IP addresses you noted above:

The PNE may take some time to create as it is an operator assisted activity. After it has been successfully created, you will need to create a second PNE in each of the two zones. The reason we need to create a second PNE is that the PNE hides the source IP address of incoming connections, so we cannot configure policies for two different IPsec tunnels using the same PNE. The IBM Cloud console does not allow you to create a second PNE automatically, so you must open a support ticket to the VMware Solutions team. Phrase your ticket as follows:

Hi, I have already created a PNE for my VCD edges edge-dal10-xxxxxxxx and edge-dal12-yyyyyyyy. Please create a second service IP for each of these edges with an additional PNE for each edge. Please use the same whitelist for the existing PNEs. Thank you!

Note that in our example we are connecting only Dallas 1 and Dallas 2 zones from our VPC to Cloud Director. If you wanted to connect Dallas 3 as well, you would need to request three rather than two PNEs for each of your DAL10 and DAL12 edges.

Now we need to configure each of our two NSX edges and our two VPC routers to have dual BGP over IPsec connections to their peers. You need to select which PNE will be used for each VPC router connection.

On the VCD side, the IPsec VPN site configuration for one of the VPC routers looks as follows. In this case, the 52.x address is the PNE’s “service network IP” and the 166.x address is the PNE’s “private network IP:”

And the corresponding BGP configuration is as follows:

Finally, you must be sure to permit the VCD and VPC interconnectivity in both edge firewalls:

For the purposes of this example we are using RHEL8 VSIs as simple routers on the VPC side. First of all, we need to modify /etc/sysctl.conf to allow IP forwarding:

net.ipv4.ip_forward = 1

And then turn it on dynamically:

[root@smoonen-router1 ~]# echo 1 >/proc/sys/net/ipv4/ip_forward
[root@smoonen-router1 ~]#

Next we installed the libreswan package for IKE/IPsec support, and the frr package for BGP support.

In order to use dynamic routing, the IPsec tunnel must be configured using a virtual tunnel interface (VTI). The IPsec configuration for our Dallas 1 router is as follows. The left and leftid values are the address and identity of the router appliance itself. The right value has been obscured; it reflects the address of the VCD edge as known to the router; this is the PNE’s “private network IP.” The rightid value has also been obscured; it reflects the identity of the VCD edge, which we have previously set to the PNE’s “service network IP:”

# Connection to ESG1
conn routed-vpn-esg1
    left=192.168.1.4
    leftid=192.168.1.4
    right=166.9.xx.xx
    rightid=52.117.xx.xx
    authby=secret
    leftsubnet=0.0.0.0/0
    rightsubnet=0.0.0.0/0
    leftvti=10.10.10.1/30
    auto=start
    ikev2=insist
    ike=aes128-sha256;modp2048
    mark=5/0xffffffff
    vti-interface=vti01
    vti-shared=no
    vti-routing=no

# Connection to ESG2
conn routed-vpn-esg2
    left=192.168.1.4
    leftid=192.168.1.4
    right=166.9.yy.yy
    rightid=52.117.yy.yy
    priority=2000
    authby=secret
    leftsubnet=0.0.0.0/0
    rightsubnet=0.0.0.0/0
    leftvti=10.10.10.5/30
    auto=start
    ikev2=insist
    ike=aes128-sha256;modp2048
    mark=6/0xffffffff
    vti-interface=vti02
    vti-shared=no
    vti-routing=no

Note that the tunnels use a different mark and VTI interface. Next, in /etc/frr/daemons, enable bgpd:

bgpd=yes

Then define your tunnel interfaces in /etc/frr/zebra.conf; these are the interfaces for our Dallas 1 router:

!
interface vti1
ip address 10.10.10.1/30
ipv6 nd suppress-ra
!
interface vti2
ip address 10.10.10.5/30
ipv6 nd suppress-ra

Finally, configure BGP in /etc/frr/bgpd.conf:

hostname smoonen-router1
router bgp 64555
 bgp router-id 10.10.10.1
  network 10.10.10.0/30
  network 10.10.10.4/30
  network 192.168.1.0/24
  neighbor 10.10.10.2 remote-as 65010
  neighbor 10.10.10.2 route-map RMAP-IN in
  neighbor 10.10.10.2 route-map RMAP-OUT out
  neighbor 10.10.10.2 soft-reconfiguration inbound
  neighbor 10.10.10.2 weight 2
  neighbor 10.10.10.6 remote-as 65010
  neighbor 10.10.10.6 route-map RMAP-IN in
  neighbor 10.10.10.6 route-map RMAP-OUT out
  neighbor 10.10.10.6 soft-reconfiguration inbound
  neighbor 10.10.10.6 weight 1

ip prefix-list PRFX-VCD seq 5 permit 172.16.0.0/12 le 32
ip prefix-list PRFX-VPC seq 5 permit 192.168.0.0/16 le 32

route-map RMAP-IN permit 10
 match ip address prefix-list PRFX-VCD
route-map RMAP-OUT permit 10
 match ip address prefix-list PRFX-VPC

log file /var/log/frr/bgpd.log debug

Taken together, we have configured:

  • Cloud Director to use DAL10 as active and DAL12 as standby
  • Cloud Director edges will advertise the entire stretch network (172.16.1.0/24) to the VPC routers
  • Each VPC router is configured to prefer the DAL10 edge
  • Each VPC router will advertise its own zone (192.168.1.0/24 or 192.168.2.0/24) to the Cloud Director edges

Now enable IPsec and FRR:

systemctl start ipsec
systemctl enable ipsec
ipsec auto --add routed-vpn-esg1
ipsec auto --add routed-vpn-esg2
ipsec auto --up routed-vpn-esg1
ipsec auto --up routed-vpn-esg2

chown frr:frr /etc/frr/bgpd.conf
chown frr:frr /etc/frr/staticd.conf
systemctl start frr
systemctl enable frr

Finally, you need to visit the IBM Cloud console and find the route table configuration for your VPC:

Modify the route table configuration to direct the VCD networks to your router VSI in each zone. Remember that for this example we are hosting applications only in two zones:

After the tunnel is up and the initial BGP exchange complete, you should have bidirectional connectivity between both environments. Here is a ping from one of our application VSIs:

[root@smoonen-application1 ~]# ping -c 3 -I 192.168.1.5 172.16.1.10
PING 172.16.1.10 (172.16.1.10) from 192.168.1.5 : 56(84) bytes of data.
64 bytes from 172.16.1.10: icmp_seq=1 ttl=61 time=3.21 ms
64 bytes from 172.16.1.10: icmp_seq=2 ttl=61 time=2.34 ms
64 bytes from 172.16.1.10: icmp_seq=3 ttl=61 time=2.87 ms

--- 172.16.1.10 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 2.344/2.809/3.210/0.356 ms
[root@smoonen-application1 ~]#

We have not tuned BGP, but in spite of this, if we disable BGP on the DAL10 edge (this effectively severs both its connection to the stretched network and its connection to VPC), we see that the connectivity from the VPC fails over to the DAL12 edge:

64 bytes from 172.16.1.10: icmp_seq=16 ttl=61 time=2.51 ms
64 bytes from 172.16.1.10: icmp_seq=17 ttl=61 time=16.9 ms
64 bytes from 172.16.1.10: icmp_seq=18 ttl=61 time=2.63 ms
64 bytes from 172.16.1.10: icmp_seq=137 ttl=61 time=8.52 ms
64 bytes from 172.16.1.10: icmp_seq=138 ttl=61 time=6.06 ms
64 bytes from 172.16.1.10: icmp_seq=139 ttl=61 time=5.07 ms

Conclusion

We have successfully established bidirectional connectivity over the IBM Cloud private network between VMware Cloud Director and IBM Cloud VPC using BGP over IPsec.

As described above, it is possible to extend this solution by deploying a router appliance in the third VPC availability zone, in which case you would need to deploy two more PNEs, one for each of your VCD edges. Also, you will need additional PNEs if you deploy more than one router appliance into each zone for HA. Thus, you could require up to twelve PNEs (two router appliances in each of three zones, each of which has a connection to two VCD edges).

Many thanks to Mike Wiles and Jim Robbins for their assistance in developing this solution.

Managing Veeam backup encryption using IBM Cloud key management

Veeam Backup and Replication offers the ability to encrypt your backups using passwords, which function as a kind of envelope encryption key for the encryption keys protecting the actual data. Veeam works hard to protect these passwords from exposure, to the degree that Veeam support cannot recover your passwords. You can ensure the resiliency of these keys either with a password–encrypted backup of your Veeam configuration; or by using Veeam Backup Enterprise Manager, which can protect and recover these passwords using an asymmetric key pair managed by Enterprise Manager. However, neither of these offerings allows integration with an external key manager for key storage and lifecycle. As a result, you must implement automation if you want to achieve Veeam backup encryption without your administrators and operators having direct knowledge of your encryption passwords. Veeam provides a set of PowerShell encryption cmdlets for this purpose.

In this article, I will demonstrate how you can use IBM Cloud Key Protect or IBM Cloud Hyper Protect Crypto Services (HPCS) to create and manage your Veeam encryption passwords.

Authenticating with the IBM Cloud API

Our first step is to use an IBM Cloud service ID API key to authenticate with IBM Cloud IAM and obtain a limited–time token that we will provide as our authorization for Key Protect or HPCS APIs. For this purpose we will use IBM Cloud’s recently released private endpoint for the IAM token service, which allows us to avoid connection to the public internet provided we have enabled VRF and service endpoints in our account.

# Variables

$apikey = '...'

# URIs and script level settings

$tokenURI = 'https://private.iam.cloud.ibm.com/identity/token'
$ErrorActionPreference = 'Stop'
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12

# Exchange IBM Cloud API key for token

$headers = @{Accept='application/json'}
$body = @{grant_type='urn:ibm:params:oauth:grant-type:apikey'; apikey=$apikey}
$tokenResponse = Invoke-RestMethod -Uri $tokenURI -Method POST -Body $body -Headers $headers

# Bearer token is now present in $tokenResponse.access_token

This token will be used in each of the following use cases.

Generating a password

In order to generate a new password for use with Veeam, we will use this token to call the Key Protect or HPCS API to generate an AES256 key and “wrap” (that is, encrypt) it with a root key. The service ID associated with our API key above needs Reader access to the Key Protect or HPCS instance to perform this operation. The following example uses the Key Protect private API endpoint; if you are using HPCS you will have a private API endpoint specific to your instance that looks something like https://api.private.us-south.hs-crypto.cloud.ibm.com:12345. In this script we use a pre–selected Key Protect or HPCS instance (identified by $kms) and root key within that instance (identified by $crk).

# Variables

$kms = 'nnnnnnnn-nnnn-nnnn-nnnn-nnnnnnnnnnnn'
$crk = 'nnnnnnnn-nnnn-nnnn-nnnn-nnnnnnnnnnnn'

# URIs and script level settings

$kmsURIbase = 'https://private.us-south.kms.cloud.ibm.com/api/v2/keys/'
$ErrorActionPreference = 'Stop'
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12

# Perform wrap operation with empty payload to generate an AES 256 key that will be used as password

$headers = @{Accept='application/json'; 'content-type'='application/vnd.ibm.kms.key_action_wrap+json'; 'bluemix-instance'=$kms; Authorization=("Bearer " + $tokenResponse.access_token); 'correlation-id'=[guid]::NewGuid()}
$body = @{}
$wrapResponse = Invoke-RestMethod -Uri ($kmsURIbase + $crk + "/actions/wrap") -Method POST -Body (ConvertTo-Json $body) -Headers $headers

# Plaintext key is present in $wrapResponse.plaintext, and wrapped key in $wrapResponse.ciphertext

After generating the key, we create a new Veeam password with that content. The output of the wrap operation includes both the plaintext key itself and also the wrapped form of the key. Our password can only be extracted from this wrapped ciphertext by someone who has sufficient access to the root key. We should store this wrapped form somewhere for recovery purposes; for the purposes of this example I am storing it as the password description together with a name for the password, $moniker, which in the full script is collected earlier from the script parameters.

$plaintext = ConvertTo-SecureString $wrapResponse.plaintext -AsPlainText -Force
$wdek = $wrapResponse.ciphertext
Remove-Variable wrapResponse

# Store this key as a new Veeam encryption key. Retain it in base64 format for simplicity.

Add-VBREncryptionKey -Password $plaintext -Description ($moniker + " | " + $wdek)

Write-Output ("Created new key " + $moniker)

You can see the full example script create-key.ps1 in GitHub.

Re–wrap a password

Because Veeam does not directly integrate with an external key manager, we have extra work to do if we want to respond to rotation of the root key, or to cryptographic erasure. The following code uses the rewrap API call to regenerate the wrapped form of our key in case the root key has been rotated. This ensures that our backup copy of the key is protected by the latest version of the root key.

# Perform rewrap operation to rewrap our key
# If this operation fails, it is possible your root key has been revoked and you should destroy the Veeam key

$headers = @{Accept='application/json'; 'content-type'='application/vnd.ibm.kms.key_action_rewrap+json'; 'bluemix-instance'=$kms; Authorization=("Bearer " + $tokenResponse.access_token); 'correlation-id'=[guid]::NewGuid()}
$body = @{ciphertext=$wdek}
$rewrapResponse = Invoke-RestMethod -Uri ($kmsURIbase + $crk + "/actions/rewrap") -Method POST -Body (ConvertTo-Json $body) -Headers $headers

Note that this API call will fail with a 4xx error in cases that include the revocation of the root key. In this case, if the root key has been purposely revoked, it is appropriate for you to remove your Veeam password to accomplish the cryptographic erasure. However, assuming that the rewrap is successful, we should update our saved copy of the wrapped form of the key to this latest value. In this example, $key is a PSCryptoKey object that was earlier collected from the Get-VBREncryptionKey cmdlet, and represents the key whose description will be updated:

$newWdek = $rewrapResponse.ciphertext
Remove-Variable rewrapResponse

# Update the existing description of the Veeam encryption key to reflect the updated wrapped version

Set-VBREncryptionKey -EncryptionKey $key.Description -Description ($moniker + " | " + $newWdek)

Write-Output ("Rewrapped key " + $moniker)

You can see the full example script rewrap-key.ps1 in GitHub.

Recover a password

Within a single site the above approach is sufficient. For additional resilience, you can use Veeam backup copy jobs to copy your data to a remote location. If you have a Veeam repository in a remote site and you lose the VBR instance and repositories in your primary site, Veeam enables you to recover VBR in the remote site from an encrypted configuration backup, after which you can restore backups from the repository in that site.

However, you need to plan carefully for recovery not only of your data but also your encryption keys. Ideally, you would choose to protect both the Veeam configuration backup and the VM backups using keys that are protected by IBM Cloud Key Protect or HPCS. This means that for configuration backups and for remote backups, you should choose a Key Protect or HPCS key manager instance in the remote location so that your key management in the remote site is not subject to the original site failure. You might therefore be using two key manager instances: one local key manager instance for keys to protect your local backup jobs used for common recovery operations, and another remote instance for keys to protect your configuration backup and your copy backup jobs used in case of disaster.

This also implies that the key used to protect your configuration backups should be preserved in an additional location than your VBR instance and in a form other than a Veeam key object; in fact, the Veeam configuration restore process requires you to enter the password–key manually. You should store the key in its secure wrapped form, ideally alongside your Veeam configuration backup. You will then need to unwrap the key when you restore the configuration. In this example, the wrapped form of the key is expected to be one of the script arguments, and this underscores the need to protect this key with a key manager that will still be available in case of the original site failure:

# Perform unwrap operation

$headers = @{Accept='application/json'; 'content-type'='application/vnd.ibm.kms.key_action_unwrap+json'; 'bluemix-instance'=$kms; Authorization=("Bearer " + $tokenResponse.access_token); 'correlation-id'=[guid]::NewGuid()}
$body = @{ciphertext=$args[0]}
$unwrapResponse = Invoke-RestMethod -Uri ($kmsURIbase + $crk + "/actions/unwrap") -Method POST -Body (ConvertTo-Json $body) -Headers $headers

Write-Output ("Plaintext key: " + $unwrapResponse.plaintext)

Because this exposes your key to your administrator or operator, after restoring VBR from configuration backup, you should generate a new key for subsequent configuration backups.

You can see the full example script unwrap-key.ps1 in GitHub.

Summary

In this article, I’ve showed how you can use IBM Cloud key management APIs to generate and manage encryption keys for use with Veeam Backup and Replication. You can see full examples of the scripts excerpted above in GitHub. These scripts are a basic example that are intended to be extended and customized for your own environment. You should take special care to consider how you manage and protect your IBM Cloud service ID API keys, and how you save and manage the wrapped form of the keys generated by these scripts. Most likely you would store all of these in your preferred secret manager.

Multipath iSCSI for VMware in IBM Cloud

Today we’re really going to go down the rabbit hole. Although there was not a great deal of fanfare, earlier this year IBM Cloud released support for up to 64 VMware hosts to attach an Endurance block storage volume using multipath connections. In order to use multipath, this requires the use of some APIs that are not well documented. After a lot of digging, here is how I was able to leverage this support.

First, your account must be enabled for what IBM Cloud calls “iSCSI isolation.” All new accounts beginning in early 2020 have this enabled. You can check whether it is enabled using the following Python script:

# Connect to SoftLayer
client = SoftLayer.Client(username = USERNAME, api_key = API_KEY, endpoint_url = SoftLayer.API_PUBLIC_ENDPOINT)

# Assert that iSCSI isolation is enabled
isolation_disabled = client['SoftLayer_Account'].getIscsiIsolationDisabled()
assert isolation_disabled == False

iSCSI isolation enforces that all devices in your account are using authentication to connect to iSCSI. In rare cases, some accounts may be using unauthenticated connections. If the above test passes, your account is ready to go! If the above test fails, you should first audit your usage of iSCSI connections to ensure they are all authenticated. Only if the above test fails and you have verified that either you are not using iSCSI, or all of your iSCSI connections are authenticated, then open a support ticket as follows. Plan for this process to take several days as it requires internal approvals and configuration changes:

Please enable my account for iSCSI isolation according to the standard block storage method of procedure.

Thank you!

Once the above test for iSCSI isolation passes, we are good to proceed. We need to order the following from IBM Cloud classic infrastructure:

  1. Endurance iSCSI block storage in the same datacenter as your hosts, with OS type VMware.
  2. A private portable subnet on the storage VLAN in your instance. Ensure the subnet is large enough to allocate two usable IP addresses for every current or future host in your cluster. We are ordering a single subnet for convenience, although it is possible to authorize multiple subnets (either for different hosts, or for different interfaces on each host). A single /25 subnet should be sufficient for any cluster since VMware vCenter Server (VCS) limits you to 59 hosts per cluster.

The Endurance authorization process authorizes each host individually to the storage, and assigns a unique iQN and CHAP credentials to each host. After authorizing the hosts, we then specify which subnet or subnets each host will be using to connect to the storage, so that the LUN accepts connections not only from the hosts’ primary IP addresses but also these alternate portable subnets. The following Python script issues the various API calls needed for these authorizations, assuming that we know the storage, subnet, and host ids:

STORAGE_ID = 157237344
SUBNET_ID = 2457318
HOST_IDS = (1605399, 1641947, 1468179)

# Connect to SoftLayer
client = SoftLayer.Client(username = USERNAME, api_key = API_KEY, endpoint_url = SoftLayer.API_PUBLIC_ENDPOINT)

# Authorize hosts to storage
for host_id in HOST_IDS :
  try :
    client['SoftLayer_Network_Storage_Iscsi'].allowAccessFromHost('SoftLayer_Hardware', host_id, id = STORAGE_ID)
  except :
    if 'Already Authorized' in sys.exc_info()[1].message :
      pass
    else :
      raise

# Lookup the "iSCSI ACL object id" for each host
hardwareMask = 'mask[allowedHardware[allowedHost[credential]]]'
result = client['SoftLayer_Network_Storage_Iscsi'].getObject(id = STORAGE_ID, mask = hardwareMask)
aclOids = [x['allowedHost']['id'] for x in result['allowedHardware']]

# Add our iSCSI subnet to each host's iSCSI ACL
for acl_id in aclOids :
  # Assign; note subnet is passed as array
  client['SoftLayer_Network_Storage_Allowed_Host'].assignSubnetsToAcl([SUBNET_ID], id = acl_id)

  # Verify success
  result = client['SoftLayer_Network_Storage_Allowed_Host'].getSubnetsInAcl(id = acl_id)
  assert len(result) > 0

At this point, the hosts are authorized to the storage. But before we can connect them to the storage we need to collect some additional information. First, we need to collect the iQN and CHAP credentials that were issued for the storage to each host:

STORAGE_ID = 157237344

# Connect to SoftLayer
client = SoftLayer.Client(username = USERNAME, api_key = API_KEY, endpoint_url = SoftLayer.API_PUBLIC_ENDPOINT)

# Lookup the iQN and credentials for each host
hardwareMask = 'mask[allowedHardware[allowedHost[credential]]]'
result = client['SoftLayer_Network_Storage_Iscsi'].getObject(id = STORAGE_ID, mask = hardwareMask)
creds = [ { 'host' : x['fullyQualifiedDomainName'],
            'iqn'  : x['allowedHost']['name'],
            'user' : x['allowedHost']['credential']['username'],
            'pass' : x['allowedHost']['credential']['password'] } for x in result['allowedHardware']]
print("Host connection details")
pprint.pprint(creds)

For example:

Host connection details
[{'host': 'host002.smoonen.example.com',
  'iqn': 'iqn.2020-07.com.ibm:ibm02su1368749-h1468179',
  'pass': 'dK3bACHQQSg5BPwA',
  'user': 'IBM02SU1368749-H1468179'},
 {'host': 'host001.smoonen.example.com',
  'iqn': 'iqn.2020-07.com.ibm:ibm02su1368749-h1641947',
  'pass': 'kFCw2TDLr5bL4Ex6',
  'user': 'IBM02SU1368749-H1641947'},
 {'host': 'host000.smoonen.example.com',
  'iqn': 'iqn.2020-07.com.ibm:ibm02su1368749-h1605399',
  'pass': 'reTLYrSe2ShPzZ6A',
  'user': 'IBM02SU1368749-H1605399'}]

Note that Endurance storage uses the same iQN and CHAP credentials for all LUNs authorized to a host. This will enable us to attach multiple LUNs using the same HBA.

Next, we need to retrieve the two IP addresses for the iSCSI LUN:

STORAGE_ID = 157237344

# Connect to SoftLayer
client = SoftLayer.Client(username = USERNAME, api_key = API_KEY, endpoint_url = SoftLayer.API_PUBLIC_ENDPOINT)

print("Target IP addresses")
storage = client['SoftLayer_Network_Storage_Iscsi'].getIscsiTargetIpAddresses(id = STORAGE_ID)
pprint.pprint(storage)

For example:

Target IP addresses
['161.26.114.170', '161.26.114.171']

Finally, we need to identify the vendor suffix on the LUN’s WWN so that we can positively identify it in vSphere. We can do this as follows:

STORAGE_ID = 157237344

# Connect to SoftLayer
client = SoftLayer.Client(username = USERNAME, api_key = API_KEY, endpoint_url = SoftLayer.API_PUBLIC_ENDPOINT)

props = client['SoftLayer_Network_Storage_Iscsi'].getProperties(id = STORAGE_ID)
try    : wwn = [x['value'] for x in props if len(x['value']) == 24 and x['value'].isalnum()][0]
except : raise Exception("No WWN")
print("WWN: %s" % wwn)

For example:

WWN: 38305659702b4f6f5a5a3044

Armed with this information, we can now attach the hosts to the storage.

First, create two new portgroups on your private vDS. Our design uses a shared vDS across clusters but unique portgroups, so they should be named based on the instance and cluster name, for example, smoonen-mgmt-iSCSI-A and smoonen-mgmt-iSCSI-B. Tag these port groups with the storage VLAN, and ensure that each portgroup has only one active uplink. iSCSI-A should have uplink1 active and uplink2 unused, while iSCSI-B should have uplink2 active and uplink1 unused:

Next, create kernel ports for all hosts in each port group, using up IP addresses from the subnet you ordered earlier. You will end up using two IP addresses for each host. Set the gateway to Configure on VMkernel adapters and using the gateway address for your subnet:

Next, let’s begin a PowerCLI session to connect to the storage and create the datastore. First, as a one-time setup, we must enable the software iSCSI adapter on every host:

PS /Users/smoonen/vmware> $myhost = Get-VMHost host000.smoonen.example.com
PS /Users/smoonen@us.ibm.com/Desktop> Get-VMHostStorage -VMHost $myhost | Set-VMHostStorage -SoftwareIScsiEnabled $True

SoftwareIScsiEnabled
--------------------
True

Next, also as a one-time setup on each host, bind the iSCSI kernel ports to the iSCSI adapter:

PS /Users/smoonen/vmware> $vmkA = Get-VMHostNetworkAdapter -PortGroup smoonen-mgmt-iSCSI-A -VMHost $myhost
PS /Users/smoonen/vmware> $vmkB = Get-VMHostNetworkAdapter -PortGroup smoonen-mgmt-iSCSI-B -VMHost $myhost
PS /Users/smoonen/vmware> $esxcli = Get-EsxCli -V2 -VMHost $myhost
PS /Users/smoonen/vmware> $esxcli.iscsi.networkportal.add.Invoke(@{adapter='vmhba64';force=$true;nic=$vmkA})
true
PS /Users/smoonen/vmware> $esxcli.iscsi.networkportal.add.Invoke(@{adapter='vmhba64';force=$true;nic=$vmkB})
true

Finally, once for each host, we set the host iQN to the value expected by IBM Cloud infrastructure, and also initialize the CHAP credentials:

PS /Users/smoonen/vmware> $esxcli.iscsi.adapter.set.Invoke(@{adapter='vmhba64'; name='iqn.2020-07.com.ibm:ibm02su1368749-h1605399'}) 
false
PS /Users/smoonen/vmware> $hba = Get-VMHostHba -VMHost $myhost -Device vmhba64
PS /Users/smoonen/vmware> Set-VMHostHba -IscsiHba $hba -MutualChapEnabled $false -ChapType Preferred -ChapName "IBM02SU1368749-H1605399" -ChapPassword "reTLYrSe2ShPzZ6A"

Device     Type         Model                          Status
------     ----         -----                          ------
vmhba64    IScsi        iSCSI Software Adapter         online

Now, for each LUN, on each host we must add that LUN’s target addresses (obtained above) as dynamic discovery targets. You should not assume that all LUNs created in the same datacenter share the same addresses:

PS /Users/smoonen/vmware> New-IScsiHbaTarget -IScsiHba $hba -Address "161.26.114.170"             

Address              Port  Type
-------              ----  ----
161.26.114.170       3260  Send

PS /Users/smoonen/vmware> New-IScsiHbaTarget -IScsiHba $hba -Address "161.26.114.171"

Address              Port  Type
-------              ----  ----
161.26.114.171       3260  Send

After this, we rescan on each host for available LUNs and datastores:

PS /Users/smoonen/vmware> Get-VMHostStorage -VMHost $myhost -RescanAllHba -RescanVmfs

SoftwareIScsiEnabled
--------------------
True

This enables us to locate the new LUN and create a VMFS datastore on it. We locate the LUN on all hosts but create the datastore on one host. Locate the LUN using the WWN suffix obtained above:

PS /Users/smoonen/vmware> $disks = Get-VMHostDisk -Id *38305659702b4f6f5a5a3044
PS /Users/smoonen/vmware> New-Datastore -VMHost $myhost -Vmfs -Name "smoonen-mgmt2" -Path $disks[0].ScsiLun.CanonicalName        

Name                               FreeSpaceGB      CapacityGB
----                               -----------      ----------
smoonen-mgmt2                           48.801          49.750

Finally, rescan on all hosts to discover the datastore:

PS /Users/smoonen/vmware> Get-VMHostStorage -VMHost $myhost -RescanAllHba -RescanVmfs

SoftwareIScsiEnabled
--------------------
True

We can confirm that we have multiple paths to the LUN as follows:

PS /Users/smoonen/vmware> $luns = Get-ScsiLun -Id *38305659702b4f6f5a5a3044
PS /Users/smoonen/vmware> Get-ScsiLunPath -ScsiLun $luns[0]

Name       SanID                                    State      Preferred
----       -----                                    -----      ---------
vmhba64:C… iqn.1992-08.com.netapp:stfdal1303        Standby    False
vmhba64:C… iqn.1992-08.com.netapp:stfdal1303        Standby    False
vmhba64:C… iqn.1992-08.com.netapp:stfdal1303        Active     False
vmhba64:C… iqn.1992-08.com.netapp:stfdal1303        Active     False

Securely connect your private VMware workloads in the IBM Cloud

This article originally appeared in 2017 on IBM developerWorks, which is being sunset. Although 2020 brings a long awaited shift in focus to NSX-T, the instructions in this article are still relevant for NSX-V implementations.


IBM® and VMware® announced a new partnership in 2016 that culminated in the release of VMware vCenter Server on IBM Cloud, an automated standardized deployment of a complete VMware virtualization environment in the IBM Cloud, including VMware vSphere, VMware NSX, and optionally VMware vSAN technologies. Since the announcement, IBM and VMware continued to enhance offerings with new features and services. IBM Cloud’s vCenter Server offering is the fastest way to deploy a fully operational VMware virtualization environment in the IBM Cloud.

This tutorial is for anyone who is interested in migrating data, creating firewall rules, building a topology, and more.

Connecting to the public cloud

Your VMware vCenter Server (VCS) instance in the IBM Cloud is initially deployed with minimal public network access for the IBM software components and any services that require such access for usage reporting, such as Zerto Virtual Replication.

Many IBM Cloud services are available to your VMware workloads over your private network, including file storage, block storage, object storage, load balancing, email delivery, and digital transcoding.

However, many other IBM Cloud services, such as Cloudant®, IBM Cloud Functions (formerly OpenWhisk), API Connect™, and Weather Company® Data, can be reached only over the public network.

In this tutorial, we show you how to securely connect your private multi-site VCS instances to IBM Cloud public services. This tutorial assumes the most complex case of setting up public connectivity for a multi–site workload. For single–site deployments, or for deployments that use VLAN instead of VXLAN, some of the steps will not be necessary. After completing this tutorial, you will know how to easily and securely connect your private VMware workloads to public IBM Cloud services.

The IBM Cloud: Migrate your workload while preserving your security

This tutorial is based on IBM Code’s fictional Acme Freight company and its transformation story. View the full journey (and while you’re at it, grab the sample code) to see how Acme Freight implemented the network topology. See how they were able to migrate their workload between data centers, allowing external access from the workload to their IBM Cloud services—all while preserving the security of their workload that is running in their private IBM Cloud virtualized network.

Acme Freight’s VMware application uses several IBM Cloud services to implement their weather–based routing recommendation engine. Their recommendation engine is implemented by using IBM Cloud Functions (formerly OpenWhisk) programming service, which allows for rapid innovation and development at a very low cost. They subscribe to IBM Cloud Weather Company Data (now deprecated) for weather forecasts and alerts. They use IBM Cloud’s API Connect service for additional security, governance, and analytics for their APIs. All of these components allow Acme Freight to monetize and rate limit their service as they expand their business. Figure 1 is an example of API Connect’s monitoring interface for Acme Freight.

Figure 1. API Connect monitoring interface

figure01

Figure 2 shows the topology for Acme Freight’s application that is running on VMware vCenter Server on IBM Cloud.

Figure 2. Acme Freight network topology

figure02

The following numbered steps show you how we built up this topology from Figure 2. Note that the application might migrate between the two data centers, therefore we will configure each data center to have a local egress point to the public network.

Network topology: Building your internal network

① Cross–vCenter NSX

VMware NSX is VMware’s network function virtualization (NFV) technology. NSX is not just about network virtualization, but also provides significant security benefits through its micro–segmentation firewalling capabilities. NSX also offers the flexibility of plugging many third–party network functions into the NSX network flows.

Many companies are adopting NSX in their own data centers because of the flexibility and security that NSX provides. Even if you are not using NSX in your own data center, you should use it when deploying VMware in the cloud. Using NSX in the cloud will give you much more flexibility and control over the networks and addressing in your environment, and will position you to take advantage of the other benefits of NSX down the road.

If you have deployed a multi–site VMware vCenter Server topology, your vCenter servers are linked together but your NSX managers are not yet linked. In this step, we will associate the NSX managers across your instances, which will allow us to create logical networks (VXLANs) that stretch across your sites. This simplifies the communications between your workloads and enables your workloads to migrate seamlessly between sites, as in the case of Acme Freight. For more information about cross–vCenter NSX design and architecture, refer to VMware’s NSX cross–vCenter design guide.

This step requires you to choose a site to serve as the primary NSX manager and delete the NSX controllers on all other connected sites. For consistency and simplicity, we recommend that you choose your primary VCC instance as the primary NSX manager. You should perform this step before you create any logical switches at any of your secondary sites:

  1. Use the vSphere Web Client to log in to vCenter.
  2. Before configuring cross–vCenter NSX, ensure that all sites have unique segment ID ranges for their logical switches. Each logical network is assigned a segment ID, much like a VLAN has an ID.
    1. Determine the segment ID ranges that you will configure at each site for local switches and for the universal switches. Your choice determines how many switches can be created at each site, and how many universal networks can be created. In the case of Acme Freight, we chose the following:
      1. Primary site: 6000–6499
      2. Secondary site: 6500–6999
      3. Universal: 7000-7999
    2. Navigate to Networking & Security > Installation. Select the Logical Network Preparation tab, then select the Segment ID pane.
    3. Select the IP address of the NSX manager that will serve as your primary manager.
    4. Click Edit and adjust the segment ID pool to your desired range.
    5. Repeat steps c and d for each of your secondary NSX managers. We will configure the universal segment IDs in a later step.
  3. Navigate to Networking & Security > Installation and select the Management tab.
  4. Select the IP address of the NSX Manager that will serve as your primary manager.
  5. Click Actions > Assign Primary Role, and click Yes when prompted.
  6. In the NSX Controller nodes table, locate the three NSX controllers that are managed by the NSX Manager that will serve as your secondary manager. For each controller:
    1. Select the controller.
    2. Click the red X icon to delete it.
    3. Wait for the deletion to complete before proceeding.
    4. Refresh the screen if you are unable to click the delete button.
  7. Log in to the IBM Cloud for VMware Solutions portion of the IBM Cloud catalog.
  8. Click Deployed Instances and select your secondary instance. Make note of the NSX Manager IP address, HTTP user name, and HTTP password.
  9. Return to the vSphere Web Client NSX installation page.
  10. Select the Primary NSX Manager.
  11. Select Actions > Add Secondary NSX Manager.
  12. Enter the IP address, HTTP user name, and HTTP password that you noted in step 8.

Once completed, one NSX manager will be listed as Primary and the other as Secondary. You should see six rows in the NSX Controller nodes table, but only three unique IP addresses, since the three controllers are now shared between the primary and secondary sites. It will take a few minutes for your controllers to go into a connected state; if this does not happen, select the Secondary Manager and click Actions > Update Controller State. Figure 3 shows the result.

Figure 3. NSX managers and controllers

figure03

Repeat steps 5 through 12 for any additional secondary instances you want to include in your universal transport zone.

② NSX Universal Transport Zone

In this step, we set up a universal transport zone, allowing your sites to share NSX logical switches and routers.

  1. In the vSphere Web Client, navigate to Networking & Security > Installation and select the Logical Network Preparation tab.
  2. Ensure the Primary NSX Manager is selected in the drop–down list, click the Segment ID pane, and click Edit.
  3. Choose a Universal Segment ID pool independent of your local segment IDs. In Acme Freight’s case, we chose the range 7000–7999 for our segment IDs, as shown in Figure 4.
    Figure 4. Segment IDs

    figure04

  4. Select the Transport Zones pane.
  5. Click the green plus icon to add a transport zone. Select Mark this object for Universal Synchronization so that it is created as a universal transport zone. Select your cluster to connect it to the transport zone. In Acme Freight’s case, we named it UniversalTransportZone.
    Figure 5. Universal transport zone

    figure05

  6. Select your Secondary NSX manager from the drop–down list. Select the UniversalTransportZone, then select Action > Connect Cluster to connect your secondary vCenter to this transport zone.
  7. Select the cluster and click OK.
  8. Repeat steps 6–7 for any additional Secondary NSX managers in your environment.

③ Logical switches

In this step, we create the logical switches that serve as the virtual networks for our solution. You can think of each logical switch as the virtual equivalent of a physical VLAN. The traffic for these switches is encapsulated in VXLAN packets if it is routed between hosts.

You will need to plan for your own networking needs, including both the number of logical switches and the subnets in use by them. In Acme Freight’s case, we created the following logical switches:

  1. Universal Web-Tier
    This network hosts the web servers for Acme Freight. Its subnet is 172.16.10.0/24.
  2. Universal App-Tier
    This network hosts the application servers for Acme Freight. Its subnet is 172.16.20.0/24.
  3. Universal Primary-Transit
    This network is a transit network that routes traffic to the public network for the primary site. Its subnet is 172.16.100.0/27.
  4. Universal Secondary-Transit
    This network is a transit network that routes traffic to the public network for the secondary site. Its subnet is 172.16.200.0/27.

In a later step, we will create a logical router to route traffic between these networks.

Create each logical switch with the following steps:

  1. In the vSphere Web Client, navigate to Networking & Security > Logical Switches.
  2. Ensure the Primary NSX manager is selected in the drop–down list.
  3. Click the green plus icon to create a logical switch.
  4. Name your switch.
  5. For the Transport Zone, click Change and select your universal transport zone.
  6. Ensure Unicast is selected, as shown in Figure 6.
  7. Click OK.
    Figure 6. Logical switch

    figure06

④ Logical router

In the previous step, we created several logical (or virtual) networks. You could begin deploying virtual machines on these networks right away, but these virtual machines will only be able to communicate with other virtual machines on the same network. To route traffic between virtual networks, we need to deploy a logical router.

VMware NSX provides logical (or distributed) routers (DLRs) for single–site configurations, and universal logical routers (UDLRs) to route traffic on universal logical switches like the ones we created previously. In this step, we deploy a universal logical router with local egress. We will deploy a single UDLR with a pair of router appliances located in each site.

  1. In the vSphere Web Client, navigate to Networking & Security > NSX Edges.
  2. Ensure the Primary NSX manager is selected in the drop–down list.
  3. Click the green plus icon.
  4. The first panel is shown in Figure 7:
    1. Choose an Install Type of Universal Logical (Distributed) Router.
    2. Select Enable Local Egress.
    3. Name your router.
    4. Enable High Availability. We will deploy two appliances to ensure that traffic continues to be routed even if one appliance is lost due to host failure.
      Figure 7. UDLR name and description

      figure07

  5. On the second panel, select a user name and password for the appliance administration.
  6. On the third panel, click the green plus icon to configure the deployment of a UDLR appliance. Configure a total of two appliances to a suitable location in your primary site, as shown in Figure 8. We will deploy the appliances for the secondary site in a later step.
    Figure 8. UDLR deployment configuration

    figure08

  7. In the fourth panel, configure the interfaces for your logical router.
    1. Even if you did not enable High Availability, you must assign an HA interface. This interface is used for the appliances to detect each other’s availability. You can use the primary transit network for your HA interface.
    2. Configure one interface for each of your logical switches, including the secondary transit network. This allows the primary site to route public network traffic for the secondary site even if the secondary site’s public link fails. Ensure that your subnet configuration matches the network architecture you planned earlier for each logical switch. The transit networks should be uplink interfaces; all other networks should be internal interfaces.
    3. We will later deploy a gateway device on the transit networks, so our UDLR should not be assigned a gateway address (by convention the first address) on the transit networks. However, the UDLR will serve as the gateway for all other logical switches. The addresses we assigned for Acme Freight’s case, shown in Figure 9, are as follows:
      1. Universal Web-Tier
        internal interface, 172.16.10.1/24
      2. Universal App-Tier
        internal interface, 172.16.20.1/24
      3. Universal Primary-Transit
        uplink interface, 172.16.100.2/27
      4. Universal Secondary-Transit
        uplink interface, 172.16.200.2/27

        Figure 9. UDLR interfaces

        figure09

  8. In the fifth panel, configure the default gateway for this UDLR appliance. Specify the gateway address for the primary transit network; we will later deploy a gateway appliance at this address. Figure 10 shows this as configured for Acme Freight.
    Figure 10. UDLR default gateway

    figure10.png

  9. Complete the creation of the UDLR and its primary appliances.
  10. If you deployed your appliances to the same cluster, resource pool, and datastore, you should configure a DRS affinity rule to ensure the appliances run on separate hosts.

Now let’s deploy the UDLR’s appliances at your secondary site. For each secondary site, perform the following steps:

  1. In the vSphere Web Client, navigate to Networking & Security > NSX Edges.
  2. Select the secondary NSX manager in the drop–down list.
  3. Select your UDLR in the list.
  4. In the Manage tab, select the Settings pane and choose Configuration.
  5. Click the green plus icon to configure a new UDLR appliance and choose an appropriate location for it
  6. In the HA Configuration panel, click Change to configure HA. Select Enable and choose your secondary transit network as the HA interface.
  7. Click the green plus icon to configure your second UDLR appliance, and choose an appropriate location for it.
  8. If you deployed your appliances to the same cluster, resource pool, and datastore, you should configure a DRS affinity rule to ensure the appliances run on separate hosts.

Network topology: Building your external network

① NSX edge gateways

In this step, we will deploy NSX Edge Services Gateway (ESG) devices that will serve as gateways between your logical networks and the public network. We will configure them to NAT outbound traffic from your workload to the public network. VMware designates this outbound NAT as source NAT (SNAT). Depending on your needs, you could also configure inbound NAT to your workload from the public network, which is termed destination NAT (DNAT). We will deploy a separate highly available ESG pair in each site, since each site has its own primary networking.

First, we must order public subnets from the IBM Cloud for use with your ESGs:

  1. Log in to the IBM Cloud portal.
  2. First, ensure that you know the public VLANs for your vSphere hosts. Follow these steps:
    1. Navigate to Devices > Device List.
    2. Identify one of your vSphere hosts on your primary site and select it.
    3. In the Network section, under the Public heading, note the site and VLAN. For example, wdc04.fcf03a.1165.
    4. Repeat steps 2a through 2c for each of your secondary sites.
  3. Navigate to Network > IP Management > Subnets.
  4. Select Order IP Addresses.
  5. Choose a Portable Public subnet
  6. Select four portable public IP addresses and click Continue.
  7. Select the VLAN you identified earlier for your primary site.
  8. Fill out the RFC 2050 information and place your order.
  9. Repeat steps 4–8 for each of your secondary sites.

You should find that there is already a CIDR–28 public portable subnet on these VLANs, which is used by the IBM Cloud management component to communicate with the IBM Cloud portal. In the IBM Cloud portal, navigate to Network > IP Management > Subnets, and review the details for the CIDR–30 subnets you ordered. You should add a note to these subnets to indicate their purpose; for example, “Workload NAT.” Click to view the details for each subnet. Note the gateway address and the address that is available for your use. We will use the latter address for the NSX ESG. You should add a note to this address to indicate its purpose; for example, “NSX ESG public IP.”

Now we will deploy your ESGs by using the addresses you ordered:

  1. In the vSphere Web Client, navigate to Networking & Security > NSX Edges.
  2. Select your Primary NSX manager in the drop–down list.
  3. Click the green plus icon to deploy a new NSX ESG.
  4. In the first panel, select Edge Services Gateway, name your ESG, and select Enable High Availability, as shown in Figure 11.
    Figure 11. NSX ESG name and description

    figure11

  5. On the second panel, select a user name and password for the appliance administration.
  6. On the third panel, click the green plus icon to configure the deployment of a gateway appliance. Configure a total of two appliances to a suitable location in your primary site, as shown in Figure 12.
    Figure 12. Configure NSX ESG deplyment

    figure12

  7. In the fourth panel, configure the interfaces for your gateway, as shown in Figure 13.
    1. The uplink interface should be your public network. From the distributed portgroup list, select the SDDC-DPortGroup-External distributed portgroup. Configure the IP address that you ordered from IBM Cloud with the subnet prefix of 30.
    2. The internal interface should be your primary transit network. Configure the gateway address you identified for your primary transit network. In the case of Acme Freight, this is 172.16.100.1/27.
      Figure 13. ESG interfaces

      figure13

  8. In the fifth panel, configure the default gateway for this appliance. Specify the gateway address for the subnet you ordered from IBM Cloud earlier. Figure 14 shows this as configured for Acme Freight.
    Figure 14. ESG default gateway

    figure14

  9. In the sixth panel, configure your default firewall policy and set the HA interface to the internal interface.
  10. Complete the creation of the ESG.
  11. If you deployed your appliances to the same cluster, resource pool, and datastore, you should configure a DRS affinity rule to ensure the appliances run on separate hosts.
  12. Repeat these steps for each of your secondary sites to deploy an NSX ESG pair in those sites, on the appropriate transit network, and using the subnet you ordered for that site.

② Dynamic routing

In this step, we will enable OSPF dynamic routing between the ESGs and the UDLR. This will allow the UDLR to dynamically discover the gateway routes available in each site and thus identify the closest active gateway based on the site in which your workload is running.

First, we will configure each UDLR appliance to recognize the locale that it is running in. Since we enabled local egress on the UDLR, the locale ID will be used by the UDLR to filter the routes that it configures on your hypervisors. This configuration will allow it to configure preferred routes that differ at each site:

  1. In the vSphere Web Client, navigate to Networking & Security > NSX Managers.
  2. Double–click the NSX manager for your primary site and select the Summary tab.
  3. Copy the ID field, as shown in Figure 15.
    Figure 15. NSX Manager ID

    figure15

  4. Navigate to Networking & Security > NSX Edges and select the Primary NSX manager from the drop-down list.
  5. Double-click your UDLR.
  6. Select the Manage tab and the Routing pane, then select Global configuration.
  7. Click Edit next to Routing Configuration and enter the router ID.
  8. Click Publish Changes to commit the changes.
    Figure 16. Publish locale ID changes

    figure16

  9. Repeat these steps for each of your secondary NSX managers and the UDLR appliances that are associated with them, taking care to select the correct NSX manager in steps 2 and 4.

Now we need to enable OSPF for each of your UDLR appliances:

  1. In the vSphere Web Client, navigate to Networking & Security > NSX Edges and select the Primary NSX manager from the drop–down list.
  2. Double–click your UDLR and select the Manage tab.
  3. In the Routing pane, select the Global Configuration option.
  4. Click Edit next to the Dynamic Routing Configuration and ensure that the primary transit network is selected for the Router ID, as shown in Figure 17.
    Figure 17. UDLR router ID

    figure17

  5. Commit your changes by clicking Publish Changes.
  6. In the Routing pane, select the OSPF option.
  7. Configure the OSPF settings.
    1. Click Edit to configure settings.
    2. Mark it Enabled.
    3. Enter an unused address in the primary transit network for the protocol address. The UDLR will send and receive OSPF traffic on this address.
    4. The forwarding address is the address that the UDLR uses for sending and receiving routed traffic. Enter the UDLR’s existing address on the primary transit network.
      Figure 18. UDLR OSPF settings for Acme Freight

      figure18

  8. Create an Area Definition, as shown in Figure 19
    Figure 19. UDLR OSPF area

    figure19

  9. Map the area to the primary transit interface, as shown in Figure 20.
    Figure 20. UDLR interface mapping for OSPF

    figure20

  10. Click Publish Changes to commit the changes that you made to the OSPF configuration.
  11. Repeat these steps for each of your secondary sites to configure OSPF for the UDLR appliances in those sites. Be sure to select the appropriate secondary NSX manager in step 1 and the appropriate secondary network and addresses in steps 4, 7, and 9.

Lastly, we need to enable OSPF for the NSX ESGs so that they can communicate with the UDLR.

  1. In the vSphere Web Client, navigate to Networking & Security > NSX Edges and select the Primary NSX manager from the drop–down list.
  2. Double–click your NSX ESG and select the Manage tab.
  3. In the Routing pane, select the Global Configuration option.
  4. Click Edit next to the Dynamic Routing Configuration and ensure that the primary uplink network is selected for the Router ID, as shown in Figure 21.
    Figure 21. NSX ESG router ID

    figure21

  5. Commit your changes by clicking Publish Changes.
  6. In the Routing pane, select the Enable OSPF option, as shown in Figure 22.
    Figure 22. NSX ESG OSPF settings

    figure22

  7. Create an Area Definition, as shown in Figure 23, that matches your UDLR area definition.
    Figure 23. NSX ESG OSPF area

    figure23

  8. Map the area to the primary transit interface, as shown in Figure 24.
    Figure 24. NSX ESG interface mapping for OSPF

    figure24

  9. Click Publish Changes to commit the changes that you made to the OSPF configuration.
  10. Repeat these steps for each of your secondary sites to configure OSPF for the ESGs in those sites. Be sure to select the appropriate secondary NSX manager in step 1 and the appropriate secondary network in steps 4 and 9.

③ Firewall and NAT configuration

Finally, we will configure the NSX edge gateways, which we deployed in step 5, to allow outbound connections from your applications by using address translation.

  1. In the vSphere Web Client, navigate to Networking & Security > NSX Edges.
  2. Ensure that the Primary NSX manager is selected in the drop–down list and double–click the NSX ESG that you created for public connectivity at your primary site.
  3. In the Manage tab, select the Firewall panel.
  4. Click the green plus icon to create a new firewall rule to permit outbound traffic, as shown in Figure 25.
    1. The source IP address will likely include your application’s original address (firewall rules are applied before NAT rules). You can use various constructs to select the source address, including cluster, logical switch, vApp, virtual machine, and IP address specification.
    2. You can limit the destination address and services if needed.
      Figure 25. Firewall rule

      figure25

  5. Publish your changes.
  6. In the Manage tab, select the NAT panel.
  7. Click the green plus icon and select Add SNAT Rule to create a new rule for translating private IP addresses to a public IP address, as shown in Figure 26.
    1. The Original Source IP address will be the range of addresses that are assigned to virtual machines on the virtualized network, 172.16.0.0/16.
    2. The Translated Source IP address is from the uplink interface of the ESG.
      Figure 26. SNAT rule

      figure26

  8. Publish your changes.
  9. Repeat steps 2–8 for each of your Secondary NSX managers and ESGs. Be sure to specify the Translated Source IP from the uplink interfaces on the ESGs of the secondary sites.

Summary

In this tutorial, we set up a cross–vCenter NSX, created universal logical switches that allow your workloads and communications to traverse your sites over virtual networks. We also set up a universal logical router to route traffic between these networks, and created gateways at each location that allow for outbound traffic to connect to the public network. All of these steps allow you to extend your VMware applications to use public IBM Cloud services, such as Watson Personality Insights or the Watson IoT Platform.

Since we are using NAT for the outbound connections, your workloads will experience a momentary loss of connection if you perform a live migration between sites. This connection loss is caused by the connection source IP address (as seen by the outside network), which will change as you move from site to site. But your workload will be able to immediately re–establish connection.

This tutorial only scratches the surface of what is possible with VMware NSX in the IBM Cloud. We created firewall rules for an NSX Edge, but you can create firewall rules that are applied to all traffic, including intra–switch traffic. Depending on your requirements, you might also need to consider alternative topologies. If you require inbound connections to your application, you’ll also need to consider the NAT configuration (including single versus double NAT), and the potential need for a cross–site load balancer. VMware’s NSX cross–vCenter design guide describes various recommended topologies and the design considerations for each of them.

Enjoy your new-found virtual networking powers and the powerful array of IBM Cloud services right at your fingertips!

Acknowledgements

The authors, Scott Moonen and Kurtis Martin, are grateful to Daniel De Araujo and Frank Chodacki for setting up the multi-site test environment and providing NSX architectural guidance.

Provisioning and expanding an IBM Cloud VMware instance via API

IBM Cloud for VMware Solutions recently released a set of public APIs. These APIs allow you to use your IBM Cloud API key to perform operations such as:

  • Get information about your vCenter instance, admin credentials, deployment history, clusters, and hosts
  • Verify parameters for ordering a new vCenter instance, cluster, or hosts
  • Order or remove a vCenter instance, cluster, or hosts

I’ve written some sample code demonstrating how you can authenticate with the IBM Cloud APIs using your API key, and how to interact with the IBM Cloud for VMware APIs. Note that these samples only perform order verification, but you can easily extend them to perform actual orders or removals.

A key use case for these APIs is to expand and contract your VMware instance based on utilization or for workload bursting scenarios. With these APIs, you can now fully automate this process.