VMware Cloud Director HTTP error 431: Request Header Fields Too Large

Tomas Fojta wrote previously about issues with Cloud Director errors having to do with excessively large cookies. This is a common problem for cloud providers where there may be multiple web applications, some of which fail to properly limit their cookie scope. At the moment I am writing this, my browser is sending about 6.7kB of cookie data when visiting cloud.ibm.com. This is close to the limit supported by Cloud Director, and sometimes it goes over that limit.

Tomas suggested an approach using the NSX load balancer haproxy configuration to filter cookies. Unfortunately, Tomas’s approach does not cover all possible cases. For example, it does not cover the case where only one of these two cookies is present, and it does not cover the case where there are additional cookies in the header after these two cookies. Furthermore, there are additional cookies used by Cloud Director; at a minimum this includes the following:

  • JSESSIONID
  • rstd
  • vcloud_session_id
  • vcloud_jwt
  • sso-preferred
  • sso_redirect_org
  • *.redirectTo
  • *.state

If you have a known limited list of cookies (or cookie name patterns) like this that you want to pass to your application, it is relatively easy to program a positive cookie filter with an advanced load balancer such as VMware Avi Load Balancer. But if you are using the NSX embedded load balancer and are limited to the haproxy approach of using reqirep with regular expressions, it is an intractable problem. Therefore, instead of using reqirep to selectively include the cookies that Director needs, I recommend the approach of using reqirep to selectively and iteratively delete cookies that you know are likely to be large and to overflow Director’s supported limit. It may take some iterative experimentation over a period of time for you to identify all of the offending cookies.

For example, we can use the following four rules to remove two of the larger cookies for cloud.ibm.com, neither of which are needed by Director. For each cookie I am removing, I have written a pair of rules: the first rule removes the cookie if it appears anywhere other than the end of the cookie list, and the second removes it if it is at the end of the list:

reqirep ^(Cookie:.*)com\.ibm\.cloud\.iam\.iamcookie\.prod=[^;]*;(.*)$ \1\ \2
reqirep ^(Cookie:.*)com\.ibm\.cloud\.iam\.iamcookie\.prod=[^;]*$ \1
reqirep ^(Cookie:.*)com\.ibm\.cloud\.iam\.Identity\.prod=[^;]*;(.*)$ \1\ \2
reqirep ^(Cookie:.*)com\.ibm\.cloud\.iam\.Identity\.prod=[^;]*$ \1

vCenter key provider server certificates

I’ve written a couple of posts on vCenter key provider client certificates and caveats related to configuring them. In this post I shift to discussing server certificates.

When you connect to a key provider, vCenter only offers you the option of trusting the provider’s end-entity certificate:

Typically an end-entity certificate has a lifetime of a year or less. This means that you will be revisiting the provider configuration to verify the certificate on at least an annual basis.

However, after you have trusted this certificate, vCenter gives you the option of configuring an alternate certificate to be trusted. You can use this to establish trust with one of your key provider’s CA certificates instead of the end-entity certificate. Typically these have longer lifetimes, so your key provider connectivity will be interrupted much less frequently.

You may have to work with your security admin to obtain the CA certificate, or depending on how your key provider is configured you may be able to obtain the certificate directly from the KMIP connection using a tool like openssl:

root@smoonen-vc [ ~ ]# openssl s_client -connect private.eu-de.kms.cloud.ibm.com:5696 -showcerts
CONNECTED(00000003)
depth=2 C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Global Root G2
verify return:1
depth=1 C = US, O = DigiCert Inc, CN = DigiCert Global G2 TLS RSA SHA256 2020 CA1
verify return:1
depth=0 C = US, ST = New York, L = Armonk, O = International Business Machines Corporation, CN = private.eu-de.kms.cloud.ibm.com
verify return:1
---
Certificate chain
 0 s:C = US, ST = New York, L = Armonk, O = International Business Machines Corporation, CN = private.eu-de.kms.cloud.ibm.com
   i:C = US, O = DigiCert Inc, CN = DigiCert Global G2 TLS RSA SHA256 2020 CA1
   a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA256
   v:NotBefore: Jun 18 00:00:00 2024 GMT; NotAfter: Jun 17 23:59:59 2025 GMT
-----BEGIN CERTIFICATE-----
MIIHWDCCBkCgAwIBAgIQCK1qBW4aHA51Yl6cJVq96TANBgkqhkiG9w0BAQsFADBZ
. . .

You can then paste this certificate directly into the vCenter UI:

After doing this, vCenter will still display the lifetime validity of the end-entity certificate, rather than the CA certificate. But it will now be trusting the CA certificate, and so this trust will extend to the next version of the end-entity certificate, as long as it is signed by the same CA.

vCenter key provider client certificates, part 2

Previously I explained how vCenter creates a new client certificate with each key provider connection. This is a good thing; it enables you to connect vCenter to the same provider multiple times as a different identity, which can be valuable in certain multitenant use cases.

However, there is also a bug in the vCenter UI that generates this certificate. For a split second, the UI presents one certificate, but then switches to a new value. If you click the copy button too quickly, you will copy the wrong certificate:

Be sure to wait for the screen to refresh before copying your certificate!

Customizing root login for VMware Cloud Director

Your Linux VM running in VMware Cloud Director might be preconfigured with the best-practice configuration to disable root password login. This might prevent you from using the root password that you set with Director’s Guest OS Customization:

#PermitRootLogin prohibit-password

You can override this behavior using a Guest OS Customization script in a couple of ways. The simplest approach is to use your customization script to set the sshd configuration to allow root password logins:

#!/bin/bash
sed -ie "s/#PermitRootLogin prohibit-password/PermitRootLogin yes/" /etc/ssh/sshd_config

Or, if you prefer, you can use the customization script to insert an SSH public key for the root user:

#!/bin/bash
echo "ssh-rsa AAAAB3...DswrcTw==" >> /root/.ssh/authorized_keys
chmod 644 /root/.ssh/authorized_keys

vSAN sizer

I find that there are several commonly overlooked considerations when sizing a vSAN environment:

  • It is not recommended to operate a vSAN environment at over 70% capacity
  • If you use a resilience strategy of FTT=1, you should plan to perform a full evacuation during host maintenance or else you will be at risk of data loss due to drive failure during maintenance. Depending on your configuration and usage, the time required for a full evacuation can easily take 24 hours or more. In addition, a maintenance strategy of full evacuation requires you to leave one host’s worth of capacity empty.
  • Because of these considerations, I recommend a resilience strategy of FTT=2. With this strategy you have the option of performing host maintenance using ensure-accessibility rather than full evacuation, which is much faster but is still resilient to one failure during maintenance.
  • If you size your environment strictly to the minimum number of nodes for your configuration, then you will fail to create virtual machines or snapshots during host maintenance—including any snapshots used for backups or replication—unless you force provisioning of the object contrary to the storage policy. For this reason, you should consider provisioning at least one more host than is strictly required.

Many of these considerations are summarized in this helpful VMware blog post, which includes a helpful table documenting host minimums and RAID ratios: Adaptive RAID-5 Erasure Coding with the Express Storage Architecture in vSAN 8.

I’ve taken these considerations and created a vSAN sizer Excel workbook, to help both with planning and sizing a vSAN environment.

GitHub printability

Certain complex GitHub Markdown documents don’t render well—or especially, print well—with the whitespace gutters on either side of the screen. This is especially true when there are complex tables in a document.

I use the following userContent.css stylesheet in my browser to override the use of gutters. This renders complex tables much better:

@-moz-document domain(github.com) {
 .container-lg { max-width: none !important; }
 .container-xl { max-width: none !important; }
}

Some tables are so complex that they still spill off the edge of the page even in portrait mode. To counteract this I adjust the print scaling, or print in landscape mode, or both.

Migration from VMware Shared to VMWaaS

IBM Cloud recently announced the end of sale of new VMware Shared environments, with a planned end of support date for all VMware Shared environments of January 2025.

VMware Shared is IBM Cloud’s first-generation IBM-managed VMware Cloud Director offering. It was based on VMware’s legacy NSX-V network virtualization technology. VMware-as-a-Service (VMWaaS) is IBM Cloud’s next-generation IBM-managed VMware Cloud Director offering. It leverages NSX-T network virtualization and provides other advanced capabilities such as the ability to create dedicated environments whose hosts are not shared with other tenants.

Refer to IBM’s announcement for details on how to migrate between these environments and for minor changes in pricing.

vCenter key provider client certificates

When you are configuring a standard key provider in VMware vCenter, you can authenticate vCenter to the key provider using any of the following options:

  • vCenter Root CA Certificate
  • vCenter Certificate
  • Upload [a custom] certificate and private key
  • New Certificate Signing Request (CSR) [to be processed by the key provider]

I commonly choose “vCenter certificate.” This uses a certificate signed by the vCenter CA. The certificate is generated specifically for use with KMIP and it has a 10-year expiration.

Importantly, a new certificate is generated by vCenter for each key provider that you configure.

Furthermore, if you cancel the trust process before completing it, vCenter will generate a new certificate the next time you perform the trust process. I’ve been bitten by this in the past—I generated a certificate, cancelled the dialog, and sent the certificate to my cryptographic administrator. When I received confirmation the certificate had been configured, I re-initiated the trust process, but this time it used a new certificate. This took quite some time to debug. Make sure that you complete the trust process even if you expect there to be a waiting period before the certificate is configured in your key provider!