May 18, 2018 - Sven Huisman
Using Nutanix 1-click upgrades with VMware Horizon
In an environment with VMware Horizon, you use VMware vSphere as the hypervisor. This hypervisor needs to be patched and updated every now and then. Using VMware Update Manager (VUM) to install updates and patches is a very common practice. With instant clones, however, that doesn’t always work as it should; see VMware KB2144808. That KB article describes an issue which is fixed in Horizon 7.1, but when using VUM the manual steps are still required.
When running VMware vSphere on Nutanix you update ESXi in a different way. In this article, I describe how you can use Nutanix 1-click upgrade to install updates and patches when using VMware Horizon, and what the key considerations are.
VMware Update Manager
When you are running VMware vSphere you are probably using VMware Update Manager to install patches and keep the vSphere infrastructure up to date. VUM will put a host in maintenance mode, vMotion all the powered-on VMs to other hosts in the cluster, install the patches, reboot the host and exit maintenance mode. VUM will then continue with installing patches on the next host. You can even enable parallel remediation of hosts in a cluster if there are enough resources available to host the powered-on VMs.
When you run VMware vSphere on Nutanix you can use VUM to install patches, but it requires manual intervention for each host. On each host in a Nutanix cluster, there is a controller VM (CVM) running. When you need to reboot a host in a Nutanix cluster, you need to vMotion all the running VMs to other hosts and then log on to the Controller VM with SSH and shut down the Controller VM with the following command:
nutanix@cvm$ cvm_shutdown -P now
And you can only have one powered off CVM in a cluster, so parallel remediation is definitely a no-go!
So when you are finished installing the patches and have rebooted the host, the first thing you do after taking it out of maintenance mode is power on the CVM. You then have to wait until all the services are started (SSH into another CVM and run the command “cluster status”) before putting the next host in maintenance mode. This means a lot of manual intervention when you use VUM to update your vSphere hosts on a Nutanix cluster.
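The wait between hosts can be scripted instead of watched. The following is a minimal sketch, not a Nutanix tool: `all_cvms_up` and `wait_until` are hypothetical helper names, and the health check is passed in as a callable, so how you obtain the CVM state (for example by running “cluster status” over SSH) stays outside the sketch.

```python
import time
from typing import Callable, Mapping


def all_cvms_up(cvm_states: Mapping[str, bool]) -> bool:
    """Return True only when every CVM in the cluster reports as up.

    cvm_states maps a CVM name to True (up) or False (down). How that
    mapping is collected is an assumption left to the caller.
    """
    return all(cvm_states.values())


def wait_until(check: Callable[[], bool],
               timeout_s: float = 900.0,
               interval_s: float = 15.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll check() until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Only when `wait_until` returns True would you put the next host in maintenance mode. The `sleep` and `clock` parameters exist so the loop can be exercised without real waiting.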
Nutanix feature: 1-click upgrades
Luckily Nutanix has a feature called 1-click upgrade. This feature will take care of putting a host in maintenance mode, shutting down the CVM the right way, installing the patches and, after the reboot, powering on the CVM and waiting for it to be available again before proceeding to the next host. This feature is, of course, dependent on DRS to migrate the powered-on VMs, but it definitely simplifies the process of installing patches, without the requirement of manual intervention.
Nutanix qualifies specific VMware ESXi hypervisor updates and provides a related JSON metadata upgrade file on the Nutanix Support Portal for 1-click upgrade through the Prism web console Software Upgrade feature. Installing patches from the VMware site is supported, but these may not have been qualified. From the support FAQ:
Nutanix supports the ability to patch upgrade hosts with versions that are greater than or released after the Nutanix qualified version, but Nutanix may not have qualified those releases at that time.
For updates that are made available by the hypervisor vendor that do not have a Nutanix-provided JSON metadata upgrade file (ie. not officially qualified), obtain the offline bundle and md5sum checksum available from the hypervisor vendor, then use the Prism web console Software Upgrade feature to upgrade.
Using the 1-click upgrade to install ESXi patches or updates is well documented. In short, these are the steps you need to take:
- From the VMware website, download the offline bundle and copy the associated MD5 checksum.
- Save the files to your local machine or media.
- In the Prism web console, open Upgrade Software from the gear icon and click Hypervisor.
- Click the upload a Hypervisor binary link.
- Copy the MD5 checksum into the Hypervisor MD5 Checksum field.
- Scroll down and click Choose File for the binary file, browse to the offline bundle file location, select the file, and click Upload Now.
When the file upload is completed, start the upgrade and click Yes to confirm.
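Before uploading, it can be worth checking that the downloaded bundle matches the checksum you copied. This is a small sketch using Python's standard `hashlib`; the function names and the file name in the usage note are made up for illustration.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: str, published_md5: str) -> bool:
    """Compare the local digest with the published one, ignoring case and whitespace."""
    return md5_of_file(path) == published_md5.strip().lower()
```

For example, `checksum_matches("offline-bundle.zip", copied_md5)` (hypothetical file and variable names) should return True before you paste the checksum into Prism.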
After the Host pre-upgrade check is finished, it will start patching the hosts one by one:
Nutanix will put the vSphere hosts in maintenance mode, one at a time. DRS will vMotion the powered-on VMs to another host in the cluster. One thing to note, and this is important for linked clones and instant clones: powered-off VMs will not be migrated to another host when a host is put into maintenance mode.
After all powered-on (user) VMs are migrated, the 1-click upgrade process will power down the CVM. Next, the host will go into maintenance mode, the patch(es) will be installed, the host is rebooted and then it will be taken out of maintenance mode. Then the CVM will be powered on again and the 1-click upgrade process will wait until the CVM is up and running again. 1-click upgrade will then proceed with the next host.
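The point about powered-off VMs is easy to overlook, so here is a toy model of it. This is only a sketch under simplifying assumptions: host and VM names are invented, the round-robin placement is a crude stand-in for DRS, and the CVM is left out of the model because it is shut down rather than migrated.

```python
from typing import Dict, List, Tuple

# A VM is modelled as (name, powered_on). All names are made up.
VM = Tuple[str, bool]


def enter_maintenance(cluster: Dict[str, List[VM]], host: str) -> Dict[str, List[VM]]:
    """Return a new placement after `host` enters maintenance mode.

    Powered-on VMs are spread over the remaining hosts.
    Powered-off VMs stay registered on `host`.
    """
    others = [h for h in cluster if h != host]
    if not others:
        raise ValueError("no other host available to receive VMs")
    result = {h: list(vms) for h, vms in cluster.items()}
    result[host] = [vm for vm in cluster[host] if not vm[1]]
    running = [vm for vm in cluster[host] if vm[1]]
    for index, vm in enumerate(running):
        result[others[index % len(others)]].append(vm)
    return result


cluster = {
    "esx01": [("desktop-01", True), ("replica-pool-a", False)],
    "esx02": [("desktop-02", True)],
    "esx03": [],
}
after = enter_maintenance(cluster, "esx01")
```

In this model the running desktop moves away from esx01, while the powered-off replica is still registered there when the host reboots.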
Impact on VMware Horizon environments
Within a VMware Horizon environment, we are dealing with virtual desktops or RDSH VMs which are used remotely by users. You can definitely vMotion desktop and RDSH VMs from one host to another, but this can impact the user experience. So you preferably want to perform maintenance on the hosts outside the peak usage hours of the VDI or RDSH environment. Besides that, you can use the 1-click upgrade feature to install patches and upgrades on hosts running VDI or RDSH VMs. Whether you can really call it a 1-click upgrade depends on the VM deployment option you are using. There are a couple of different VM deployment options with VMware Horizon:
- Full clones
- Linked clones
- Instant clones
And then you could also have NVIDIA vGPU-enabled VMs, which require extra attention.
Let’s take a look at each type of deployment, see what the key considerations are and what you need to do to smooth the installation of patches or upgrades.
Using Nutanix 1-click upgrade with full clones is easy. Full clones do not have a dependency on a replica disk so they can easily be vMotioned by DRS. No extra steps required.
When you create a linked clone desktop pool you first create a Master VM. This Master VM is the VM where you install the operating system with the most used applications (or all applications, depending on the application delivery method). You then power off this Master VM and create a snapshot. VMware Horizon admin console is then used to create the linked clone pool. You will then see that a Replica VM is created, which is a thin-provisioned full clone of the Master VM. The disk of this Replica will serve as the common read disk for all of the linked clones in the pool. The linked clone VMs will use a Delta disk to write the changes during the lifetime of the linked clone.
Linked clone VMs can be vMotioned to other hosts. The linked clones are dependent on the availability of the disk of the Replica VM, but this is always available on the storage, even when the host to which the Replica VM is registered is down. So there is no problem there during the 1-click upgrade process. However, linked clone VMs do need extra attention. First of all, there are two options for logoff behavior:
- Refresh after logoff
- Delete after logoff
With refresh after logoff, the desktop will revert to a snapshot of the delta-disk and the VM will be available as a fresh desktop for a new user. Even new desktops can be created as long as the Replica VM is registered to a host which is not in maintenance mode or rebooting.
When the option “Delete after logoff” is selected, the desktop VM is deleted and will be recreated using the Replica VM. When this Replica VM is not available (this is the case when the Replica VM is registered to a host which is in maintenance mode or rebooting), the linked clone pool will go into error:
This is caused by the fact that powered-off VMs, like the Replica VM, are not migrated to another host during maintenance. When this is the case and a new VM needs to be created, the pool will go into a provisioning error.
My advice is to make sure there are enough linked clone VMs available during maintenance, temporarily change “Delete after logoff” to “Refresh after logoff” and disable provisioning on the pool:
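That advice can be turned into a simple pre-maintenance checklist per pool. The sketch below is illustrative only: `LinkedClonePool` and its fields are hypothetical and do not correspond to a Horizon API, and you would fill them in from what you see in the Horizon admin console.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinkedClonePool:
    name: str
    logoff_action: str          # "refresh" or "delete"
    provisioning_enabled: bool
    spare_desktops: int         # available desktops without a user session


def pre_maintenance_actions(pool: LinkedClonePool, spare_needed: int) -> List[str]:
    """List the changes suggested above that still have to be made for one pool."""
    actions = []
    if pool.spare_desktops < spare_needed:
        missing = spare_needed - pool.spare_desktops
        actions.append(f"provision {missing} more desktops")
    if pool.logoff_action == "delete":
        actions.append("change logoff action to refresh")
    if pool.provisioning_enabled:
        actions.append("disable provisioning")
    return actions
```

An empty list means the pool is ready for the maintenance window. Remember to revert the temporary changes once all hosts are patched.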
I don’t want to explain what an instant clone is, but it is important to know how it is created. The picture below shows the creation of instant clone VMs:
The Master VM is the same as with linked clones (operating system with most used applications, with a snapshot). The VMware Horizon admin console is then used to create the instant clone pool. The first thing that is created on vSphere is a Template (represented in vSphere as “cp-template-” followed by a unique identifier). This is a linked clone of the Master VM.
The next step is the creation of the Replica, which is a thin-provisioned full clone of the template VM. This Replica is the same as with linked clones: it will serve as the common read disk for all of the instant clones in the pool.
The next step is the creation of the so-called “Parent VM”. This Parent is a powered-on VM, linked to the Replica and is present on every host. The instant clone desktops are forked from this Parent VM but are linked to the Replica after creation, not to the Parent. And this is important to remember because it means that the Parent VMs are only needed to create the instant clones. After that, they can be deleted without affecting the instant clones.
And this is also what happens when you put a host in maintenance mode when there are instant clones on it. The instant clones will be vMotioned to other hosts in the cluster and the Parent VMs will be deleted.
Once the host is back online and a creation of a new instant clone is triggered, a new Parent VM will be created. When the Parent VM is created, new instant clone VMs can be forked on that host.
So it seems there is no problem using the Nutanix 1-click upgrade feature with instant clones. However, there is one major consideration. Because 1-click upgrade puts vSphere hosts in maintenance mode without moving powered-off VMs, it sometimes happens that the creation of a new instant clone will fail and put the desktop pool in error. To be able to create new instant clones, a Parent VM needs to be present on the host. If it’s not there (because the host has been in maintenance mode before), the host needs access to the Replica VM to create one. If the Replica VM is registered to a host which is not available (in maintenance mode or rebooting), the Parent VM cannot be created and new instant clones cannot be created on that host either:
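The dependency described here can be written down as a small decision function. This is a sketch of the reasoning in this article, not of Horizon's actual provisioning logic; the function name and all host names are hypothetical.

```python
from typing import Set


def can_create_instant_clone(target_host: str,
                             hosts_with_parent: Set[str],
                             replica_host: str,
                             unavailable_hosts: Set[str]) -> bool:
    """Decide whether a new instant clone can be forked on target_host."""
    if target_host in unavailable_hosts:
        # The target itself is in maintenance mode or rebooting.
        return False
    if target_host in hosts_with_parent:
        # A Parent VM is already present, so forking works.
        return True
    # No Parent VM: one has to be created first, which needs the Replica VM.
    return replica_host not in unavailable_hosts
```

A host that has just come back from maintenance has lost its Parent VM, so it ends up in the last branch and depends on the host where the Replica VM is registered.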
You can prevent this from happening by disabling provisioning on the instant clone pool(s), at least until the host with the Replica VM is patched and back online:
If you have multiple instant clone pools, don’t be tempted to disable provisioning on the vCenter, because then the parent VMs will not be deleted and the host will not go into maintenance mode:
Just make sure that during the maintenance window there are enough instant clone VMs available, because when a user logs off, the VM will be deleted, but not recreated when the provisioning is disabled.
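To get a feeling for how many desktops you need up front, you can replay an expected sequence of logons and logoffs. The sketch below is a simplified model under the assumption stated above (with provisioning disabled, a logoff deletes the desktop and nothing is recreated); the function name and numbers are illustrative.

```python
from typing import Iterable, Tuple


def simulate_window(total: int, in_use: int, events: Iterable[str]) -> Tuple[int, int]:
    """Replay logon/logoff events while provisioning is disabled.

    Returns (desktops_left, refused_logons).
    """
    if not 0 <= in_use <= total:
        raise ValueError("in_use must be between 0 and total")
    refused = 0
    for event in events:
        if event == "logoff":
            if in_use > 0:
                in_use -= 1
                total -= 1  # the desktop is deleted and not recreated
        elif event == "logon":
            if total - in_use > 0:
                in_use += 1
            else:
                refused += 1
        else:
            raise ValueError(f"unknown event: {event}")
    return total, refused
```

If the second value is greater than zero for a realistic event sequence, the pool needs more desktops before the maintenance window starts.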
At this moment, vGPU-enabled VMs cannot be vMotioned on VMware vSphere. You can use the 1-click upgrade feature, but it will not be unattended. A couple of manual steps must be taken when patching a vSphere cluster. First, disable the vGPU-enabled pool(s) in Horizon. Second, when a host is entering maintenance mode, it will wait until all the powered-on VMs are powered off, so shut down the vGPU VMs manually (or with a script). When all the VMs are down (except the CVM), log on to the Controller VM with SSH and shut down the Controller VM with the following command:
nutanix@cvm$ cvm_shutdown -P now
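If you script this, the important part is the order: user VMs first, the CVM last. The sketch below only computes that order from a list of VMs on the host; the actual shutdown calls are left out. The VM names are invented, and recognising the CVM by an “NTNX-...-CVM” name is an assumption about the naming in your cluster.

```python
from typing import List, Tuple


def is_cvm(name: str) -> bool:
    # Assumption: CVMs follow an "NTNX-<id>-CVM" naming pattern.
    return name.startswith("NTNX-") and name.endswith("-CVM")


def shutdown_order(vms: List[Tuple[str, bool]]) -> List[str]:
    """Return names of powered-on VMs in shutdown order, with the CVM last.

    Each entry in vms is (name, powered_on).
    """
    running = [name for name, powered_on in vms if powered_on]
    user_vms = sorted(name for name in running if not is_cvm(name))
    cvms = sorted(name for name in running if is_cvm(name))
    return user_vms + cvms
```

The CVM at the end of the list is the one you shut down with the cvm_shutdown command shown above, not by powering it off from vSphere.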
Live migration (initially it will be “suspend and resume” on vSphere) of vGPU-enabled VMs is coming, and that will make patching and upgrading even easier. Here is a video of what it will look like on vSphere: