March 4, 2013 - Sven Huisman

Site-failover a VMware View Stateful desktop, not supported?!

In this blog post I describe the challenge with stateful desktops in a multi-datacenter View environment: VMware doesn’t support failing them over to another site!

Stateful vs. Stateless

In my opinion, a stateful desktop is a desktop, usually with a static user assignment, where the user can make changes to the system, like installing applications. A stateless desktop is a desktop in a desktop pool with a floating assignment, configured to refresh after the user logs off. Failing a user over to another View site is no problem with stateless desktops. This blog post is about the site failover of a stateful desktop.

Multi-datacenter View architecture

A typical VMware View architecture is based on the concept of View blocks and pods. A View block consists of one or more vSphere clusters with a maximum of 2,000 virtual desktops (this limit is raised to 10,000 in Horizon View 5.2). A View Pod consists of one or more View blocks and a View Management block, where the View Connection servers are “clustered”. All the View Connection servers in a View Pod share the same information and replicate an ADAM database which contains the configuration of the View infrastructure.
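As a rough illustration of the block limits mentioned above, here is a minimal sketch that estimates how many View blocks a given number of desktops would need. The function name and constants are my own for illustration; the limits are simply the figures quoted in this post, not values read from any VMware tool.

```python
import math

# Per-block desktop limits as quoted in this post (assumed, not queried from View).
BLOCK_LIMIT_PRE_52 = 2000   # before Horizon View 5.2
BLOCK_LIMIT_52 = 10000      # Horizon View 5.2


def blocks_needed(desktops: int, block_limit: int) -> int:
    """Return the minimum number of View blocks for the given desktop count."""
    if desktops < 0:
        raise ValueError("desktops must not be negative")
    if block_limit <= 0:
        raise ValueError("block_limit must be positive")
    # Round up: a partially filled block still counts as a block.
    return math.ceil(desktops / block_limit)


if __name__ == "__main__":
    for limit in (BLOCK_LIMIT_PRE_52, BLOCK_LIMIT_52):
        print(limit, blocks_needed(4500, limit))
```

This only covers the desktop-count limit; real sizing also depends on cluster, storage and Connection server capacity, which are out of scope here.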

One of the disadvantages of VMware View, in my opinion, is that if you want to implement a View infrastructure across multiple datacenters in an architecture that VMware supports, you have to create multiple View Pods. One of VMware’s “rules” is that View Connection servers need to be on the same LAN and in the same location. If you have multiple datacenters where you want to implement View Connection servers, each datacenter has to be configured as a separate View environment.

From the View Architecture planning guide:

View Connection Server Cluster Design Considerations
You can deploy multiple replicated View Connection Server instances in a group to support load balancing and high availability. Groups of replicated instances are designed to support clustering within a LAN-connected single-datacenter environment. VMware does not recommend using a group of replicated View Connection Server instances across a WAN due to the communication traffic needed between the grouped instances. In scenarios where a View deployment needs to span datacenters, create a separate View deployment for each datacenter.

“VMware does not recommend” is not the same as not supported in my opinion.

Update: this is the statement in the View Architecture planning guide in the Horizon View 5.2 version:

IMPORTANT VMware does not recommend or support using a group of replicated View Connection Server instances across a WAN, MAN (metropolitan area network), or other non-LAN due to the communication traffic needed between the grouped instances. In scenarios where a View deployment needs to span datacenters, create a separate Horizon View deployment for each datacenter.

From the View installation guide:

Network Requirements for Replicated View Connection Server Instances
If you install replicated View Connection Server instances, configure the instances in the same location and connect them over a high-performance LAN.
Do not use a WAN to connect replicated View Connection Server instances.
Even a high-performance WAN with low average latency and high throughput might have periods when the network cannot deliver the performance characteristics that are needed for View Connection Server instances to maintain consistency.
If the View LDAP configurations on View Connection Server instances become inconsistent, users might not be able to access their desktops. A user might be denied access when connecting to a View Connection Server instance with an out-of-date configuration.

So replicating View Connection servers over a WAN can lead to an inconsistent LDAP configuration, but there is still no explicit support statement.

Update: from the Horizon View 5.2 installation guide:

Network Requirements for Replicated View Connection Server Instances

If you install replicated View Connection Server instances, configure the instances in the same location and connect them over a high-performance LAN. When installing replicated View Connection Server instances, you must configure the instances in the same physical location and connect them over a high-performance LAN.

Do not use a WAN, MAN (metropolitan area network), or other non-LAN to connect replicated View Connection Server instances. Even a high-performance WAN, MAN, or other non-LAN with low average latency and high throughput might have periods when the network cannot deliver the performance characteristics that are needed for View Connection Server instances to maintain consistency. If the View LDAP configurations on View Connection Server instances become inconsistent, users might not be able to access their desktops. A user might be denied access when connecting to a View Connection Server instance with an out-of-date configuration.

Finally, in the KB article “Network connectivity requirements for VMware View Manager 4.5 and later” the following statement can be found:

Ensure that the Replica Server is on the same LAN as the Connection Server. Replication over WAN is not supported.

There is a good discussion in the VMware community about this topic and the definition of LAN and why replication is not supported over WAN.

Challenges

The first challenge is to let your users log on to the right View Pod. F5 has a load balancer that can check in which View Pod a user has a virtual desktop entitlement and then redirect the user to that View Pod. More information can be found in this presentation.
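To make the idea of entitlement-based redirection concrete, here is a minimal sketch. It is not how the F5 load balancer works internally; the pod names, user names and the `find_pod` function are all hypothetical, and a real implementation would query each pod instead of a hard-coded table.

```python
from typing import Dict, Optional, Set

# Hypothetical entitlement table: View Pod name -> users entitled in that pod.
ENTITLEMENTS: Dict[str, Set[str]] = {
    "pod-dc1": {"alice", "bob"},
    "pod-dc2": {"carol"},
}


def find_pod(user: str, entitlements: Dict[str, Set[str]]) -> Optional[str]:
    """Return the first pod (in name order) where the user is entitled, or None."""
    for pod in sorted(entitlements):
        if user in entitlements[pod]:
            return pod
    return None


if __name__ == "__main__":
    for name in ("alice", "carol", "dave"):
        print(name, "->", find_pod(name, ENTITLEMENTS))
```

A user without an entitlement in any pod gets `None`, which a front end would have to turn into an error message or a default pod.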

The second challenge is to be able to log in to another View Pod, for example when a disaster hits one of the datacenters. When the user logs on to a stateless desktop this is not a big issue. A new stateless desktop pool can be created in the other datacenter, and once the users are entitled they are able to log on again. This can be automated with PowerShell, or the stateless desktop pool can even be pre-deployed, just in case of disaster.
And now I’m getting to the point I want to make. Stateful desktops cannot be transferred from one datacenter to another in case of a disaster. Even if the stateful desktops are replicated at the storage level, VMware does not support adding them to another View Pod.
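The difference between the two desktop types during a site failure can be sketched as a simple planning step. This is only an illustration of the reasoning above, not a View API or an actual failover script; the `Assignment` class, `plan_failover` function and all names in it are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Assignment:
    """Hypothetical record of which pod hosts a user's desktop."""
    user: str
    pod: str
    stateful: bool


def plan_failover(assignments: Iterable[Assignment],
                  failed_pod: str) -> Tuple[List[str], List[str]]:
    """Split the users of a failed pod into two groups.

    The first list holds users of stateless desktops, who can simply be
    entitled to a pool in the surviving pod. The second list holds users of
    stateful desktops, for whom no supported failover path exists.
    """
    re_entitle: List[str] = []
    stranded: List[str] = []
    for a in assignments:
        if a.pod != failed_pod:
            continue  # desktops in other pods are unaffected
        (stranded if a.stateful else re_entitle).append(a.user)
    return re_entitle, stranded


if __name__ == "__main__":
    sample = [
        Assignment("alice", "pod-dc1", stateful=False),
        Assignment("bob", "pod-dc1", stateful=True),
        Assignment("carol", "pod-dc2", stateful=True),
    ]
    print(plan_failover(sample, "pod-dc1"))
```

The second list is exactly the group of users this post is about.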

From the VMware KB article “Moving View-managed desktops between vCenter Servers is not supported”:

IMPORTANT NOTE: Moving View-managed desktops between vCenter Servers is NOT supported.

I find it disturbing that VMware, a company that is built on virtualization, high availability and disaster recovery, cannot support a site failover of a stateful desktop! I don’t think this has changed in Horizon View 5.2, but I will update this post once I know. BTW, the fact that it’s not supported by VMware doesn’t mean it is not possible, but usually the customer’s first requirement is a design that VMware supports.

Solution

To be honest, failing over stateful desktops to another datacenter is not a common requirement. Being able to use a stateless desktop in case of a disaster is sufficient for some customers. On the other hand, there is a reason someone needs a stateful desktop, so not being able to use it after a datacenter disaster can be unacceptable for other customers.
To me, the only difference between a stateless desktop and a stateful desktop is that the user can install applications in the stateful desktop and those applications are still present the next time the user logs in. The user profile, personal application settings and (personal) data can already be managed by other technologies. Think of user profile management and folder redirection, for example. What if there was a technology capable of capturing the applications that the user installs and storing them in a separate location? Stateless desktops could then be combined with this technology: when one site fails, the user logs on to a new stateless desktop at the other location and the technology presents the personal applications in that desktop. But wait, this technology already exists, it’s called “layering”!
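Conceptually, layering comes down to composing a desktop from a site-local base image plus a per-user layer that lives somewhere both sites can reach. The sketch below only models that idea with plain data structures; it does not reflect how any particular layering product works, and `UserLayer`, `compose_desktop` and the application names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UserLayer:
    """Hypothetical per-user layer holding the user-installed applications."""
    user: str
    apps: List[str] = field(default_factory=list)


def compose_desktop(site: str, base_apps: List[str], layer: UserLayer) -> Dict[str, object]:
    """Combine a stateless base image with the user's application layer."""
    # Base image applications come first; user applications are added on top,
    # skipping anything the base image already provides.
    extra = [app for app in layer.apps if app not in base_apps]
    return {"site": site, "user": layer.user, "apps": list(base_apps) + extra}


if __name__ == "__main__":
    base = ["office", "browser"]
    layer = UserLayer("bob", ["cad-tool", "browser"])
    # The same layer applied at either site yields the same application set.
    print(compose_desktop("dc1", base, layer))
    print(compose_desktop("dc2", base, layer))
```

Because the user layer is independent of the site, the desktop at the second site ends up with the same applications, which is the property a stateful desktop cannot offer after a failover.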

In the coming weeks, I will review a couple of solutions, like VMware Horizon Mirage, Liquidware Labs ProfileUnity FlexApp, Ceedo Desktop and AppSense StrataApps, to see if this is the way to solve the “problem”. Maybe it’s time to say goodbye to stateful desktops forever and embrace layering technology!

In the meantime, let me know if you:

  • use a multi-site VMware View environment (multiple View Pods, or a single View Pod, which is unsupported).
  • fail over stateful desktops between sites in a VMware View environment. How have you implemented this?
  • use an application layering technology in a VMware View environment.


Comments

  • Joel says:

    Nice article Sven.

    I think you can move/replicate stateful (non-linked-clone) View desktops between datacenters and then register them at the other side in case of emergency.
    (There are even some scripts that we developed with EMC to get this going, but that was some time ago..)

    What the article is talking about is moving the desktop and assuming that the old View Connection broker will find it again and connect users to it; that is not the case here.

    // Joel

    • Sven Huisman says:

      Joel,

      thanks for your comment. I didn’t say it can’t be done, I know it can be scripted. It’s the support statement I’m worried about:
      IMPORTANT NOTE: Moving View-managed desktops between vCenter Servers is NOT supported.

      If you are right then VMware should update the KB article. If it’s not clear for me, it’s not clear for my customers.

  • Helen says:

    Now it is possible to see VMware View Failover Automation Solution in ByteLife Solution Center. ByteLife has developed the VMware View Failover Automation with the following key functionalities:
    • Failover desktop pools and virtual machines in case of site crash
    • Migrate desktop pools and virtual machines during maintenance, tests, rebalancing the load between sites or as the failback after disaster
    • Restore storage synchronization between datacenters after the outage
    • Integration with vSphere WebClient

    Failover of the VMware View environment might take only minutes, depending on the number as well as nature of desktops and the components that are failed over. First users can restart their work in new site in less than 5 minutes after the failover is initiated.

  • Alar says:

    http://www.bytelife.com/view-failover-automation/
    You can find a video demo here about the View Failover – both stateful and linked clone desktops.