March 5, 2009 - Sven Huisman

Citrix XenApp on VMware ESX: 1 or 2 vCPU?

In the past I’ve published two articles (here and here) about best practices for running Citrix XenApp (Presentation Server) on VMware ESX. These were all best practices gathered by other people, but I agreed with a lot of the recommendations. Until recently, I also agreed with using 1 vCPU XenApp virtual machines (with fewer users per VM) rather than 2 vCPU XenApp virtual machines (with more users per VM). From my own experience and from what I heard from the community, performance degrades if you use 2 vCPUs. As a matter of fact, during a session on this subject at VMworld 2008 in Las Vegas, this is what was stated:

– Always set up your Citrix VMs with 1 vCPU
– Due to the scheduling done by VMware at the hypervisor layer you will degrade performance if you use 2 vCPUs

So having experienced it myself and read about it on the internet, there was no reason for me to try it differently on a newer version of ESX. But after reading “VMware Platform Performance Index”, a benchmark performed by the guys over at Virtual Reality Check (Ruben Spruijt and Jeroen van de Kamp), I got confused.

So what’s this document about?

The primary purpose of this whitepaper is to provide information about the scalability and best practices of virtualized Terminal Server and Virtual Desktop workloads using VMware ESX.

One of the conclusions regarding virtualizing a terminal server workload:

Using more than one virtual CPU per Terminal Server VM was generally not recommended. However, the tests in project VRC have proven that enabling two vCPU’s allowed much more users and gave a more consistent user experience.

Mmm, I must read further…

It is important to realize that the Terminal Server workload is unique. No other workload has so many active processes and threads within a single instance of Windows. Therefore, configuring only one vCPU per Terminal Server VM is not recommended.

It sounds quite logical, but why is everybody saying: don’t use 2 vCPU Terminal Server VMs?

For a long time, it has been a best practice not to enable more than one vCPU per Terminal Server VM. Interestingly, the tests in project VRC prove otherwise, as long as the total number of vCPUs in all VMs combined does not exceed the number of physical CPU cores.

So I was thinking: how is this possible? Are these guys from Project VRC the first ones to test this? Did something change between versions of ESX, so that everyone still assumes 1 vCPU TS or Citrix VMs give better overall performance, while this is no longer true with the latest ESX release? One thing I found that could have something to do with it is something Duncan Epping wrote about a while back. It’s about what changed between ESX 2.x and ESX 3.x regarding “co-scheduling SMP VMs in VMware ESX Server”. In short: ESX 2.x uses “strict coscheduling” whereas ESX 3.x uses “relaxed coscheduling”. Duncan’s explanation:

In other words VM’s with multiple vCPU’s don’t take up cycles anymore when these vCPU’s aren’t used by the OS. ESX checks the CPU’s for the idle process loop and when it’s idle the CPU will be released and available for other vCPU’s. This also means that when you are using an application that will use all vCPU’s the same problems will still exist as they did in ESX 2.5.

As one of my customers wanted to test this to see if it improved user experience, I decided to give it a go and upgrade a XenApp VM with an extra vCPU. After initial testing, we decided to upgrade all XenApp VMs and put them in a separate ESX cluster. I don’t have hard numbers, but the user experience improved noticeably. As this was only a small farm of 6 VMs, I do recommend testing this first, because user workloads can of course differ.

In conclusion:

– 2 vCPU terminal server (or XenApp) VMs allow more users and give a more consistent user experience
– Use dedicated hardware (ESX cluster) for the TS/XenApp VMs
– Don’t overcommit total amount of vCPUs
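The last point is simple arithmetic: add up the vCPUs of all VMs on a host and compare the total with the number of physical cores. Below is a minimal sketch of that check in Python. The function names and the example host (8 cores, four 2 vCPU XenApp VMs) are made up for illustration and are not taken from the Project VRC whitepaper or from any VMware tool.

```python
def total_vcpus(vcpus_per_vm):
    """Sum the vCPUs assigned to all VMs on one host."""
    return sum(vcpus_per_vm)


def is_overcommitted(vcpus_per_vm, physical_cores):
    """True when the VMs together have more vCPUs than the host has cores."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return total_vcpus(vcpus_per_vm) > physical_cores


# Hypothetical host: 8 physical cores, four XenApp VMs with 2 vCPUs each.
host_cores = 8
xenapp_vms = [2, 2, 2, 2]

print(total_vcpus(xenapp_vms))                   # 8
print(is_overcommitted(xenapp_vms, host_cores))  # False: 8 vCPUs on 8 cores

# Adding a fifth 2 vCPU VM would push the same host into overcommitment.
print(is_overcommitted(xenapp_vms + [2], host_cores))  # True: 10 vCPUs on 8 cores
```

Remember to count every VM on the host, not just the XenApp ones, which is one more reason to use a dedicated cluster.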

I would love to hear your own experience and why you (don’t) use 2 vCPU XenApp or TS VMs, so feel free to comment!


Comments

  • Gabrie says:

    I didn’t read the PQR docs yet, but I was wondering… did they test the max nr of users on 1 VM and compare that to other hypervisors? Or did they load multiple VMs with one and multiple with 2 vcpu’s and then see how many users can be loaded onto the same box?

    Gabrie

    • Sven Huisman says:

      Both 1 VM and multiple VMs.

    • Corne Hendrickx says:

      Hi Sven,

      We are running a Citrix farm with 180 machines, all under VMware. We have tested the following scenarios:

      1 ESX Host with 6 Virtual Citrix Servers. 1 vCPU max 22 users
      1 ESX Host with 6 Virtual Citrix Servers. 1 vCPU max 24 users
      1 ESX Host with 6 Virtual Citrix Servers. 1 vCPU max 26 users
      1 ESX Host with 7 Virtual Citrix Servers. 1 vCPU max 22 users
      1 ESX Host with 7 Virtual Citrix Servers. 1 vCPU max 24 users
      1 ESX Host with 7 Virtual Citrix Servers. 1 vCPU max 26 users
      1 ESX Host with 8 Virtual Citrix Servers. 1 vCPU max 22 users
      1 ESX Host with 8 Virtual Citrix Servers. 1 vCPU max 24 users
      1 ESX Host with 8 Virtual Citrix Servers. 1 vCPU max 26 users

      We found that the best configuration was 1 ESX host with 7 virtual Citrix servers with 22 users each. We did these tests with help from VMware. They told us to use just one vCPU per virtual server.

      We used this configuration for some weeks, but the employees started to complain about freezing sessions.

      After that we set vCPU reservations of about 200 to 400 MHz. This resulted in a better user experience.

      But we still got complaints about freezing sessions. We discussed the configuration with Microsoft and they advised us to configure a minimum of 2 CPUs for Windows 2000 and higher systems, even on a virtual platform (this applies to Xen, VMware and Virtual Server).
      The reason for this is how Windows divides processes over a single CPU versus 2 or more CPUs.

      After we applied this configuration we had a very good user experience, even with 7 x 22 users per physical server (154 on a physical server ain’t bad 😛 ).

      You can get a higher number of users on a physical server if you use a 64-bit OS that can address more than 4 GB of memory. Memory is the bottleneck, which is why we decided to set a limit of 22 users.

      The only counter that is extremely high is the number of interrupts. But this is not a problem and is normal if there is a heavy load on the server.

      • Sven Huisman says:

        Thanks for sharing your test results. Strange though that VMware advises 1 vCPU per virtual server for a Citrix VM. It all depends on the applications you run, whether you have published desktops or applications, etc. But it seems your results are similar to mine. In the end, user experience is more important than the maximum number of users on a box.

  • Duncan says:

    Funny thing is I also did performance testing and could get more users on an ESX box by using 1 vCPU VMs than when using 2 vCPU VMs. And we even used RVI, which I will do a blog post on tonight…

    • Sven Huisman says:

      Hi Duncan, thanks for the comment. I’ve read about RVI and that it can really improve performance for TS/Citrix VM’s. Did you also test without RVI? Looking forward to your article!
      BTW, there is a difference between getting the most TS/Citrix users on a box and getting better user experience.

  • Jason Boche says:

    Good article. I’m 2+ months behind in publishing my own blog entry on virtualizing Citrix Presentation Server (XenApp).

    Scalability of users per XenApp server is based largely on workloads in the XenApp server (published apps, published desktops, etc.) Applications will have varying impacts on server resources.

    Scalability of virtualized XenApp servers per ESX host or cluster is also going to be based largely on workloads in the XenApp server same as above.

    My particular Citrix servers are very CPU light and tend to be heavier on memory. Therefore, I configure virtualized XenApp servers with 1 vCPU and 4GB vRAM. When the Resource Manager Load Evaluators hit 100% indicating a full XenApp server, it’s always because of a low memory condition. X number of users have exhausted most of the available RAM while CPU utilization will be at 5-10% only.

    So this is one of those “your mileage will vary” and “it depends” type scenarios that consultants are so famous for using, but it’s the absolute truth here. There is no one right answer and that’s why virtualizing XenApp can be tricky. It’s all about the workloads and each company and environment will have different workloads. Traditionally speaking, published desktops chew up a lot more server resources than published applications. That applies to both physical and virtualized XenApp servers. Stay away from published desktops if you can because that’s going to quickly cut down your consolidation ratio. Besides, if you have a lot of users needing published desktops, then use a different type of VDI solution rather than paying for the extra licensing of Citrix on Windows on VMware where you will get considerably less mileage out of your infrastructure and your dollar. Go with published desktops on physical XenApp or VDI on VMware, or even PC on a blade.

    Jas

  • Wharlie says:

    I’ve done a lot of citrix on ESX, and I get best user results using 2vcpus.
    This may not always give the highest consolidation ratio, but virtualisation is not all about consolidation (despite what some people think)
    The issue for me with 1 vcpu tends to be that a single process can dominate the whole cpu, affecting performance for all users, sometimes to the point of having to hard power off the vm to fix the issue.
    With 2 vcpu’s the containment of rogue processes to a single cpu at least allows other processes to continue on the other cpu, and with relaxed co-scheduling the host can still use unused cpu cycles for other vm’s.
    You could still have issues if more than one process maxes out at the same time, but this is less likely.

  • Rob Mokkink says:

    Do you still use cpu affinity as a best practice?

  • […] the way I also noticed that Virtualfuture.info just released a post on the 1 vCPU vs 2 vCPU topic. They support the Project VRC outcome. Our outcome was different, but like I said our testing […]

  • Rob L. says:

    If you’re not overcommitting the vCPUs then performance should be similar to comparing single and dual CPU physical servers. Setting affinity shouldn’t make a difference either, since there already is a one-to-one relationship between virtual and physical CPUs. The vCPUs would have very close to 100% of a physical CPU. It’s not until there is contention for physical CPUs that you will see performance drops for dual-vCPU VMs.

  • Michael H says:

    I’m getting very different results compared to the vmware article – I’ve deployed XenServer 5.5 at a client running Office 2007 SP2 and Syspro. With XenServer – 20 users on the VM – 4GB, 1vCPU – Peak times CPU usage is at 10 – 20% !! Same application set, different client on ESX 3.5 Update 4 – 13 users – CPU at 50 – 60% – I’ve set those servers to max at 25 users – I’m now running 6 XenApp VMs with 2 vCPUs – CPU usage is high, but users aren’t complaining.
    Bottom line is – it’s all over the net – XenServer outperforms ESX 3.5 when it comes to XenApp….been there done that got the t-shirt….interested to see some vSphere 4 tests though.

    • Will K says:

      I saw similar results on my XenApp deployment but this changed drastically after upgrading to hardware with Intel Nehalem processors.

  • Andy says:

    I’ve had mixed results depending on the workload, but still lean toward 1 vCPU … http://www.thegenerationv.com/2009/10/optimizing-xenapp-on-vmware-esx.html

  • Siegfried says:

    Hi,

    I set 2 Vcpu on my Xenapp servers and it works better.

    Thanks a lot for your article!

  • Jacqui Brockwell says:

    Has anyone actually done any testing with 4 x vCPUs though? We have been running this for some time quite successfully but now I’m wondering whether we should be using 2 instead?

    • Sven Huisman says:

      I’ve been doing some testing for project VRC (http://projectvrc.nl/) and you really should read phase II, because it shows that 4 vCPU configs are preferred: you can get nearly as many users on a box as with 2 vCPU machines (not overcommitting on CPU), and you therefore have fewer VMs to manage.

  • Rob jaudon says:

    I need some help diagnosing some CPU constraint issues (I think). I have a client with a Citrix XenApp farm (5 desktop servers/9 application servers) spread across 12 ESX hosts running ESX 4.0. All Citrix servers, desktop and application servers, have 2 vCPUs allocated, and 54% of the VMs running on the ESX hosts are allocated 2 vCPUs. The problem we are running into is that the Citrix application servers are crawling when users try to access the published applications. I personally think it is a multiple-vCPU issue, but I would like to get an opinion on whether my findings are correct.

    Below are some screen shots from the VSphere client and esxtop readings.
    My questions are what are the best practices for Citrix on ESX?
    Is it typical to have multiple VCPU’s per server in a Citrix environment?
    For ESXTOP readings should I be looking at the %RDY and overall performance of the host?