August 9, 2012 - Sven Huisman

Testing Teradici APEX 2800 with Login VSI – part 2

Before you continue reading this article, make sure you read part 1 first. In part 1 I explain what the Teradici APEX 2800 card is and what its use case is. I also explain LoginVSI, a VDI benchmarking tool which I also use with Project VRC. My goal was to see the impact of the Teradici APEX 2800 card using the LoginVSI tool. I used different types of workloads and different types of tests to compare the number of users, CPU usage, response times and network usage across different display protocols (RDP, PCoIP software and PCoIP with the APEX 2800 card). These tests do not compare user experience; they compare the impact on the host. They do not say anything about presenting a better image to the end-users! These tests are about user-density, impact on CPU and impact on network.

Disclaimer: As explained before, I performed the tests with LoginVSI. According to Teradici’s whitepaper, this is not a good way to show the capabilities of the APEX 2800 card. I explained the reasons in part 1. LoginVSI is THE VDI benchmarking tool used by large companies like Citrix, HP and Cisco to validate their reference architectures. In my opinion, LoginVSI is a very good tool to show the impact on user-density of using the Teradici APEX 2800 card compared to a situation where you don’t use it. Again, it does not say anything about user experience!

I’ve performed 2 types of tests:

– VSIMax tests:
105 VMs (Windows 7 SP1, 1GB RAM, Office 2007 SP2) are pre-booted. Every 30 seconds a user logs in and starts a LoginVSI test. During the entire test, response times are measured. Once the average response time has reached the dynamic maximum, the VSIMax has been reached, which can be seen as the maximum number of users. After the last user has started a session, the sessions start to log off.

– Steady-state tests:
10 or 30 VMs are pre-booted (depending on the workload). Every 30 seconds a user logs in and starts a LoginVSI test. After the last user has started a session, the users continue the LoginVSI test. A timer has been set to between 30 and 60 minutes. At the end of the timer, the users start to log off.
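To get a feel for how long the login phase of each test lasts, the 30-second login interval can be turned into a rough timeline. This is only a sketch: the helper name `login_ramp_minutes` is made up for illustration, and it assumes the first user logs in at t = 0 and that logins never slip.

```python
def login_ramp_minutes(n_vms, interval_s=30):
    """Minutes between the first and the last login.

    Assumption (not from the test setup): the first user logs in at t = 0
    and every following user logs in exactly interval_s seconds later.
    """
    return (n_vms - 1) * interval_s / 60


# VM counts taken from the test descriptions in this article.
tests = [
    ("VSIMax", 105),
    ("steady-state, medium workload", 30),
    ("steady-state, video-only workload", 10),
]

for label, n_vms in tests:
    print(f"{label}: last login after {login_ramp_minutes(n_vms):.1f} minutes")
```

Under these assumptions the 105-VM VSIMax run spends roughly 52 minutes just logging users in, while the 10-VM and 30-VM steady-state runs have everyone logged in within about 5 and 15 minutes.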

All tests have been performed multiple times to see if there were big differences between the tests. Where possible, averages were taken and presented in the graphs. The colour-coding for the graphs and charts:
Black = RDP
Orange = PCoIP software (no APEX 2800 card)
Blue = PCoIP with APEX 2800 card

VSIMax tests

First I wanted to see the impact of using PCoIP compared to RDP. I knew that PCoIP uses more CPU than RDP, so it’s interesting to see if the APEX 2800 card can compensate for the CPU overhead of PCoIP. The following chart shows the number of active sessions at a CPU utilisation of 80% and the maximum number of users on the host (VSIMax). Higher is better.

Sessions Medium RDPvsPCoIPvsAPEX

As you can see, with the Medium workload, the APEX 2800 card does not improve the user-density. The maximum number of users is even lower with APEX 2800 card! The lower VSIMax could also be caused by a lower baseline response time. The following chart shows the response times:

Response Medium RDPvsPCoIPvsAPEX

The response times are very close to each other, with APEX 2800 a little lower than without, so this could explain the slightly lower VSIMax (although I expected a VSIMax for APEX 2800 similar to the VSIMax of the RDP tests).

The following graph shows the CPU utilisation during a VSIMax test:

CPU Medium RDPvsPCoIPvsAPEX

The CPU utilisation of the tests with the APEX 2800 card is slightly below that of the tests without the card. Conclusion: with the medium VSI workload, the APEX 2800 card does not show a meaningful improvement in offloading the CPU.

In the following test, I wanted to see if disabling the “Build-To-Lossless” feature (BTL) shows any difference in user-density or CPU utilisation. I used the VSI Medium workload.

Sessions Medium BTL

Very small differences, so from a performance perspective it doesn’t really matter if BTL is enabled or disabled. The following graph also shows that CPU utilisation is very close in each test (PCoIP with or without APEX card and BTL on or off).

CPU BTL PCoIPvsAPEX

In the next tests, I changed the workload to VSI-multimedia. This workload is almost the same as the medium workload, except that it has an MP3 playing during the session and once every loop a 20-second 720P video (WMV) is played. The chart shows the difference in sessions when the CPU utilisation is at 80% and the difference in VSIMax:

Sessions Multimedia PCoIPvsAPEX

And the response times are almost identical:

Response Multimedia PCoIPvsAPEX

The CPU utilisation graph shows that with the APEX 2800 card, the CPU is slightly offloaded:

CPU MM PCoIPvsAPEX

Conclusion of the VSIMax tests: I can understand now why the whitepaper of Teradici states that LoginVSI is not a suitable tool to show the capabilities of the APEX 2800 card. Hardly any improvement can be found using this test.

Steady-state tests

Let’s see if the steady-state tests show better results (for the APEX 2800 card). With steady-state tests, I use LoginVSI workloads but I only start a handful of sessions, so that no bottleneck is reached. After all the users have started the test, I let it run for a while to see the impact on, for example, CPU and network.
First, take a look at the medium workload. Only 30 VMs have been started. The following chart shows the average CPU utilisation over a 25-minute period after the last user has started a session:

CPU SS Medium

A very small difference. The following graph shows the CPU utilisation during the test:

CPU SS Medium Graph

According to the whitepaper I mentioned earlier, a “video-only” workload is best to show the capabilities of the APEX 2800 card. I’ve created the same workload as mentioned in the whitepaper (4 times a 20 second 720P movie played with Media Player). I only used 10 VMs, because this is quite a CPU intensive workload. The following chart shows the average CPU utilisation over a 25-minute period after the last user has started a session:

CPU SS Video

Now that’s more like it! Finally you can see what the APEX 2800 card is capable of. The following graph shows the CPU utilisation during the test:

CPU SS Video Graph

A very flat line, but this can be explained because the workload only plays a video, over and over again. Maybe not a very realistic workload, but it’s good to see that the APEX 2800 card can be effective.

I was also interested in network usage. As you might know, with VMware View 5.0 and higher, the PCoIP software protocol has been improved a lot, especially when it comes to network usage. I was wondering if using the APEX 2800 card has an impact on this. The first chart shows the network usage during the steady-state test with the medium workload:

Network SS Medium

More network packets and MBits when using the APEX 2800 card? Mmm, something to think about when considering buying this card…
The following graphs represent “total packets/sec” and “MBits/sec” during the test with the medium workload:

Medium workload – Total packets:
TotalPackets SS Medium

Medium workload – Total MBits:
TotalMBits SS Medium

Now see the same results, only with the video-only workload, and just 10 VMs.

Network SS Video

And now we see the advantage for the APEX 2800 card.

The following graphs represent “total packets/sec” and “MBits/sec” during the test with the video-only workload:

Video-only workload – Total packets:
TotalPackets SS Video

Video-only workload – Total MBits:
TotalMBits SS Video

Conclusions
To be honest, I’m disappointed in the capabilities of the APEX 2800 card. I expected that the card would compensate for the CPU overhead of PCoIP software compared to RDP, even with the medium workload, so that with the APEX 2800 card I could host the same number of sessions as when using RDP. This is not the case. As it turns out, the APEX 2800 card is only effective when there are a lot of pixel changes, watching video for example. Regular Office-like workloads or workloads with a small percentage of multimedia are not likely to benefit from this card. Especially when available network bandwidth is a concern, this card might not be the best solution for you.
However, when your users need to watch a lot of (HD) video in the VMware View session, and adding a second vCPU to the VMs is not an option, the APEX 2800 card could be beneficial.

If you have any questions about the tests I performed, please leave a comment.

If you want to know more about my test-setup, you should read Phase III of the Project VRC whitepapers. In that whitepaper the infrastructure I used for these tests is described, including the optimisations I added to the Windows 7 VM. In addition to this whitepaper, I used ESX 5.0 U1, VMware View 5.1, View Composer 3.0, View Client 5.1 and View agent 5.1.

The Teradici driver I used: Teradici APEX 2800 Server Offload Card software release 1.1.1 (build 15038) for ESXi 5.0.0 Update 1.


Comments

  • Tom says:

    It would be interesting to see XenDesktop 5.6 results just for comparison, in terms of density, network bandwidth, etc. Perhaps test their SoC enabled clients vs. a traditional thin client to see how effective the SoC is.

    • James says:

      Great tests.

      I do agree Tom, I’d like to see the Citrix SOC comparative tests when utilizing Flash multimedia. I haven’t found any research that demonstrates this with more than one end point.

  • Randy says:

    Be sure to read the Teradici whitepaper that Sven references since it explains why a login storm test like VSImax shows no benefit for APEX. Logging in doesn’t change many pixels, but it does use a lot of CPU. VSImax is a great benchmark for sizing servers to handle login storms, but not for seeing the benefits of APEX.

    As the video-only benchmark shows, APEX does a great job offloading peak loads of pixel changes. One cannot easily predict when users may choose to watch a YouTube or training video. With APEX, the user experience of the other VDI sessions is not impacted when this happens. While the average offload for APEX is relatively small for most productivity applications, the key benefit is protecting experience during peaks.

    In a similar way, the average utilization on a GPU is generally very low, however, they are included in a PC to handle the peaks when they are needed. While APEX is not a GPU (it is offloading compression, not pixel rendering), it has a similar benefit to VDI deployments.

    • Sven Huisman says:

      Thanks for adding this, Randy. You’re right, the APEX is capable of offloading peaks of especially video material (or applications with a lot of pixel changes). But when do you just calculate this overhead in your sizing and at what point do you decide to add an APEX in every host? It’s not something you decide up front in my opinion. Only when you see the bottleneck of your VDI-hosts is the CPU and it’s caused by users watching a lot of video.

  • Randy says:

    Sven,

    The goal of most VDI deployments is to minimize complaints from bad user experience upfront rather than react after the fact. Since the user activities that generate a lot of pixel changes (videos or scrolling documents) are generally not controlled and can’t be anticipated, capacity planning is quite tricky. So, let’s say you want to size your VDI servers to be able to handle 10% of users doing something like watching a video at the same time. To simulate this case, you could use LoginVSI in steady state with 10% of the VMs running a video workload and the other 90% running the Medium workload without Multimedia. Then size your server at the number of VMs that keeps the CPU load below the 80% that is best practice (note that VSImax tends to drive the CPU close to 100%). If you repeat your test with this mix with an APEX installed, you will find that the number of additional VMs you can support on the same server is much higher than the ~8% that we measure using a 100% Medium Multimedia workload in the steady state.

    There are two main reasons for this. First is that the Medium Multimedia workload includes a video that is only 20 seconds long. Since the default launching sequence launches each new VM every 30 seconds, this means that each VM is perfectly synchronized so that only one VM at a time is running a video until the workloads start to repeat. With a 10 minute workload, this means that 1 in 20 VMs (5%) are running videos simultaneously which is much lower than what most deployments want to plan for especially when you consider the possibilities of “video storms” that can drive up the number of simultaneous video users (e.g. a major news event driving more people than normal to check things out on line).
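The synchronization argument in this comment can be reproduced with a small simulation. It is a sketch, not LoginVSI's actual scheduler: the function names are invented here, and the point in the loop where the video starts (`video_offset_s`) is a guess, since only the 30-second launch interval, the 20-second video and the 10-minute loop come from the comment.

```python
def videos_playing(t, n_vms, launch_interval_s=30, loop_length_s=600,
                   video_offset_s=300, video_length_s=20):
    """Count the VMs that are playing the video at second t.

    VM k starts its first loop at k * launch_interval_s and repeats the
    loop forever. video_offset_s is a hypothetical position of the video
    inside the loop; the peak does not depend on its exact value.
    """
    count = 0
    for vm in range(n_vms):
        start = vm * launch_interval_s
        if t < start:
            continue  # this VM has not logged in yet
        position = (t - start) % loop_length_s
        if video_offset_s <= position < video_offset_s + video_length_s:
            count += 1
    return count


def peak_video_fraction(n_vms, launch_interval_s=30, loop_length_s=600):
    """Largest share of VMs playing video at the same moment."""
    # Simulate the login ramp plus two full loops, one-second resolution.
    horizon = n_vms * launch_interval_s + 2 * loop_length_s
    peak = max(
        videos_playing(t, n_vms, launch_interval_s, loop_length_s)
        for t in range(horizon)
    )
    return peak / n_vms


for n in (20, 100):
    print(f"{n} VMs: peak share playing video = {peak_video_fraction(n):.1%}")
```

Because the video is shorter than the launch interval, two VMs only overlap when their start times differ by a whole number of loops, which is where the 1 in 20 figure comes from.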

    The second issue with the Medium Multimedia workload is that it includes a continuous audio application. Obviously, this is an application that uses a lot of CPU without generating any pixel changes, which reduces the percentage benefit that an APEX can deliver. Furthermore, the audio application continues running even during the video, which is an unrealistic scenario since the two audio streams get mingled. These are the reasons we believe that a mix of “video only” VMs and Medium without Multimedia VMs is the better way to size the steady state capabilities of a VDI server using an APEX.

    One other caution about the LoginVSI workload is that the video is being decoded using Windows Media Player. It turns out that WMP is not as stressful as other “high pixel changing” apps. Simple things like scrolling through a PDF or using a different media player with the same video can generate even more pixel changes per CPU load, making the percentage APEX benefit even higher. WMP is a convenient thing to use, but is actually not the most common way that video is consumed in everyday use. Most video is decoded within a web page. Most modern browsers have video decoders that are more efficient than WMP, so this is yet another way that LoginVSI understates the advantage of APEX (and overstates the CPU load of video decoding).

    Finally, for simplicity, this discussion has only focused on a single VDI server. In reality, most VDI deployments involve 1000’s of VMs spread over 100’s of servers. How does this affect the sizing considerations? Let’s take the case of a 5000 user deployment over 100 servers (to make the math easy). Let’s say you want to plan for 10% simultaneous “peak” users. That means 500 “peak” VMs. However, these will not be evenly split with exactly 5 “peak users” on each server. Some will have fewer than 5 and some will have more. In fact, if you calculate the binomial distribution, a 3 sigma server would have more than 10 users. This 2X factor turns out to be a good rule of thumb without having to resort to the detailed statistics. This means that if you want to size your environment for 10% “peak users”, you have to assume that the worst case server might have 20% of them. This means you should really run the steady state LoginVSI test with 20% video users and 80% Medium workload users to size your servers for 10% overall “peak users”. The APEX benefits become even bigger in this case. So, I would recommend that you use LoginVSI in steady state with 20% peak users emulated by a video-only workload using a more efficient media player like VLC and 80% Medium workload users and see how many VMs can be supported at 80% CPU utilization with and without APEX. Then, the benefit will be obvious.
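The binomial estimate above can be checked with a few lines of Python. This sketch assumes the 5000 users are spread evenly (50 VMs per server) and that each user independently has a 10% chance of being a “peak” user at a given moment; the helper names are illustrative only.

```python
from math import comb, sqrt


def peak_user_stats(total_users, servers, peak_fraction):
    """Mean and standard deviation of peak users on one server.

    Assumes an even spread of VMs and independent users, so the number of
    peak users per server follows Binomial(vms_per_server, peak_fraction).
    """
    vms_per_server = total_users // servers
    mean = vms_per_server * peak_fraction
    sigma = sqrt(vms_per_server * peak_fraction * (1 - peak_fraction))
    return vms_per_server, mean, sigma


def prob_at_least(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


vms, mean, sigma = peak_user_stats(5000, 100, 0.10)
three_sigma = mean + 3 * sigma

print(f"VMs per server:        {vms}")
print(f"mean peak users:       {mean:.1f}")
print(f"standard deviation:    {sigma:.2f}")
print(f"mean + 3 sigma:        {three_sigma:.1f}")
print(f"P(11 or more at once): {prob_at_least(vms, 0.10, 11):.4f}")
```

Under these assumptions the mean is 5 peak users per server and the 3-sigma point lands a little above 11, which is in line with the 2X rule of thumb in the comment.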

    (BTW, we have given similar feedback to the good folks at LoginVSI and they are looking into modifications to their suite to better support evaluations of systems deployed with APEX. Since something like APEX has never existed before, they admit they did not take it into consideration in the design of their workloads.)

    While this is a nice, technical analysis of why we find so many customers plan to have APEX in all of their VDI servers from day one, it really is a reflection of the more fundamental principle behind successful VDI deployments, which is sizing the system to properly handle peak workloads, not just average workloads. Why do so many VDI deployments use SSDs within their storage hierarchy? The average IOPS don’t really justify the expense, but the peak of a boot or login storm does. Likewise, why has every PC in the past 25 years had a GPU? If you are not a workstation or gaming user, the average CPU utilization to do pixel rendering doesn’t justify the expense, but the peaks of opening a new window or paging through a document make a visible difference to the user experience all the same. Admittedly, there are VDI deployments where the users are only authorized to use a limited set of applications that are primarily text-based (e.g. some call centers) for which APEX will be of limited benefit, but that does not represent the general case where VDI is being deployed today. In fact, most of these “low pixel changing” deployments are still on Terminal Services and not VDI anyway and probably will remain there.

    APEX has only been on the market for about 6 months and is just coming out of a lot of proof-of-concept deployments. The result is that it is being spec’d into the full VDI deployments associated with these PoCs. Clearly, end customers are finding the benefits of having peak offload capacity in their systems well worth the price of APEX. Also, I can’t wait until we disclose all of the new user experience benefits we’ll be announcing shortly through a software upgrade to the existing cards. Stop by our booth at VMworld in a few weeks to see it yourself.

  • sam brown says:

    Please re-do this with comparison to remotefx with physical gpu and soft-gpu. Also can you compare RDP 8 with both TCP/UDP over (lan/wan) ? It has many advantages and tuning points.

  • Dominic St Pierre says:

    I agree with Sam Brown ^^^ – can you offer a part 3 and compare remote FX with physical GPU and soft gpu and include new RDP etc. that would be great to see.

    btw, excellent article. helps sift through some of the marketing hype.