August 13, 2008 - Matthijs Haverink

Sure, VMware made a (critical) mistake, but what’s all the fuss about?

Sure, you’ve read about the Update 2 troubles. VMware made a huge mistake by releasing something with trial code in it, and the impact can be huge once it is put into production. But here’s what’s bugging me: the whole world is acting as if the impact is catastrophic.

Virtualization.info started with the headline “VMWare mistake shuts down thousands of Virtual Infrastructures”, later toned down to “may shut down”. They also state that 95% of Fortune 500 customers use VMware and over 50% use VMotion. Yeah, sure they do, but do you really think that administrators working at a Fortune 500 company think: “Hey, wow, a new patch; it’s from VMware so it’ll be good; let’s put it into production right now!”?

Really, lots of people are acting as if hundreds of thousands of virtual servers around the world have been down and we should all stop using VMware and switch to the “beautiful” Hyper-V or XenServer solution, but:

1. No administrator with even the tiniest brain instantly puts a patch into a production environment. If you have a big environment, you’ll have a test environment as well. Getting a patch from download, through test, to production will take you a couple of weeks. Let me state that this is of course a bug that will slip through in 99 out of 100 test environments, but the patch needs to be tested anyway. If you have a small environment, you should have the sense to wait at least a couple of weeks (I prefer a month) to see whether any horror stories like this one start flying around the world.
2. The effect of the patch isn’t that servers are shut down. It is that servers can no longer be powered on or VMotioned. In a production environment, how often do you power down a (virtual) server?
3. The overlap with Patch Tuesday is said to make things worse, but even if you are someone who applies Windows patches right away (see my thoughts on that under point 1) and the server reboots automatically, this will still work, because a reboot doesn’t power the machine down.
4. As if Microsoft never released a patch or a product that got loads of customers into trouble. And did we all switch to another OS?
5. Switching to Hyper-V or XenServer: we all know that neither product can match the feature set of VMware. Sure, if you don’t need VMotion, HA and other nice VMware features, maybe Hyper-V is fine for you. But the so-called Fortune 500-like companies won’t make that switch now.
6. VMware released a fix within the promised 36 hours.

Don’t get me wrong; I don’t understand how VMware could release such a patch. Their reputation won’t get any better through actions like these either, but I think that very few companies have experienced real trouble because of this update.

So my question remains: what’s all the fuss about?


Comments

  • jamie says:

    The “fuss” is that we as VI admins are now forced to NEVER be “bleeding-edge” again. In the past, ESX patches and upgrades have been basically rock-solid. Yes, it’s true, no patch should EVER go immediately into production, but like it or not, it happens, and all the more so now that ESX 3.5 has “matured” over time. This bug has brought on a lot of changes, not only at VMware, but at every VI shop on the planet that knows of this incident. It’s almost akin to someone backdooring the code, in which case it would have been worse publicity-wise, but you get my point. It should never have happened. As of yesterday, it’s the sloppiest software release story of all time.

    Overall trust in VMware’s hypervisor and release practices has been compromised, and frankly, this has shaken up the entire virtualization community. Perhaps it will make VMware think twice about including “experimental” code in production GA releases, but I doubt it, since that was not the issue either…

    In any event, it only helps MS/Xen and the rest, as the timing could not have been worse given the current playing field.

    VMware has lost another bleeding-edge shop, and I’m sure we are not alone. Now bugs will be uncovered more slowly, MS will advance faster, and VMware will likely be in the dust in less than 10 years due to Hyper-V… mark those words. MS can beat anyone; it’s been proven over and over, e.g. Netscape and the countless other competitors the illegal monopolist has trampled on and squashed over the years.

    Jamie

    • Hey Jamie, thanks for your reply!

      As you say: a patch should never go into production immediately, so everybody who does that (especially with this one, as it’s 90% new features and 10% bugfixes) mainly has themselves to blame; it’s that black-and-white…

      I do agree with what you’re saying about VMware now maybe losing part of its time-to-market advantage, since this will now rule the work floor at VMware: “this will never happen again”.

      And about the whole Microsoft and Citrix Xen story: I don’t see this shake-up having that much effect. Do you really, after this one incident, start thinking: maybe it’s better to trust Microsoft to bring out reliable patches?

      And sure, there’s a good chance that Microsoft will “rule the world” when it comes to the hypervisor, because everybody swallows whatever half-complete product Microsoft ships. But I don’t think that’ll be relevant anymore in 10 years. In 10 years, hypervisors will be a commodity and free, like your BIOS, and it won’t matter which one you’ve got. What matters is who has the best software to manage all those different hypervisors. But hey, that’s a whole other story I’d love to go into sometime, but not now :).

  • Rich says:

    Matthijs,

    Great post! My thoughts on patching and change control, along with other opinions about how this bug happened, are at http://vmetc.com/2008/08/13/my-thoughts-on-the-reactions-to-the-esx-35-update-2-bug/

  • Rob Mokkink says:

    Every admin who installs a completely new release with so many new features straight into production should be fired on the spot.

    As for the competition, Hyper-V doesn’t even come close to the performance and features of VMware VI3. And I am curious to see the first patch round for Hyper-V, because you can’t VMotion VMs there and you are stuck with downtime.

    • Sven Huisman says:

      Good points Rob.
      As for VMotioning VMs off the host: that is not possible with the VMware bug either (unless you change the time on the ESX host).
      The good news is: VMware has a fix for this as well:

      “We are aware that you may be encountering the following challenges installing the express patches needed to correct the problem.
      • Internal change control procedures
      • No available server to VMotion running VM’s onto
      • Unable to schedule a maintenance window
      If you have one of the challenges listed above, please contact your support provider and indicate you need assistance with the U2 Alternative Install Process (U2 AIP). An alternative installation procedure is being finalized and will be provided on an as-needed basis.”

      • Rob Mokkink says:

        I installed U2 in the lab. Just disable HA and set the time, and VMotion will still work 😉