Friday, March 13, 2009

VMware ESXi on SunFire x4140 x4240 x4440

A work posting again because everybody always asks what I do for a living.

I work in x64 engineering at Sun Microsystems. I ensure that vendor software from VMware works well on the various AMD and Intel x64 systems we ship. The process works something like this.
    * Product Team specifies hardware and software requirements.

    * Hardware Team designs the hardware.

    * Software Team (that's me) engages OS vendors (e.g. VMware, Redhat, Novell, Microsoft and Sun) and says, "In six months we'll release the SuperConstellationMegaPlus using the unreleased 64 core Unobtanium HyperQuickConnect CPU with support for 256 sockets and maximum 32TB RAM and we already know your OS breaks with that high CPU count and memory size; InterHub MCP100 bridge on each socket so don't forget to fix your multiroot PCI Express support; SAS 2 with zoning and up to 1024 discrete solid state disks; and the usual peripheral support and high speed Infiniband, 10Gbps networking, etc. with hot plug required on everything, including the processors and RAM."

    * 3 months later, OS vendors toss their pre-release builds to me.

    * I toss the builds to our Software QA.

    * Software QA finds bugs. It's my job to work with the OS vendor and find the cause of those bugs and ask our OS vendors to fix them.
Here's an example:

Early on, we discovered that VMware's ESXi 'thin' hypervisor would not install on the SunFire x4140, x4240 and x4440 servers. These machines, codenamed "Dorado Tucana" (DTa for short), are essentially identical and share the same motherboard.

Previously, VMware always booted a modified Redhat distribution to install ESX. The ESXi install process differs from ESX "Classic" in that it uses itself as the installer. When you boot the ESXi install CD, you are booting ESXi.

I initially thought there was a bug in the HBA storage adapter because the install program always locked up at "Loading aacraid...", which is the software to control the Adaptec storage controller we use in our test machines. Debug by process of elimination: I removed the Adaptec controller.

So now the machine hangs somewhere else. Hmmm....

But now I'm able to see messages like "Keyboard controller buffer overflow...." And our nifty hardware debug tool shows me that the program is stuck in a very small loop that looks something like this:
    while (inb(0x64) & 0x01) { somefunction(); }

I/O port 0x64 is the old legacy 8042 keyboard controller, except DTa does not have an 8042 or even a SuperIO chip! When I was reviewing the DTa hardware design way back in 2007, I even made a notation to our product team that this was our first platform without a legacy keyboard controller of any kind and we may encounter some OS bugs.

All modern PCs emulate the old 8042 keyboard controller first used in the IBM PC AT in 1984, because MS-DOS, the BIOS setup program, and the various option ROM setup programs all depend on the existence of a PC/AT keyboard even though your PC no longer even has a keyboard connector. The system BIOS can find your USB keyboard and make it pretend that it's an old legacy PS/2 keyboard for this old legacy software.

When your modern OS (such as Windows, Linux or ESXi) boots, it pokes the BIOS and USB controller and tells them to stop pretending to be an 8042 and start acting like a real USB controller -- this is called USB BIOS handoff. Almost every PC made, however, still has something that acts like the 8042 at I/O locations 0x64 and 0x60 somewhere on the motherboard -- when the pretending stops, the real 8042 I/O is still there. When the OS reads the keyboard status register at 0x64, though, the 8042 isn't connected to a keyboard, so it always reports the keyboard buffer is empty with a value of "0".

As I mentioned previously, DTa does not have an 8042 of any kind. As soon as the OS takes over the USB operations and tells BIOS and the USB controller to stop pretending to be an 8042, there's no longer anything at I/O locations 0x64 and 0x60. And when the CPU reads an invalid I/O location, the returned value is always "-1." This means every bit of what the keyboard driver thinks is a status register is set. The keyboard driver thinks the keyboard buffer is full, reads the keyboard data register at 0x60 (which also returns -1 or 0xff), and tests the keyboard status again, which will be 0xff again. Rinse and repeat until done, except, of course, it never is done because inb(0x64) always returns -1.
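
The hang described above can be reproduced in user space. What follows is a minimal sketch in C, not anything taken from ESXi: `sim_inb` is a hypothetical stand-in for the x86 IN instruction on a machine where nothing answers at 0x60 or 0x64, and the `max_polls` cap exists only so the demonstration terminates.

```c
#include <assert.h>
#include <stdint.h>

#define KBD_DATA_PORT   0x60  /* 8042 data register */
#define KBD_STATUS_PORT 0x64  /* 8042 status register */
#define KBD_STAT_OBF    0x01  /* status bit 0: output buffer full */

/* Hypothetical stand-in for the x86 IN instruction. In this model no
 * device decodes any port, so every read returns all ones (0xff). */
static uint8_t sim_inb(uint16_t port)
{
    (void)port;
    return 0xff;
}

/* Drain loop in the style of the one that hung, but capped at max_polls
 * so the demonstration ends. Returns how many data bytes were "read". */
static unsigned drain_keyboard(unsigned max_polls)
{
    unsigned reads = 0;

    assert(max_polls > 0);
    while ((sim_inb(KBD_STATUS_PORT) & KBD_STAT_OBF) && reads < max_polls) {
        (void)sim_inb(KBD_DATA_PORT);  /* also 0xff, and clears nothing */
        reads++;
    }
    return reads;
}
```

Calling `drain_keyboard(1000)` returns 1000: the cap ends the loop, not the hardware. Take the cap away and you have the small loop the debug tool showed.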

I proved this by dissecting the guts of ESXi and removing the OHCI and UHCI USB drivers. Those drivers are what force the handoff, so without them the BIOS and USB controller stay in legacy keyboard mode. With those software bits removed, the problem went away. I reported this to VMware so they could make the necessary changes.

There are a couple of fixes to this problem. Linux counts the number of "-1" values it reads, and if that number is unreasonable, it concludes there's no 8042. The engineers at VMware got a little more clever with their fix: they look at the ACPI DSDT -- the Differentiated System Description Table. This is a data structure in BIOS that lists the component hardware. If an 8042 is not listed in this table, ESXi knows not to load the keyboard controller device driver.
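
Both fixes can be sketched in a few lines of C. This is an illustration under stated assumptions, not the Linux or ESXi source: the 16-read threshold, the function names and the simulated port readers are all hypothetical, and a real DSDT check would walk the ACPI namespace rather than a flat list of hardware IDs. `PNP0303` is the conventional PNP ID for a PC/AT-style keyboard controller; whether ESXi matches on that exact ID is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KBD_MAX_DRAIN 16  /* assumed threshold for "unreasonable" */

typedef uint8_t (*inb_fn)(uint16_t port);

/* Approach 1 (counting): drain the buffer, but give up after
 * KBD_MAX_DRAIN reads. A controller that never runs dry is treated
 * as absent. */
static bool kbd_controller_present(inb_fn inb)
{
    unsigned reads = 0;

    assert(inb != NULL);
    while ((inb(0x64) & 0x01) && reads < KBD_MAX_DRAIN) {
        (void)inb(0x60);
        reads++;
    }
    return reads < KBD_MAX_DRAIN;
}

/* Approach 2 (ACPI): only load the keyboard driver if the firmware's
 * device list includes a keyboard controller. The hids array stands in
 * for the hardware IDs found in the DSDT. */
static bool acpi_lists_keyboard(const char *const *hids, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(hids[i], "PNP0303") == 0)
            return true;
    }
    return false;
}

/* Simulated machines: one with nothing at the ports (reads float to
 * 0xff), one with an idle 8042 that reports an empty buffer. */
static uint8_t floating_bus_inb(uint16_t port) { (void)port; return 0xff; }
static uint8_t idle_8042_inb(uint16_t port)    { (void)port; return 0x00; }
```

With these definitions, `kbd_controller_present(floating_bus_inb)` is false and `kbd_controller_present(idle_8042_inb)` is true. The ACPI check has the advantage of never touching the I/O ports at all, but it trusts the BIOS tables to be accurate.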

For those waiting to install ESXi on the x4140, x4240, and x4440 (and many people have asked): This fixed version of ESXi is not yet released, though it should be available Real Soon Now, and we're already certifying that new version of ESXi for those servers.

15 comments:

  1. Uh, all I hear is "white noise". Sounds pretty involved, though, even though I didn't understand a lick of that!

  2. All I'm hearing is, "I'm Richard, I work for Sun Microsystems but for some reason I still insist on forwarding my domain to blogger rather than getting a virtual host and installing Wordpress."

  3. There's no forwarding going on at all, Kit.

  4. When is the new ESXi coming out?
    Also, the latest ESX 3.5 has a very difficult time booting on the X4140. It only installs successfully in text mode. I have 3 Sun X4140s that need to be installed and shipped by March 20th, so can you send me your patch or workaround so I can have these servers installed?

  5. Emerich:

    (1) I'm not allowed to say when VMware's next updates will be released.

    (2) ESXi installable is not (yet) certified for x4140, which means neither VMware nor Sun supports this configuration right now.

    (3) ESX 3.5 "Classic" is certified and supported. Have you contacted Sun or VMware support? What kind of problems do you see? What option cards, memory, CPUs and storage?

  6. Praises to you for debugging! We were waiting our asses off for ESXi to run on the X4240.

  7. Yokota, when is Sun going to compile HERD for the ESX service console? I have two X4440, my colleague has two, and yet another has two.

    Also, is there a fix for:
    a) IPMI checksum errors,
    b) IPMI claiming that the power supplies are asserting/deasserting "ok" all the time, or
    c) cimserver (pegasus) soaking up CPU0?

    You might also let the tech writers know that slots 2,4,5 work just fine and don't disable the onboard NICs 2&3, if ACPI is on. And ACPI is on by default.

    Regards,
    Nathan dot hudson hyphen crim at milliman dot com

  8. We were supplied an X4140 from a "solutions provider" as an ESXi platform. After first trying to install ESXi U3 unsuccessfully, I came across this post. As it seemed the issue would soon be fixed, we held off on returning the machine. ESXi U4 is now out but there is still no X4x40 support. I'm curious to know what happened. Obviously our X4140 is on its way back to the supplier, a shame as it is a nice machine.

  9. Any word on X4140 certification for ESX 4? I was about to purchase some 4140's for VMware, and then ESX 4 came out...and is not on the HCL. :(

  10. Casey, certification for ESX 4 on x4140, x4240 and x4440 is in process. I did initial ESX & ESXi 4 bringup and don't expect problems (including on the new Istanbul processor), but I'm still supposed to *ahem* manage expectations and not pre-announce any release dates. Sorry. You can install ESX and ESXi 4.0 on the x4x40 family and it will work, especially if you stick to vanilla configurations, but you won't get any support on it from either Sun or VMware.

    Several Sun x64 systems were on the HCL at vSphere launch, but we didn't have the manpower to get everything certified. Thanks for your interest.

  11. THANK YOU!!!!

    I've tried to install VMware ESXi on our x4140 and now I know why it fails. Sun support told me that the problem was that quad proc was not certified. Now I know what the problem was and I feel better.

    Gabriele.

  12. Since people are still asking: ESX & ESXi 3.5 Update 4 are certified on the x4140, 4240, and 4440 for all released CPU configurations except Istanbul. Istanbul support will come with the next software release that's due Real Soon Now.

  13. GREAT!!! Somebody is actually saying the WHY behind an error, what will be done, and what has been done! VERY instructional!!!!

  14. I have successfully installed ESXi 4 on the Sun x4240 without any errors and it works perfectly fine. The issue may have been an old one and may have been resolved by now.

  15. @Anon 7:51 - Yeah, we fixed it a while back.
