VMWare Server + Win2k3 64 bit + Linux NFS = Not Fun
I haven't written a tech blog post in a while, but I've been working on an interesting, albeit frustrating, problem over the last few days.
At work I have 12 Dell PowerEdge 1950 servers, each with dual quad-core Xeons (ranging from 1.8GHz to 2.3GHz), 16GB of RAM, and 138GB SAS drives. They're running VMWare Server 1.0.3 on CentOS 4.4, with all of the latest OS-level updates installed.
We're virtualizing 120+ Red Hat Enterprise Linux 4 (U2 and U4), Windows 2000, and Windows 2003 Server nodes, both 32- and 64-bit. These nodes run my company's software, Oracle, and MS SQL Server.
The bulk of those VMs live on a Dell PowerEdge 2900 server with 8 x 500GB SATA drives and a Dell PERC 5/i RAID controller in a RAID 5 config. The CPU is a quad-core 1.8GHz Xeon, and it has 2GB of RAM. The server runs CentOS 5 and shares its disks over NFS v3. There's a 2Gb/s bonded Ethernet connection using the onboard Broadcom NICs and a Dell PowerConnect 5324 switch.
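For a rough sense of what this array can deliver, here is a back-of-envelope Python sketch. The capacity rule (one drive's worth of space goes to parity) is standard RAID 5. The figure of about 75 random IOPS per 7200 RPM SATA spindle and the 4-I/O write penalty are assumed textbook rules of thumb, not measurements from this box, and the model ignores the PERC's cache entirely:

```python
def raid5_usable_gb(drives: int, drive_gb: float) -> float:
    """Usable capacity of a RAID 5 set: one drive's worth goes to parity."""
    if drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (drives - 1) * drive_gb


def raid5_random_write_iops(drives: int, iops_per_drive: float, penalty: int = 4) -> float:
    """Rough small-random-write ceiling for RAID 5 with no write cache.

    Each logical write costs ~4 back-end I/Os (read data, read parity,
    write data, write parity), so raw spindle IOPS are divided by the penalty.
    """
    return drives * iops_per_drive / penalty


# 8 x 500GB SATA drives, assuming ~75 random IOPS per 7200 RPM spindle
print(raid5_usable_gb(8, 500))         # 3500
print(raid5_random_write_iops(8, 75))  # 150.0
```

Under those assumptions, the whole array tops out at roughly 150 small random writes per second, shared among every VM stored on it. That is one reason the write-caching question further down matters.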
We were seeing that Windows 2003 64-bit nodes, under moderate to heavy load, would experience massive packet loss. Additionally, the VMWare Server Client would not reliably redraw the servers' screens. Finally, the node would bluescreen with a KERNEL_DATA_INPAGE_ERROR. This happened when our software was copying SQL Server media to the node in preparation for provisioning a database. It only happened with 64-bit Windows: 32- and 64-bit Linux were fine, and so was 32-bit Windows.
The Windows Event Log would be littered with warnings and errors about "The device, \Device\Scsi\symmpi1, is not ready for access yet." It didn't take a rocket scientist to figure out that something was happening to make these machines try to access swap, fail, and bluescreen.
Now, I had been told by users that this was happening on nodes that were on local disk as well as our remote NFS server. I did extensive testing and was not able to reproduce the problem when the nodes were on local disk. It turns out that I was given erroneous information, and that nodes that people thought were local were in fact on NFS. Once I moved my test nodes over to NFS, I could reproduce the problem.
VMWare has a KB article that addresses this issue. In fact, it seems to be fairly common among people who run their VMs on an iSCSI SAN. Once I applied the registry change from that article, my VMs stopped bluescreening, but our file copy operation would still fail.
Looking at the VMWare Server, you would see load averages of ~20-30 and iowait around 25%. Looking at the NFS box, you could see that I/O to /dev/sda2 was eating up about 100% of CPU.
I changed our NFS mount options. No dice. I turned on Jumbo Frames on the bridged NIC on my test VMWare server. No dice. Each step made things a "little" better, but none of them solved the problem.
Then I moved the VM images over to our Netapp, which was no small feat since most of its space is used. I finally freed up about 120GB, enough for my 5 test VMs and their snapshots, and started testing. I fired the VMs back up and ran through another provisioning event.
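The iowait figure is just a ratio of CPU-time counters. As a minimal Python sketch (assuming a Linux /proc/stat whose aggregate "cpu" line lists user, nice, system, idle, iowait, irq, softirq, steal in that order; older kernels report fewer fields), this is how you could compute it yourself from two samples, the same kind of number top and sar report:

```python
def read_cpu_times(path: str = "/proc/stat") -> list[int]:
    """Return the aggregate 'cpu' counters (in jiffies) from /proc/stat."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return [int(x) for x in line.split()[1:]]
    raise RuntimeError("no aggregate cpu line in " + path)


def iowait_percent(before: list[int], after: list[int]) -> float:
    """Share of CPU time spent in iowait between two samples.

    Field order: user, nice, system, idle, iowait, irq, softirq, steal, ...
    Only the first 8 fields are summed, because the later guest fields are
    already counted inside user/nice on kernels that report them.
    """
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0
```

On a live box, call read_cpu_times() twice with a few seconds' sleep in between and pass both lists to iowait_percent(). The counters are cumulative since boot, which is why two samples are needed.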
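When changing mount options, it's worth confirming what the kernel actually ended up using, since the server can negotiate values such as rsize/wsize down from what /etc/fstab asks for. A minimal Python sketch that parses the /proc/mounts format (device, mount point, type, options, dump, pass) and pulls out the options for NFS mounts. The sample line, the host name nas, and the export /export/vms are made up for illustration, and the exact option spelling in /proc/mounts varies by kernel version:

```python
def nfs_mount_options(mounts_text: str) -> dict[str, dict[str, str]]:
    """Map each NFS mount point to its effective options.

    mounts_text is the contents of /proc/mounts: one mount per line with
    fields device, mount point, fs type, options, dump, pass.
    Flag-style options (e.g. "hard") map to an empty string.
    """
    result: dict[str, dict[str, str]] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue  # skip local filesystems and malformed lines
        opts = {}
        for opt in fields[3].split(","):
            key, _, value = opt.partition("=")
            opts[key] = value
        result[fields[1]] = opts
    return result


# Hypothetical /proc/mounts line for a VM datastore mounted at /vmware
sample = "nas:/export/vms /vmware nfs rw,vers=3,rsize=32768,wsize=32768,hard,proto=tcp 0 0\n"
print(nfs_mount_options(sample)["/vmware"]["wsize"])  # 32768
```

On a real host, pass it the output of open("/proc/mounts").read() instead of the sample string.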
Not only did my packet loss issues seem to go away, but for once I was able to run a Windows 2003 64 bit node on NFS and provision MS SQL instances without bluescreening.
Our Netapp isn't the newest model. It's a FAS 270 with 1.2TB of space. It's connected to another Dell switch in another rack, with a 1Gb/s uplink to my core switches. The Netapp doesn't even have Jumbo Frames enabled. Somehow, though, it's kicking the crap out of my Dell NFS box, despite being seemingly "inferior."
My questions at the moment are:
- Is my config on this NFS box fundamentally broken somehow?
- Is Linux's NFS server really bad? Would I be better off with BSD or Solaris?
- Is something up with the driver for the PERC 5/i? Is write caching enabled?
- Is there something up with the LSI driver in Win64 that does not show up in Win32 or in Linux?
- If I have to rebuild this NFS box, where do I put 1TB worth of VMWare images while I rebuild the box?
Solaris x86 on a Dell PowerEdge 2900
We got a Dell PowerEdge 2900 in with the intention of making it a big ass file server. The basic specs on it are:
- Quad-core Xeon 1.6GHz (Dell was running a special: a free upgrade from the dual core to the quad core)
- 2GB RAM: PC2-5300, 4 x 512MB
- 8 x 500GB 7200 RPM SATA drives
- Dell PERC 5/i SATA RAID
- 5U Rack chassis
My intent was to install Solaris x86 on the box, set up a ZFS partition, install NFS and Samba, and make a nice file server to hold VMWare images and serve as a file dump for the developers. Unfortunately, I found out that none of Dell, Sun, or LSI offers Solaris x86 drivers for the RAID controller.
So I loaded CentOS 5 (CentOS 4.4 won't boot on it for some reason; it hangs before GRUB runs) and installed VMWare Server. I'm going to install Solaris x86 in a VM and give it access to a raw partition to hold its data, which should keep things speedy. I did read, however, that Solaris x86 will make VMWare Server 1.0.3 core dump if it tries to access a raw partition. Hopefully that won't be the case for me.
I also need to get Dell OpenManage installed on all of these servers so I can monitor their health and get alerts if they lose a drive in their RAID arrays.
I also need to get the storage network up and running. For now it's going to be on its own VLAN. If I have my druthers, though, it will be on a physically separate switch. All of the new VMWare servers I bought have a third TOE NIC that I was going to use just for accessing the NFS server that will host the VMs' images. The PE2900 will probably end up with two of its interfaces bonded to get 2Gb/s of access to the LAN.
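One caveat on the 2Gb/s figure, sketched in Python under stated assumptions. The bonding mode used here isn't stated. With the hash-based Linux bonding modes (balance-xor, or 802.3ad with the default layer2 transmit hash policy), the outgoing NIC is chosen from the MAC addresses, so all traffic from the NFS box to any one VMWare server leaves through a single 1Gb/s link. balance-rr stripes packets across NICs and behaves differently. The function below is a simplified model of the layer2 hash, not the kernel's exact formula, and the MAC addresses are hypothetical:

```python
def layer2_slave(src_mac: str, dst_mac: str, slave_count: int) -> int:
    """Simplified model of Linux bonding's layer2 transmit hash.

    Assumption: XOR the last octet of the source and destination MACs and
    take the result modulo the number of bonded NICs. Real kernels mix in
    more fields, but the property that matters is the same: the choice
    depends only on the MAC pair, so one peer always lands on one NIC.
    """
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return (src_last ^ dst_last) % slave_count


nfs_server = "00:1e:4f:00:00:10"  # hypothetical MAC addresses
clients = ["00:1e:4f:00:00:21", "00:1e:4f:00:00:22", "00:1e:4f:00:00:23"]
for mac in clients:
    print(mac, "-> slave", layer2_slave(nfs_server, mac, 2))
```

The bond can still carry about 2Gb/s in aggregate when several servers are busy at once. It just can't give any single server more than one link's worth.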
It's never ending. At least I got to leave before 7 tonight. Still didn't get home until 9:15 or so.
VirtualCenter 1.4: VMWare’s Redheaded Stepchild?
We're pretty heavy VMWare consumers at work. Each developer has a local copy of VMWare on their desktop, and we have a decent number of VMWare Servers deployed as well. It was only natural for us to want some form of central management system to handle administration and performance monitoring of our virtual infrastructure. We recently bought five Dell PowerEdge 1950s with dual quad-core Xeons, 15GB RAM, and a 130GB 15K SAS drive. One of those boxes is loaded with CentOS 5 64-bit; the other four are loaded with CentOS 4.4 64-bit. The CentOS 5 box will most likely be reprovisioned with CentOS 4.4 for uniformity.
VirtualCenter 1.4 is the old version of VC. It was used to manage ESX 2.x and GSX servers. VMWare has since released VirtualCenter 2.0, which manages only ESX, and has relegated the old 1.4 codebase to managing legacy products along with the currently shipping VMWare Server. It's a decent enough product, but it has some real quirks.
I have to make 20+ identical virtual machines. VirtualCenter has a templating option that lets me create a template and clone it whenever I want to deploy a new VM. "Perfect," I think to myself. Our standard Linux agent build requires some customization:
- CPU: 1 virtual CPU
- Memory: 1024 MB
- Disk: 6GB SCSI. Virtual disk should be preallocated, and chopped up into 2GB files.
- Ethernet: Two ethernet cards, each bridged to a different physical NIC in the box - this is needed for Oracle RAC
- No Floppy
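The disk line of that spec maps onto the disk types of vmware-vdiskmanager, the command-line disk tool that ships with VMWare Server: type 0 is growable in one file, 1 is growable split into 2GB files, 2 is preallocated in one file, and 3 is preallocated split into 2GB files. As a sketch, this Python helper builds the command for the disk described above. The file name agent-template.vmdk and the lsilogic adapter are assumptions, and it's worth checking the tool's built-in help for the exact size syntax on your version:

```python
import shlex

# Disk types accepted by vmware-vdiskmanager's -t switch,
# keyed by (preallocated, split into 2GB files).
DISK_TYPES = {
    (False, False): 0,  # growable, single file
    (False, True): 1,   # growable, split into 2GB files
    (True, False): 2,   # preallocated, single file
    (True, True): 3,    # preallocated, split into 2GB files
}


def vdisk_create_cmd(path: str, size_gb: int, preallocated: bool,
                     split_2gb: bool, adapter: str = "lsilogic") -> list[str]:
    """Build a vmware-vdiskmanager command line that creates a virtual disk."""
    disk_type = DISK_TYPES[(preallocated, split_2gb)]
    return ["vmware-vdiskmanager", "-c", "-s", f"{size_gb}GB",
            "-a", adapter, "-t", str(disk_type), path]


cmd = vdisk_create_cmd("agent-template.vmdk", 6, preallocated=True, split_2gb=True)
print(shlex.join(cmd))
# vmware-vdiskmanager -c -s 6GB -a lsilogic -t 3 agent-template.vmdk
```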
So I go ahead and make a template. It gets saved on the server that's running VirtualCenter. Oddly enough, the template only takes up 1MB of space. I figure that since I haven't installed an OS on the template yet, the software is smart enough not to keep 6GB of empty virtual disk files around, and that when you deploy it on a VMWare server it will just create the 6GB file then.
Not the case.
When you deploy the VM from the template, you're asked whether you want to change the memory size and which NICs you want to bridge, and the floppy returns. The kicker: the virtual disk is created, but not preallocated. When loaded with the OS build we use for the agent, the VM only ends up taking about 1.2GB. That's pretty bad when you go back to the status screen for the physical VMWare Server and see that you still have 40GB free on its disk, because your VMs can keep growing unchecked until they all hit the boundary that was set for them (in this case 6GB) or the physical server runs out of disk space.
I'm also not a huge fan of the way the software presents data. It takes too many clicks to see which VMs are hosted where. Errors are reported by icons next to the VM's name and buried in logs, instead of appearing right on the VM's status page.
It's also pretty braindead about copying VMs around. VC will do cold migrations of VMs between managed servers by copying the files between the two servers. What happens if they both share the same storage on NFS? The system should be smart enough to see that and not copy the files, right? Nope. The files get transferred from the NFS share, through the old VMWare Server, to the new one, and back up to the NFS share. VC also appends an underscore to the VM's name, since the file name is the same. It can take a really long time to copy VMs around.
It seems to me that they just hacked VMWare Server support into VC 1.4, and that they would have been far better off adding it to VirtualCenter 2.0, along with some logic to make templating and cold migrations of VMs work better.
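The trouble with growable disks is that the host's free-space figure doesn't account for growth the VMs are still entitled to. A small Python sketch of the worst-case arithmetic: the 40GB free, the 6GB maximum, and the 1.2GB current usage come from above, while the count of 10 VMs on one host and the VirtualDisk type are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class VirtualDisk:
    name: str
    max_gb: float   # size the guest sees (the boundary set at creation)
    used_gb: float  # space the growable file takes on the host today


def worst_case_free_gb(host_free_gb: float, disks: list[VirtualDisk]) -> float:
    """Host free space left if every growable disk filled to its maximum.

    A negative result means the host disk is overcommitted: between them,
    the VMs can grow past what the physical disk can hold.
    """
    pending_growth = sum(d.max_gb - d.used_gb for d in disks)
    return host_free_gb - pending_growth


# Hypothetical host: 10 agent VMs cloned from the template
disks = [VirtualDisk(f"agent{i:02d}", max_gb=6, used_gb=1.2) for i in range(1, 11)]
print(round(worst_case_free_gb(40, disks), 1))  # -8.0
```

With those numbers the host comes up about 8GB short if every disk fills, even though the status screen reports 40GB free.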
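The check VC is missing isn't complicated in principle. Here's a hypothetical Python sketch, not anything VC actually exposes: translate each host's path for the VM into the NFS source it lives on (server:/export plus the remaining path), and only copy when the two differ. The mount tables, host names, and paths are made up, and a real version would also have to treat a host name and its IP address as the same server:

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_nfs_path(path: str, mounts: dict[str, str]) -> Optional[str]:
    """Translate a local path to 'server:/export/rest' using a host's NFS mounts.

    mounts maps local mount points to their NFS source, e.g.
    {"/vmware": "nas:/export/vms"}. Returns None if the path is not on NFS.
    """
    p = PurePosixPath(path)
    # Prefer the longest (most specific) mount point that contains the path.
    for mount_point in sorted(mounts, key=len, reverse=True):
        mp = PurePosixPath(mount_point)
        if p == mp or mp in p.parents:
            rest = p.relative_to(mp)
            source = mounts[mount_point].rstrip("/")
            return source if str(rest) == "." else f"{source}/{rest}"
    return None


def needs_copy(src_path: str, src_mounts: dict[str, str],
               dst_path: str, dst_mounts: dict[str, str]) -> bool:
    """A cold migration only has to copy data if the two paths differ on storage."""
    src = resolve_nfs_path(src_path, src_mounts)
    dst = resolve_nfs_path(dst_path, dst_mounts)
    return src is None or src != dst


host_a = {"/vmware": "nas:/export/vms"}
host_b = {"/var/lib/vmware/vms": "nas:/export/vms"}
print(needs_copy("/vmware/agent01/agent01.vmx", host_a,
                 "/var/lib/vmware/vms/agent01/agent01.vmx", host_b))  # False
```

When needs_copy() returns False, the migration could simply register the existing .vmx on the new server, with no copy and no renamed VM.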
So, I battled VMWare today. As a reward, I went body surfing after work. I often think how cool it is for all of my friends that live and work in Manhattan. Then I remember that I live 5 minutes from a clean beach that's safe to swim at. It's not so bad here on the shore.