VMware: Can’t run 64 bit guests? Read this!
Yesterday one of my former co-workers sent me an IM asking for help with VMware on a PowerEdge 1950. He had Googled for the problem and found my blog. I suppose I may be an "expert" at VMware on the PowerEdge 1950.
Anyway, he found that when he powered on a 64 bit guest, VMware would error out complaining that he couldn't run 64 bit guests on his host. Since the guest OS "sees" the same processor as the host machine, this shouldn't happen. Fortunately, it's a simple fix.
Go into the BIOS in the server and go to CPU Features. Make sure that VT (Virtualization Technology) is enabled. It comes disabled by default on the PowerEdge 1950. Reboot the box and your 64 bit guests should boot up happily.
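Before rebooting into the BIOS, it can be worth confirming from Linux that the CPU advertises the relevant features at all. The following is a minimal sketch, assuming an Intel host (the flag is `vmx` on Intel; AMD uses `svm`) and a helper name, `cpu_has_flag`, that is made up for this example. Note that a `vmx` flag means the CPU supports VT, not necessarily that the BIOS has it switched on, so the BIOS check above still applies.

```shell
#!/bin/sh
# Report whether a cpuinfo file lists a given CPU flag.
# Usage: cpu_has_flag FLAG [FILE]   (FILE defaults to /proc/cpuinfo)
# cpu_has_flag is a hypothetical helper name, not part of any tool.
cpu_has_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    # -w matches the flag as a whole word, so "lm" does not match "lahf_lm"
    grep -E '^flags[[:space:]]*:' "$file" 2>/dev/null | grep -qw -- "$flag"
}

# lm  = long mode (64-bit capable CPU)
# vmx = Intel VT
if [ -r /proc/cpuinfo ]; then
    for f in lm vmx; do
        if cpu_has_flag "$f"; then
            echo "$f: present"
        else
            echo "$f: missing"
        fi
    done
fi
```

If `lm` is present but 64 bit guests still refuse to start, the VT setting in the BIOS is the likely culprit.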
New Record
I set a new personal record today: 24 new VMWare virtual machines built in one day.
21 agents, 3 managers. There are 7 new Windows nodes to build as well. I also have to set up all of the clustering goodness on the nodes.
This puts us closer to filling QA's request of 54 new servers (Linux, Windows, UNIX).
It's days like these that I think that if I knew Perl better, I could write a wrapper using the VMWare API and make this whole process easier. It also makes me wonder why nobody has done anything like that yet.
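As a rough idea of what such a wrapper might look like, here is a sketch in shell rather than Perl, using the `vmware-cmd` tool that ships with VMware Server instead of the API. It only prints the commands it would run. The template directory, the `template.vmx` file name, and the naming scheme are all assumptions for illustration, and a real clone would also need the display name and UUID in each copied `.vmx` file updated.

```shell
#!/bin/sh
# Print (not run) the commands to clone a batch of VMs from a template.
# TEMPLATE_DIR, VM_ROOT, template.vmx and the name scheme are hypothetical.
TEMPLATE_DIR=${TEMPLATE_DIR:-/var/lib/vmware/template}
VM_ROOT=${VM_ROOT:-/var/lib/vmware}

plan_clones() {
    prefix=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        # Zero-padded names: agent01, agent02, ...
        name=$(printf '%s%02d' "$prefix" "$i")
        echo "cp -a $TEMPLATE_DIR $VM_ROOT/$name"
        echo "vmware-cmd -s register $VM_ROOT/$name/template.vmx"
        i=$((i + 1))
    done
}

# The batch from this post: 21 agents and 3 managers.
plan_clones agent 21
plan_clones manager 3
```

Printing the plan first makes it easy to eyeball the 24 names before anything touches disk; piping the output to `sh` would then execute it.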
CentOS 4.6 VMWare Server Kickstart
As promised, this is the kickstart that I'm using for my Dell PowerEdge 1950's. Setting up and configuring kickstart is beyond the scope of this article (but may be covered in a later one). You will probably want to tweak the partitioning setup to suit your own taste. We're ordering our servers with an 80GB boot/OS drive, and a 750GB drive just to hold VMWare Virtual Machines.
This kickstart will get you a minimal CentOS 4.6 install, with some useful tools, and all of the pre-reqs met for VMWare Server 1.0.4 as well as Dell OpenManage. I am sure it could be whittled down further, but disk is cheap and it's served me well so far.
install
network --device=eth0 --bootproto=dhcp
url --url http://fqdn.of.server.com/osprov/media/Linux/CentOS46-AMD64/
reboot
text
lang en_US.UTF-8
langsupport --default en_US.UTF-8 en_US.UTF-8
keyboard us
mouse none
skipx
rootpw --iscrypted <crypted password>
firewall --disabled
selinux --disabled
authconfig --enableshadow --enablemd5
timezone America/New_York
bootloader --location=mbr

# Partitioning
# This sets the 80GB SATA boot drive to hold /boot, rootfs, and swap
# and sets the 750gb SATA drive to hold VMWare VM's at /var/lib/vmware
clearpart --all --initlabel
part /boot --size=128 --ondisk=sda
part / --size=1024 --grow --fstype=ext3 --ondisk=sda
part swap --recommended --ondisk=sda
part /var/lib/vmware --size=1024 --grow --fstype=ext3 --ondisk=sdb

%packages --resolvedeps
kernel
e2fsprogs
ntp

#VMWare Server Deps
perl
xinetd
gcc
make
kernel-devel
xorg-x11-libs.i386
zlib-devel
zlib-devel.i386
compat-db
compat-db.i386
compat-glibc
compat-glibc.i386
compat-glibc-headers
compat-libstdc++-33
compat-libstdc++-33.i386
compat-libstdc++-296.i386

#Dell OpenManage Deps
audit-libs.i386
cracklib.i386
cracklib-dicts.i386
libxml2.i386
#glib2-2.4.7-1.i386
glib2.i386
#libselinux-1.19.1-7.4.i386
libselinux.i386
#ncurses-5.4-13.el4.i386
ncurses.i386
pam.i386

%post
rpm -i http://fqdn.of.server.com/osprov/media/VMWare/VMware-server-1.0.4-56528.i386.rpm
wget http://fqdn.of.server.com/osprov/media/VMWare/VMware-mui-1.0.4-56528.tar.gz -O /tmp/VMware-mui-1.0.4-56528.tar.gz

#Dell Yum Repository (OpenManage, etc.)
wget -q -O - http://linux.dell.com/repo/hardware/bootstrap.cgi | bash
wget -q -O - http://linux.dell.com/repo/software/bootstrap.cgi | bash

ntpdate pool.ntp.org
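Once a box has been built from a kickstart like the one above, it can be handy to double-check that every package in the %packages section actually landed. The following is a minimal sketch of one way to pull that list back out of the file; `ks_packages` is a made-up helper name, and it assumes %packages sits on its own line and that the next line starting with `%` ends the section.

```shell
#!/bin/sh
# Print the package names from the %packages section of a kickstart file,
# one per line, skipping blank lines and comment lines.
# ks_packages is a hypothetical helper name.
ks_packages() {
    awk '
        /^%packages/ { in_pkgs = 1; next }   # section starts here
        /^%/         { in_pkgs = 0 }         # any other % section ends it
        in_pkgs && NF && $1 !~ /^#/ { print $1 }
    ' "$1"
}
```

On the installed host, something like `for p in $(ks_packages ks.cfg); do rpm -q "$p" >/dev/null || echo "missing: $p"; done` would then list anything that did not make it, assuming `ks.cfg` is your copy of the kickstart and your rpm accepts `name.arch` queries.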
My Large VMWare Server Farm
It seems like many people come to this blog from Google searches about VMWare, CentOS, and OpenFiler. I figured it might be good to talk about my VMWare Server deployment at work, since it's something that I am fairly proud of.
I have fifteen Dell PowerEdge 1950 servers. They're 1U each, with dual quad-core Intel Xeon CPUs ranging from 1.8 to 2.2GHz. They each have 16GB of RAM. Ten of them have 143GB 15K 3.5" SAS drives, and five of them have 143GB 10K 2.5" SAS drives. The servers with the 10K drives have a backplane that takes four drives; the servers with the 15K drives have a backplane that takes only two. Each server has two onboard Broadcom NICs, a PCI-X Broadcom NIC, and a recently added dual-port Intel e1000 NIC. I'll get into that in a second.
Each VMWare server runs CentOS 4.4 64 bit ServerCD edition. For those of you who don't know, CentOS is a 100% Red Hat Enterprise Linux binary compatible distribution. It's built from Red Hat sources and, due to the nature of the GPL, is able to be released by the CentOS group for those of us who want Red Hat Linux but don't want or need to pay for Red Hat support. I would argue, given my experiences with Red Hat support, that the support offerings of CentOS are superior.
I am a firm believer in keeping things as simple as possible. I have seen many other Linux sysadmins go crazy with the software they deploy and the hacks they roll into production, only to be bogged down in a morass of "one offs" or to leave behind a legacy of poorly documented systems that really need their original owner to run right. I don't like that, which is why I tend to stay on the straight and narrow. I keep my partitioning simple. I (generally) keep the packages I install restricted to the ones available through official CentOS channels. Some may consider this heresy, but if there is an RPM available for something, I'd rather install that than build from source. All of this leads to systems that "just work" and that can hum along and do their jobs with a minimum amount of fuss. Could I squeeze some extra performance out if I did a custom compiled kernel? Sure. Do I want to be troubleshooting VMWare at 3AM because something in that kernel broke virtual networking? No way.
On all but a few of our VMWare servers, we run VMWare Server 1.0.3. New servers that have just made it into production are getting 1.0.4, with a general upgrade planned in the somewhat near future. Not because we're seeing problems, but if we have to take boxes down to add new hardware (the Intel e1000 NICs that I am getting to in a second) we might as well upgrade VMWare while we're at it.
We chose VMWare Server for the price. You absolutely cannot beat it for the price, which is free. We spoke with VMWare about getting VMWare ESX in, and even in its most basic form, it would have been prohibitively expensive. Here at GA we're concerned about getting the most value for our money. By going with VMWare Server we lose the ability to have multiple snapshots per VM, which would be nice but is not a deal breaker. We also lose the central management, but you can make up for that by buying VMWare VirtualCenter 1.4, which we did. I'm not too happy with it, but that could be because it just doesn't scale well to the level that we're using it, or because it could be set up better. Probably both.
Each VMWare server has three NICs: two onboard and one PCI-X. eth0 and eth1 are both bridged interfaces - eth0 handles all of the main traffic to each node, and also serves as the management interface to the VMWare server itself. eth1 handles Oracle private interconnect traffic for RAC, and cluster heartbeats for Windows SQL Server clusters. eth2, the PCI-X NIC, handles all of the storage traffic. Each VMWare server has a dedicated uplink on its own VLAN to a Dell PowerEdge 2900 that is acting as a big NFS server.
We ran into a problem with the PowerEdge 1950's on-board NICs. If you put them under any sort of load (which we were, with multiple VMs trying to copy media and provision databases on ASM) the bus that the NICs were sitting on would reset. That would drop all of the VMs off the network for a time, and the switches that the NICs were plugged into would show that the link had gone down and then back up. This is a bad thing. We're also not the first people to see it. After a fight with Dell (who were not really inclined to help us because of CentOS or VMWare Server) I got them to send us an Intel e1000 card. Installing this in the spare PCI-X slot made our network problems go away. So, we're in the midst of bringing down all of our VMWare servers, disabling the on-board NICs, and installing these Intel cards.
Another problem we're running into is that Dell PowerEdge 2900. We have ~70 VMs on it, and when they get under heavy load some of the VMs experience SCSI resets, which sometimes results in database creates failing, and support tickets in our queue. According to some of the folks on the Linux-Poweredge mailing list, the hardware RAID controller that is in the box - the PERC5/i - generally sucks under Linux, offering performance slower than software RAID. There are rumors of an updated driver from Dell that will make it run faster -- we'll have to see how that pans out. In the meantime, we're going to be ordering fifteen 750GB SATA drives, one for each server. That will increase our total available VM storage to 11TB or so, which is better than the 2TB we get from the 2900. That also means that we lose out on nifty features like "if the VMWare server goes down, we can bring these VMs back up on another machine."
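The 11TB figure is simple multiplication, reading the order as one 750GB drive in each of the fifteen servers (my reading of the numbers, which matches the total quoted above):

```shell
#!/bin/sh
# Local VM storage if each of the 15 hosts gets one 750GB drive.
servers=15
drive_gb=750
total_gb=$((servers * drive_gb))
echo "total: ${total_gb}GB"   # 11250GB, i.e. roughly 11TB
```

That is raw capacity before filesystem overhead, and it is spread across fifteen separate disks rather than pooled the way the NFS storage is.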
You may be curious how many VMs we can stuff on one of those 1950s. Well, with a mix of local and NFS storage, we've gotten up to 15 VMs running at once. These aren't weenie VMs either - they're either RHEL nodes with 512MB or 1GB (usually 1GB) of RAM and 15GB of disk, or Windows nodes with 512MB-1GB of RAM, 15GB of disk, and clusters running. They're either running Oracle or MS SQL, and while they're not handling millions of transactions, they're being used by my development and QA staff.
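For a back-of-envelope feel for where a number like 15 might come from, here is a sketch that treats host RAM as the ceiling. That is an assumption, as is the 1GB held back for the host OS; the helper name `vms_that_fit` is invented, and the sum ignores VMWare's own per-VM memory overhead.

```shell
#!/bin/sh
# How many guests of a given size fit in host RAM, leaving a reserve
# for the host OS. The default 1024MB reserve is an assumed figure.
# Usage: vms_that_fit HOST_MB GUEST_MB [RESERVE_MB]
vms_that_fit() {
    host_mb=$1
    guest_mb=$2
    host_reserve_mb=${3:-1024}
    echo $(( (host_mb - host_reserve_mb) / guest_mb ))
}

vms_that_fit 16384 1024    # 16GB host, 1GB guests
vms_that_fit 16384 512     # 16GB host, 512MB guests
```

With 1GB guests on a 16GB host this comes out to 15, which lines up with what we have seen, though disk and network load may well bite before RAM does.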
As you might expect, power and cooling requirements for this bunch of servers are high. They're all in one APC NetShelter VX rack, fed by three 15A 110V AC lines. Some other infrastructure servers are also on those circuits, but we're using up roughly 30A in that one rack alone. Cooling is hard -- we've blown past what the 5 ton AC unit in the room can handle, and the two portable A/C units don't do much to help. We're in the process of moving gear to a colo.
All said, this environment has helped GA really expand. If we had to make an investment in physical servers we would have spent in excess of $500k to purchase all of that gear. With less than $70k invested, we're able to accomplish nearly the same thing -- and more, once we work the bugs out. I've been a huge fan of virtualization since VMWare first came on the market, and in my case it's really been worth it to deploy.