SQL VM CPU Spikes

Some people still aren't convinced by virtualisation, and while it's true that there are some situations it's not especially suited to, in my experience they are relatively few. One of the sceptics I know is a SQL DBA, and there are times when she has a point. I thought that this might be one of them until I started poking around.

[Screenshot: vCenter CPU usage graph for the SQL VM]

Initially I was asked what was causing a SQL VM to respond slowly and use 100% CPU. I had a look in vCenter, and while the VM looked slightly busy it didn't seem overworked. As the vCenter graph shows, it was using only about half of the 3 GHz of CPU it had access to. I should explain at this point that my client's practice is to provision VMs with a single vCPU and add more only if they are required. If this was the VM's normal load, one vCPU should have been enough.

Looking back through the VM's performance history, I couldn't see anything particularly wrong either. There were occasional CPU spikes in the past, possibly indicating reboots or overnight processing. Oddly, though, there were also seemingly random plateaus of activity lasting several hours at a time. Overnight the VM mostly idled along using practically no CPU, but during the day there were long periods that looked a lot like the activity above. Time to look at the guest OS.

The picture inside Windows is slightly different. Task Manager shows frequent bursts of 100% CPU usage (see below); in fact you could call them regular. More worryingly, it turns out that the server is not yet in production; it's still being configured.

[Screenshot: Task Manager showing regular bursts of 100% CPU usage]

The offending process is services.exe, so it's not immediately obvious what the issue is. Purely by coincidence, I asked the DBA if she could log off for a while so that I could look into what was going on. When she did, the strangest thing happened:

[Screenshot: Task Manager after the DBA logged off, with CPU usage back at idle]

Notice how the CPU usage dropped back to idle and stayed there. That raised the question: “What were you running?”

It turns out that the culprit was none other than SQL Server Management Studio. While it is open and connected, it polls the server's status every 10 seconds. Strangely, though, instead of polling just the SQL services it polls every service on the server (you can see this with Process Monitor), which seems excessive to me. Service status queries are answered by the Service Control Manager, which runs as services.exe, and that is why services.exe was the process eating CPU. Because of the way hypervisors share CPU resources, what would be a small blip on a physical host is magnified somewhat inside the VM. Microsoft have acknowledged that this happens but, to my knowledge, haven't done much about it. There is a registry key that can be modified to adjust Management Studio's behaviour. For SQL Server 2005 SP1 onwards (it isn't available before that), it is:

HKLM\Software\Microsoft\Microsoft SQL Server\90\Tools\Shell\PollingInterval

Setting it to 600 will reduce the polling frequency to once a minute. Alternatively, just don't leave SQL Server Management Studio open any longer than you have to, and wait for Microsoft to fix it.
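If you'd rather script the change than make it by hand in regedit, here's a minimal sketch in Python that builds a .reg file for it. It assumes, without confirmation from Microsoft's documentation, that PollingInterval is a DWORD value stored under the ...\90\Tools\Shell key in the path above; the helper names make_reg_file and write_reg_file are hypothetical.

```python
# Minimal sketch: build a .reg file that sets the Management Studio
# polling interval discussed above, instead of editing it by hand.
# ASSUMPTIONS (not confirmed by Microsoft documentation): "PollingInterval"
# is a REG_DWORD value under the ...\90\Tools\Shell key named in the article,
# and make_reg_file / write_reg_file are hypothetical helper names.

KEY_PATH = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\90\Tools\Shell"
VALUE_NAME = "PollingInterval"


def make_reg_file(interval: int) -> str:
    """Return .reg file text that sets VALUE_NAME to `interval` as a DWORD."""
    if not 0 <= interval <= 0xFFFFFFFF:
        raise ValueError("a REG_DWORD value must fit in 32 unsigned bits")
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{KEY_PATH}]\r\n"
        # .reg files write DWORDs as eight lowercase hex digits: 600 -> 00000258
        f'"{VALUE_NAME}"=dword:{interval:08x}\r\n'
    )


def write_reg_file(path: str, interval: int) -> None:
    """Save the .reg text as UTF-16 with a BOM, as regedit's own exports are.

    newline="" stops Python from translating the explicit CRLFs a second time.
    """
    with open(path, "w", encoding="utf-16", newline="") as f:
        f.write(make_reg_file(interval))
```

For example, `write_reg_file("polling_interval.reg", 600)` produces a file that can be imported by double-clicking it or with `reg import polling_interval.reg` from an elevated command prompt (writing under HKLM needs administrator rights). Generating a file rather than writing to the registry directly keeps the change easy to review before it is applied.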