I first became aware of this about 7 months ago, when our Hyper-V hosts started crashing.
Omar Droubi - Posted on Tuesday, October 21, 2008 9:46:03 AM
After much communication with PSS, they have acknowledged that the Hyper-V VSS writer issue is due to a known Windows bug.
This bug was first discovered in Windows 7 and has now been reported in Windows Server 2008.
Now the good news.
The fix for this bug will be incorporated in Windows Server 2008 Service Pack 2 (ETA currently unknown). If they get a fix out sooner, say as a separate hotfix, and they notify me about it, I will share it as best I can, since I am now on the list of affected clients/partners who have open cases.
In the meantime I have implemented the workaround of simply disabling the VSS backup integration on each of my VMs. This has proved to work well enough for me, since I back up at the physical server level using Windows backup.
My guest machines go into a saved state and start back up without issue. Their downtime is only about 2 to 4 minutes total per physical host, and my client and I find this acceptable as the backup runs off hours.
(I am running Win2k8 VMs only, plus a single W2k3 VM.) Within each VM I have assigned its own dedicated disk for backup using Windows Server Backup (NTBACKUP on W2k3) in case I need to restore at the VM level, and if the parent partition gets corrupted I can recover the entire server using the Windows backup I run to external USB drives.
Poor man's backup, but in this economy, for this particular client (5 physical servers plus 10 Hyper-V guests), this solution works very well. (Backups go to external USB drives that are swapped a few times per week.)
Does anyone know of issues or unsupported scenarios where putting a machine in a saved state is risky or not recommended?
That forum thread is a good read, and there's lots of hope in there. For example, when we first ran into the issue it was on our own Hyper-V server. So our first reaction was to put a change freeze on all of the Hyper-V hosts that we managed. This actually worked for a while, until a stupid blunder with our update management system let some 2008 updates "leak". Then those machines started crashing too. So our current workaround, and we're still using it, is to disable the "Backup Integration Service" on all of our guest VMs and schedule the backup to occur at night.
What this does is save the state of the VM briefly while the VSS snapshot is taken. Then the VM wakes up and the backup of the volume continues. It works really well, but it's actually a step backwards for us, since we've been doing live VM VSS backups since the Virtual Server 2005 days.
The thread goes on to mention things like issues in Hyper-V VMs that have SCSI disks/devices, etc. I don't doubt that those are issues, just not for us. The very last post of that thread sounds extremely promising: ensuring that all of the volumes that have Hyper-V VHDs on them have dedicated shadowstorage. Unfortunately, my own testing has not shown that to completely fix the problem. At best it reduces the crashing from "every backup" to "every other backup".
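For anyone who wants to try the dedicated shadowstorage idea anyway, here is a rough sketch of how it could be scripted. It only builds the standard `vssadmin add shadowstorage` command line for each volume and prints it. The drive letters, the 10GB cap, the choice to keep each volume's shadow storage on the volume itself, and the helper name `shadowstorage_command` are all my own assumptions for illustration, not something taken from the thread.

```python
import subprocess

def shadowstorage_command(for_volume, on_volume, max_size="UNBOUNDED"):
    """Build a `vssadmin add shadowstorage` command as an argument list.

    for_volume: volume whose shadow copies need storage, e.g. "D:"
    on_volume:  volume that will hold the shadow storage, e.g. "D:"
    max_size:   size cap such as "10GB", or "UNBOUNDED" for no limit
    """
    return [
        "vssadmin", "add", "shadowstorage",
        f"/for={for_volume}",
        f"/on={on_volume}",
        f"/maxsize={max_size}",
    ]

def apply(command, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    print(" ".join(command))
    if not dry_run:
        # Needs an elevated prompt on the Hyper-V host itself.
        subprocess.run(command, check=True)

if __name__ == "__main__":
    # Hypothetical list of volumes that hold VHD files.
    vhd_volumes = ["D:", "E:"]
    for volume in vhd_volumes:
        # Assumption: storage lives on the same volume, capped at 10GB.
        apply(shadowstorage_command(volume, volume, "10GB"), dry_run=True)
```

The dry run is the default on purpose, so you can check the generated commands before touching a production host. As far as I know, `vssadmin add shadowstorage` fails if a storage association already exists for that volume, in which case `vssadmin resize shadowstorage` would be the command to look at, and `vssadmin list shadowstorage` shows what is currently configured.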
So, like Omar, I guess we're still waiting for Service Pack 2, or whatever. I think Hyper-V is a wonderful thing, but I sure wish they'd fix this.