I first became aware of this about 7 months ago, when our Hyper-V hosts started crashing.
Omar Droubi - Posted on Tuesday, October 21, 2008 9:46:03 AM
After much communication with PSS, they have acknowledged that the Hyper-V VSS writer issue is due to a known Windows bug.
This bug was first discovered in Windows 7 and has now been reported in Windows Server 2008.
Now the good news.
The fix for this bug will be incorporated in Windows Server 2008 Service Pack 2 (ETA currently unknown). If they release a fix sooner as a separate hotfix and notify me about it, I will try to share it as best I can, since I am now on the list of affected clients/partners who have open cases.
In the meantime, I have implemented the workaround of simply disabling the VSS backup integration on each of my VMs. This has worked well enough for me, since I back up at the physical server level using Windows backup.
My guest machines go into a saved state and start back up without issue. Their downtime is only about 2 to 4 minutes total per physical host, and my client and I find this acceptable, as the backup runs off hours.
(I am running only Win2k8 VMs, plus a single W2k3 VM.) Within each VM I have assigned a dedicated disk for backup using Windows Server Backup (NTBACKUP on W2k3) in case I need to restore at the VM level. If the parent partition becomes corrupt, I can recover the entire server using the Windows backup I run to external USB drives.
It's a poor man's backup, but in this economy, for this particular client (5 physical servers plus 10 Hyper-V guests), this solution works very well (backup to external USB drives that are swapped a few times per week).
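A minimal Python sketch of scripting the host-level backup step, assuming Windows Server 2008 with the Windows Server Backup command-line tool (wbadmin) installed and an elevated prompt. The drive letters are placeholders, not Omar's actual layout.

```python
import subprocess


def build_wbadmin_command(target, volumes):
    """Build the argument list for a one-off Windows Server Backup job.

    target:  backup destination, e.g. "E:" for the external USB drive (hypothetical)
    volumes: source volumes to include, e.g. ["C:", "D:"] (hypothetical)
    """
    if not volumes:
        raise ValueError("at least one source volume is required")
    return [
        "wbadmin", "start", "backup",
        f"-backupTarget:{target}",
        "-include:" + ",".join(volumes),
        "-quiet",  # run without the interactive confirmation prompt
    ]


def run_backup(target, volumes):
    # check=True raises CalledProcessError if wbadmin exits with a non-zero code
    subprocess.run(build_wbadmin_command(target, volumes), check=True)
```

Calling run_backup("E:", ["C:", "D:"]) from Task Scheduler after hours would match the off-hours schedule described above; with the backup integration service disabled in the guests, the VMs save state and resume on their own during the snapshot.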
Does anyone know of issues or unsupported scenarios where putting a machine in a saved state is risky or not recommended?
That forum thread is a good read, and there's lots of hope in there. For example, when we first ran into the issue it was on our own Hyper-V server. So our first reaction was to put a change freeze on all of the Hyper-V hosts that we managed. This actually worked for a while - until a stupid blunder with our update management system let some 2008 updates "leak". Then those machines started crashing too. So our current workaround, which we're still using, is to disable the "Backup Integration Service" on all of our guest VMs and schedule the backup to occur at night.
What this does is save the state of the VM briefly while the VSS snapshot is taken. Then the VM wakes up and the backup of the volume continues. It works really well, but it's actually a step backwards for us, since we've been doing live VM VSS backups since the Virtual Server 2005 days.
The thread goes on to mention things like issues in Hyper-V VMs that have SCSI disks/devices, etc. I don't doubt that those are issues, just not for us. The very last post of that thread sounds extremely promising: ensure that every volume holding Hyper-V VHDs has dedicated shadow storage. Unfortunately, my own testing has not shown that to completely fix the problem. At best it reduces the crashing from "every backup" to "every other backup".
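One way to try the dedicated shadow storage suggestion is the "vssadmin add shadowstorage" command, available on server editions of Windows. This Python sketch assumes the VHDs live on D: and E: and uses an arbitrary 20GB cap; both are placeholders. It defaults to a dry run that only prints the commands, and given my results it is an experiment, not a fix.

```python
import subprocess


def shadowstorage_commands(vhd_volumes, max_size="20GB"):
    """Return one vssadmin command per volume, storing that volume's shadow copies on itself.

    vhd_volumes: drive letters holding VHD files, e.g. ["D:", "E:"] (hypothetical)
    max_size:    assumed cap; vssadmin also accepts UNBOUNDED here
    """
    return [
        ["vssadmin", "add", "shadowstorage",
         f"/for={vol}", f"/on={vol}", f"/maxsize={max_size}"]
        for vol in vhd_volumes
    ]


def run_commands(commands, dry_run=True):
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))  # show what would run
        else:
            subprocess.run(cmd, check=True)  # requires an elevated prompt
```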
So, like Omar, I guess we're still waiting for Service Pack 2, or whatever. I think Hyper-V is a wonderful thing, but I sure wish they'd fix this.
0. Take a Hyper-V snapshot of your Kaseya server.
1. Get a copy of SQL Server Standard.
2. Launch the setup using "setup.exe SKUUPGRADE=1"
3. When you get to the instance selection screen, click the "Installed instances" button to find your existing SQL Express instance and select it.
4. Continue through the wizard; your SQL Express instance will be upgraded to full-blown SQL 2005.
5. Restart the Kaseya Service. All of your agents may show as offline at first. Don't panic, get a coffee, have a donut, come back in a few.
6. Oh look, we're back. Enjoy the rest of your weekend.
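If you prefer to launch step 2 from a script, here is a small Python sketch. The media path is a hypothetical placeholder for wherever the SQL Server Standard media is mounted. Setup still runs its normal interactive wizard, so steps 3 to 5 are unchanged, and taking the step 0 snapshot is still on you.

```python
import subprocess
from pathlib import PureWindowsPath


def build_setup_command(media_root):
    # media_root is a hypothetical location of the SQL Server 2005 Standard media
    setup = PureWindowsPath(media_root) / "setup.exe"
    # SKUUPGRADE=1 is the switch from step 2 that lets the Express instance be upgraded in place
    return [str(setup), "SKUUPGRADE=1"]


def launch_setup(media_root):
    # Starts the interactive wizard and waits for it to exit
    subprocess.run(build_setup_command(media_root), check=True)
```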
Now, I can't take full credit for this, but I'll try.
First of all, invest in Jeff Middleton's Swing Migration documentation. This isn't the part that helps you do it from the comfort of your home or office, but it's just great information, organized in a fashion that is extremely usable, in whole or in part.
Second of all, virtualize things. Like, really. This is the real enabler.
Some of you are probably laughing right now.
Question: Why on earth would you want to do server migrations NOT on site?
Answer: Are you productive sitting in the customer's "server" room? Crammed in there on a stool? Keyboard perched on top of a book? Or maybe you've wrangled the use of someone's office, so that's a step up. Maybe you've even brought your own laptop to work from, and that's probably the ideal on-site scenario; you can work on a familiar machine. I don't know about you, but I much prefer either my office setup (laptop + external monitor) or my home office with laptop + dual-monitor workstation. You can multi-task effectively when you are comfortable and have the right tools.
Scenario: Customer running some sort of Windows server. Maybe it's an SBS, maybe it isn't. Maybe they use Exchange, maybe they don't. It doesn't matter. With the proper planning (and what project doesn't require planning, so it might as well be proper), you can set yourself up for a cakewalk.
STEP 1: P2V migration of the customer's existing box. Do not pass go, do not collect $200. Don't skip this step. Obviously this part does require a site visit. And I'm not gonna lie - this is the hardest part - and I say that to put the rest of the work into perspective. But there are so many good tools out there that you should be able to perform a P2V into the virtualization software of your choice in under half a day. This will be the LONGEST outage of the entire project, and you can complete it in an afternoon. It's the perfect time to get onto new hardware, but it doesn't have to be. At the end of the day you've virtualized the customer's server, but it's sitting there on the network as if nothing has happened. This critical first step is what makes the magic possible.
When you are done with STEP 1 and you have a virtualized version of their server, it gives you so many outs it's not even funny - VM snapshots in case you have an "oh shit" moment. Or you can shut the damn thing down, copy the VM files, and bring up an entirely offline copy of their server to test your procedure. Completely risk-free, if you have the discipline to make use of the tools.
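The "shut it down and copy the VM files" out is easy to script. A minimal Python sketch, assuming the VM is already shut down (copying the disks of a running VM gives an inconsistent image) and that all of its files sit in one folder. The extension list is my own assumption covering Virtual Server (.vhd, .vmc), Hyper-V (.vhd, .avhd, .xml) and VMware (.vmdk, .vmx).

```python
import shutil
from pathlib import Path

# Assumed set of VM disk/config extensions; adjust for your platform
VM_EXTENSIONS = {".vhd", ".avhd", ".vmc", ".xml", ".vmdk", ".vmx"}


def copy_vm_files(source_dir, dest_dir):
    """Copy VM disk and config files (non-recursively) from source_dir to dest_dir.

    Only safe while the VM is shut down. Returns the names of the copied files.
    """
    src = Path(source_dir)
    dst = Path(dest_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(src.iterdir()):
        if item.is_file() and item.suffix.lower() in VM_EXTENSIONS:
            shutil.copy2(item, dst / item.name)  # copy2 also preserves timestamps
            copied.append(item.name)
    return copied
```

Point the copy at a separate test host, attach the copied files to a new VM on an isolated network, and you can rehearse the migration without touching production.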
STEP 2: Swing baby swing. I can't get into the nitty gritty details here or I'll break Blogger and piss off Jeff Middleton. Build up some more VMs and perform a swing migration to a new, virtualized server. It's dead easy, and if you screw up, the swing migration methodology represents almost zero risk to the existing infrastructure, and you can mitigate the rest using virtualization tools (backing up .vhd or .vmdk files, or taking advantage of snapshots). Swing + virtualization represents a COMPLETELY zero-risk server migration.
STEP 3: In the finishing steps of the migration, you run into the challenge of perhaps not being able to have both the old and new server online at the same time. Since the Swing Migration is designed for minimal network disruption and zero touch on the workstations, this is definitely the case. But even if you aren't doing a swing, if you are keeping the domain and/or server name the same on the replacement server, this becomes an issue for you. Besides, you don't want to unveil your beautiful new server until it's really done.
If your host server is Windows Server 2003 + Virtual Server 2005, enter good old ICS! Hopefully you have two NICs, and if you don't, plan to install one while you are back in STEP 1 (you read through this entire post -before- starting, right?). Enable ICS on the primary NIC, and the secondary NIC automatically becomes "LAN". You may need to plug it in just to ensure it's up (or use one of those loopback connectors), but really all we are going to do is put a virtualized network interface on top of it. We're just working around a limitation of Virtual Server 2005 networking: it doesn't have a built-in software "router". There might be other ways to do this, but this is the least invasive to the SBS (or other server) you are trying to migrate to. Attach a second NIC to your destination VM, and you've got a nice little virtual LAN going and your target server can now access the Internet. You'll want to disable the DHCP functionality of the ICS NIC as well, unless your new server doesn't run DHCP or something.
If your host is Hyper-V (Server 2008), it's even easier because you can do all this with the built-in Hyper-V networking features. Either way, you can get BOTH source and destination VMs online, without seeing each other, and you can complete the server migration without having to TOUCH anything physically.
STEP 4: Complete the transition to the new server. With this method it's possible to transition data with permissions still intact. You can forklift Exchange and SQL databases, reconfigure shares, printers, everything you want to. The finishing touches of course don't happen until the new VM is LIVE on the customer network. But you can do this over lunch, or in a few hours during the snoozy part of the afternoon; it all depends on your customer's business. I have NEVER done a weekend server migration, and I don't intend to. I once completed a swing on a Wednesday evening for a 30-seat network, and showed up on site the next morning to handle about three issues. This stuff is a no-brainer, and if you aren't doing it, you *$%^ing should be.
STEP 5: Mix yourself a drink and marvel at how clever you are. I prefer vodka/coke with copious amounts of ice to chew on. If you're out of vodka (or ice), a decent beer may be in order. We like Sleeman's Honey Brown in these parts, but those are all just suggestions. The point here is to take full advantage of having done the necessary front-end loading to a) take the risk away, b) take the pressure off, and c) enjoy more time NOT stuck on customer sites.
You may notice that the above doesn't take any rocket science at all (I double-checked - not a single rocket was harmed in the making of this post). It just requires careful planning and a working Internet connection. The fact of the matter is that IT people can do 90% of their job from a remote location. The rest of the visits can either be dumped on an onsite tech, or maybe you show your pallid face once in a while and say "hi" to real people (kidding). Still on dial-up? Then I can't help you.
What the hell is Dell thinking, including this bullshit (the Embassy Trust Suite) on their machines? Apparently this started towards the end of 2008, and it causes all kinds of fun problems with Windows Server's offline files functionality.
I particularly like this blog post: http://mikedimmick.blogspot.com/2007/12/calling-out-embassy-trust-suite.html. Mike correctly aims his post at Wave Systems (who I refuse to link for you). I still blame Dell for picking them.
Remove this software on sight.
And then Virtual Server decides not to see it in the admin console. This has annoyed me in the past, and normally when this happens we shutdown (or save) the VM(s) and reinstall VS2005 (*cough* downtime). I decided to try and find out if the situation has changed. It really hasn't.
My research pulled up the following:
1. The reinstall:
2. The hopeful workaround that doesn't:
3. The (only slightly) better solution than reinstall:
Only option 1) worked in this situation. I mean, if you're quick you can do the reinstall in a few minutes flat and have your VMs up and running before you know it. But still. I just wish that Microsoft would release one more update or service pack for VS2005 to deal with these annoying little issues. In the meantime, we're just continuing the push towards Hyper-V.
Name: #VM NAME# (The friendly name of your virtual machine)
The next time you run your backup utility or script, the VM will, without being explicitly told, save its state just long enough for the VSS snapshot to occur, then wake up on its own.