Knowledge Nuggets

ACC Yotta perform first ever Windows Update ‘transplant’ operation

by Jeff Rhys-Jones on 30 November 2016 15:06

Scalpel, Scissors, Swab… Errr… Windows Component Store?

A while ago, a client of ours experienced a storage reboot on their SharePoint Server Farm VM cluster, the unfortunate result of which was that the Web Front End (WFE) / APP server's LUN disappeared. Completely. Fortunately, we had recommended that their SharePoint Farm (in addition to being backed up using specialist SharePoint backup software) be protected with real-time DR using Vision Solutions' DoubleTake RecoverNow product (now called DoubleTake DR). Because MS SharePoint Farms are heavily integrated with MS AD and MS SQL, it’s not always possible to restore just the WFE from a previous snapshot or backup, because everything is linked. So, invariably, you have to take the whole farm back as one. With a real-time protection solution such as DoubleTake, however, we had a 'live' replica image of the WFE/APP server from a split second before the issue, so we were able to recover the server right back to its live, pre-failure state, and we were back in business. The restore was a complete success. Or so we thought.

The Gruesome Recovery Discovery

Alas, a few weeks later, a rather peculiar problem cropped up which, at first, we didn’t connect with the failure / restore. Windows Update refused to function on the recovered WFE/APP server. The last successfully installed update was from a few days before the failure, and not a single update had worked since. After a number of reboots and attempts to fix it, we started to suspect that the problem might have something to do with the restore. So we repeated similar restores on three other test servers using RecoverNow and, sure enough, on all three of these recovered servers Windows Update was broken, with the dreaded Windows Update error 0x80073712. The likely culprit: a corrupt Windows Component Store.


[Screenshot: Windows Update error 0x80073712]
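
For anyone wondering why that number points at the component store: 0x80073712 is an HRESULT wrapping a Win32 error code. Here is a minimal Python sketch of how it breaks down, assuming the standard HRESULT layout (failure flag in bit 31, facility in bits 16 to 26, code in the low 16 bits). The function name `decode_hresult` is ours, purely for illustration.

```python
# Standard Windows constants (values from the Win32 error code tables).
FACILITY_WIN32 = 7
ERROR_SXS_COMPONENT_STORE_CORRUPT = 14098  # "The component store has been corrupted."


def decode_hresult(hresult: int) -> dict:
    """Split a 32-bit HRESULT into its failure flag, facility and code."""
    return {
        "failure": bool((hresult >> 31) & 0x1),  # bit 31: severity (1 = failure)
        "facility": (hresult >> 16) & 0x7FF,     # bits 16-26: which subsystem
        "code": hresult & 0xFFFF,                # bits 0-15: the error itself
    }


fields = decode_hresult(0x80073712)
print(fields)
print(fields["facility"] == FACILITY_WIN32)
print(fields["code"] == ERROR_SXS_COMPONENT_STORE_CORRUPT)
```

So the facility is Win32 and the wrapped code is the side-by-side (SxS) "component store corrupt" error, which matches what we were seeing.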

So the problem was linked to DoubleTake RecoverNow after all, but why it occurred, when we had performed many, many trouble-free restores with RecoverNow, was (and remains to this day) a complete mystery. Perhaps it was the underlying VM solution, Oracle VM, and its PV drivers: that’s the only aspect of the setup which was in any way ‘exotic’ and common to all the servers we restored.

Help

The big worry was: who was going to own this problem and help us find a resolution? Enter Vision Solutions. Even though the actual root cause was about as clear as Marmite bisque, there was no hesitation in helping us. Yes, it was almost a certainty that it was something specific at ‘our end’, but still, they stood up and gave us full support at ‘their end’. That was immensely reassuring for us, as a Vision Solutions SPLA partner, to know we have this level of backing. I’m almost certain that many other vendors of backup/replication software would have simply walked away, or told us to call Microsoft directly.

We had tried all the recommended actions: the Windows Update Preparation Tool, System File Checker, downloading updates offline. Whatever we tried failed.

Double Vision - For Support!

What we had totally failed to appreciate at the start of our support case was that, in addition to providing their own highly responsive core product support, Vision Solutions also has an extremely close support relationship with Microsoft. This relationship, I was to discover, was the single stand-out factor in achieving the successful resolution of our problem. It's actually a critical advantage when you consider that, in most cases, we're not interested in blindly replicating raw data, but whole application platforms, as with SharePoint in our case. So you'd be wise to consider this fact when you're next in the market for an application replication solution!

After we’d tried everything, Vision Solutions hit the big button in their support department (the one with the big Windows logo on it). Klaxons sounded, lights flashed, and in no time at all we were on a Microsoft Support case. Unfortunately, the euphoria of being fast-tracked to Microsoft so quickly was short-lived. The questions were asked quickly, and the answer came back just as quickly. It just wasn't the answer we were hoping for.

The Microsoft Way: You need to be on the road first, in order to suggest an alternative route!

Going by the ‘good book of Bill’, if your server ever finds itself in the sorry situation of possessing a completely corrupted component store (as per error 0x80073712), officially there is nothing that can be done except a complete re-install. Seriously, Google it. This was the worst possible outcome for us, and would entail building a complete replica of the three-server MS SharePoint farm and then using AvePoint's DocAve to run a Farm Restore. This might be fine for the SharePoint files; however, the server platform itself had been quite heavily configured for certain advanced security / authentication requirements, and these would all need to be re-applied by the developers of the SharePoint App once the restore had been completed. That equates to a load of late nights. Consider getting the sofa bed in the lounge made up.

The Lucky Recovery Discovery

The following weekend I was doing some systems tidying up when I stumbled upon some VM image files of the SharePoint farm setup. They had been copied to archive a couple of years earlier, as I seem to remember, for ‘safekeeping’ before a large code update. Being a couple of years old, these images pre-dated the LUN failure, so out of curiosity I decided to fire up the WFE/APP server and see if Windows Update worked. And it did.

Light Bulb Moment

The old archive WFE/APP VM had exactly the same physical configuration as the current live server, just with a much older SharePoint configuration. So was it possible that, if I brought this older server up to the exact patch level of the last successful update on the broken live server, we could ‘harvest’ the WinSxS, Servicing, and registry hives from this ‘donor’ server, and ‘transplant’ them to the live, broken one? Could a ‘Windows Update Transplant’ possibly work? Time to chuck the book of Bill out the Windows!

Roy, my contact at Microsoft support, confirmed that this technique was indeed rather ‘interesting’ (polite for crazy, perhaps even downright silly) and, as he expected after searching around the MS support / KB systems, this was a ‘stunt’ that had not been pulled before, at least not one known to Microsoft. That said, it was also an ‘interesting’ enough idea that it might just work. So, big credit to him: the transplant was ON!

So the ‘donor’ VM patient was ‘prepared’ for the operation. We carefully updated it right up to the last successful update of the live VM with the broken Windows Update. We then exported the registry hives for updates, the component store and WinSxS, so we could attempt to graft these on to the recipient.

How we did it

 The exact process we then followed was this: 

  1. Wash down, don scrubs, find a pair of glasses with little telescopes on (optional)
  2. Shut down the live VM with the broken Windows Update, and clone* it to keep a backup
  3. Mount the boot disk of the live VM to the ‘donor’ VM and power that up
  4. Once the donor had powered up, the disk from the live VM was shown as offline, so we brought it online
  5. As local administrator, take ownership and set permissions on the folders you are about to copy files to
  6. Copy the C:\Windows\WinSxS and C:\Windows\Servicing folders from the donor, and merge them over the same folders on the live server's disk
  7. Shut down the donor VM and detach the live VM's boot disk. Re-attach it to the live VM. Boot, and perform a ‘bootrec /rebuildbcd’ to make the disk bootable again
  8. Once the live VM is booted, import the registry hives exported from the donor, making sure the Windows Modules Installer and Windows Update services are started (otherwise you won’t be able to import the donor component registry hives)
  9. Run the System File Checker (‘sfc /scannow’) and let that fix any remaining corruption
  10. Download and run the System Update Readiness Tool (CheckSUR)
  11. CheckSUR should complete with no errors
  12. Windows Update should now be working again

 (*Naturally we strongly recommend you perform this operation on a fresh backup or clone of the server you are using for target transplant, and not the live one, for obvious reasons!)
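
For the curious, the ‘merge over’ copy in step 6 can be sketched in a few lines of Python. To be clear, this is only an illustration of the idea and not the procedure we actually ran: we did the copy by hand after taking ownership of the folders. The function name `merge_folders` and the drive letters in the usage note are hypothetical, and a plain file copy like this does not carry over NTFS ownership or ACLs.

```python
import shutil
from pathlib import Path

# The two folders harvested from the donor in step 6.
FOLDERS = ("WinSxS", "Servicing")


def merge_folders(donor_windows: Path, target_windows: Path) -> list:
    """Copy each donor folder over the matching target folder.

    Files present on the donor overwrite the target's copies; files that
    exist only on the target are left in place (a merge, not a replace).
    Requires Python 3.8+ for dirs_exist_ok.
    """
    merged = []
    for name in FOLDERS:
        src = donor_windows / name
        dst = target_windows / name
        if not src.is_dir():
            raise FileNotFoundError(f"donor folder missing: {src}")
        shutil.copytree(src, dst, dirs_exist_ok=True)
        merged.append(dst)
    return merged
```

With the live VM's boot disk mounted on the donor as, say, E: (an assumed drive letter), the call would look something like `merge_folders(Path(r"C:\Windows"), Path(r"E:\Windows"))`.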

This looks hopeful. But what you really want to see......

Is THIS!! Mission accomplished!

So yes, this technique requires you to have a clean backup, and many reading this might say: if I had that, I wouldn’t need to restore! But if you are working with a complicated setup, something involving a highly customised application tightly embedded into Microsoft Active Directory like MS SharePoint, then I hope you can appreciate that simply putting the old VM into live service just wasn’t possible: the SharePoint configuration had changed considerably, even though the local server configuration had not.

Thanks Guys!

I would like to finish off by sending an enormous Thank You to Vision Solutions support, to the many people there who worked on this difficult (seemingly impossible) case, and in particular to Steve Cuthbertson. As I mentioned previously, Vision Solutions really owned this problem and followed it all the way through to its successful resolution. They listened to us and worked with us, and the end result is that we have one seriously relieved and happy client.

Finally, thank you to my contact and consultant at Microsoft support, Roy Hadjinicolaou. Roy believed in my crazy idea when he could easily have ‘thrown the book at me’ and closed the case. Roy also performed the most tortuous aspect of this work, bringing the donor server up to date and exporting the registry hives, meaning that I did not have to repeat the same task at our end.

So a huge thank you to both Steve and Roy for their parts in what Roy says is, officially, the world's very first successful, fully documented Windows Update Transplant operation!

You read it here first.

(mic. drop)


